OpenGL MultiDrawIndirect with per-instance textures
I saw an interesting talk the other day, “High-performance, Low-Overhead Rendering with OpenGL and Vulkan”, in which Mathias Schott talked about two OpenGL commands: glMultiDrawArraysIndirect and glMultiDrawElementsIndirect. These two commands let OpenGL draw multiple objects with vastly different geometries using only “one” draw call. Sounds cool, right? So let’s take a deeper dive into OpenGL’s Multi Draw Indirect commands.
What does it do
OpenGL’s Multi Draw Indirect commands have the following signatures:
void glMultiDrawArraysIndirect(
GLenum mode,
const void *indirect,
GLsizei drawcount,
GLsizei stride);
void glMultiDrawElementsIndirect(
GLenum mode,
GLenum type,
const void *indirect,
GLsizei drawcount,
GLsizei stride);
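mode
dictates what kind of primitive to draw (GL_POINTS, GL_LINES, GL_TRIANGLES etc.)
type
(ONLY glMultiDrawElementsIndirect) specifies the type of the indices in the buffer bound to GL_ELEMENT_ARRAY_BUFFER (GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT or GL_UNSIGNED_INT)
indirect
is a pointer to an array of structs containing draw commands for each object, see below. When a buffer is bound to GL_DRAW_INDIRECT_BUFFER, as in the example later on, it is interpreted as a byte offset into that buffer
drawcount
is the number of draw commands to execute
stride
is the distance in bytes between the starts of consecutive draw commands. 0 means tightly packed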
What glMultiDrawElementsIndirect does is equivalent to the following (assuming no errors generated):
GLsizei n;
for (n = 0; n < drawcount; n++)
{
const DrawElementsIndirectCommand *cmd;
if (stride != 0) {
cmd = (const DrawElementsIndirectCommand *)((uintptr)indirect + n * stride);
} else {
cmd = (const DrawElementsIndirectCommand *)indirect + n;
}
glDrawElementsInstancedBaseVertexBaseInstance(
mode,
cmd->count,
type,
cmd->firstIndex * size-of-type,
cmd->instanceCount,
cmd->baseVertex,
cmd->baseInstance);
}
mode
is what primitive to draw. Taken directly from the MultiDraw command
type
specifies the type of indices. Also taken directly from the MultiDraw command
Basically we are issuing one draw call multiple times, but the loop runs inside the driver instead of in our application. According to the documentation, this lets us specify many sets of primitives with very few subroutine calls.
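To make the stride rule a bit more concrete, here is a small CPU-side sketch that locates command n in a raw byte buffer the same way the loop above does. It is only an illustration: the struct mirrors the DrawElementsIndirectCommand layout described in the next section, while PaddedCommand, commandAt and toBytes are hypothetical names of my own, not OpenGL functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Five 32-bit fields, in the same order as the indirect command for element drawing.
struct DrawElementsIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::uint32_t baseVertex;
    std::uint32_t baseInstance;
};

// Hypothetical record that keeps extra per-draw data next to each command.
// The extra data is skipped by passing sizeof(PaddedCommand) as the stride.
struct PaddedCommand {
    DrawElementsIndirectCommand cmd;
    std::uint32_t userData[3];
};

// Returns command n from a byte buffer. A stride of 0 means "tightly packed",
// exactly as in the equivalence loop.
DrawElementsIndirectCommand commandAt(const std::vector<unsigned char>& bytes,
                                      std::size_t n, std::size_t stride) {
    const std::size_t step =
        (stride != 0) ? stride : sizeof(DrawElementsIndirectCommand);
    const std::size_t offset = n * step;
    assert(offset + sizeof(DrawElementsIndirectCommand) <= bytes.size());
    DrawElementsIndirectCommand out;
    std::memcpy(&out, bytes.data() + offset, sizeof(out));
    return out;
}

// Copies an array of trivially copyable records into a byte buffer.
template <typename T>
std::vector<unsigned char> toBytes(const std::vector<T>& records) {
    std::vector<unsigned char> bytes(records.size() * sizeof(T));
    if (!records.empty()) {
        std::memcpy(bytes.data(), records.data(), bytes.size());
    }
    return bytes;
}
```

With a buffer of PaddedCommand records you would pass sizeof(PaddedCommand) as the stride; with a plain array of commands you can pass 0.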
The struct is called a draw command and has the following layout:
typedef struct {
unsigned int count;
unsigned int instanceCount;
unsigned int first;
unsigned int baseInstance;
} DrawArraysIndirectCommand;
typedef struct {
unsigned int count;
unsigned int instanceCount;
unsigned int firstIndex;
unsigned int baseVertex;
unsigned int baseInstance;
} DrawElementsIndirectCommand;
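count
is the number of vertices to draw (for DrawElementsIndirectCommand, the number of indices)
instanceCount
is the number of instances to draw of the current object
firstIndex
is the offset, counted in indices, of the object's first index in the element array buffer (for DrawArraysIndirectCommand the third field instead gives the first vertex to draw)
baseVertex
(ONLY DrawElementsIndirectCommand) is a constant added to every index of the object before the vertex is fetched, i.e. the location of the object's first vertex in the vertex buffer
baseInstance
is the offset added to the instance number when fetching instanced vertex attributes (attributes with a non-zero divisor). It does not change gl_InstanceID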
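One way to get a feel for firstIndex and baseVertex is to pack a few meshes into one shared vertex buffer and one shared index buffer and record a command for each. The sketch below does that on the CPU only. Vertex, Mesh, Batch and appendMesh are hypothetical names, and this is just one possible layout, not necessarily the one used in the example that follows.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct DrawElementsIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::uint32_t baseVertex;
    std::uint32_t baseInstance;
};

struct Vertex { float x, y, u, v; };

// Hypothetical mesh: indices are relative to the mesh's own vertices.
struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Hypothetical batch: everything that would be uploaded to the GPU.
struct Batch {
    std::vector<Vertex> vertices;        // shared vertex buffer contents
    std::vector<std::uint32_t> indices;  // shared index buffer contents
    std::vector<DrawElementsIndirectCommand> commands;
};

// Appends a mesh to the shared buffers and records where it ended up.
void appendMesh(Batch& batch, const Mesh& mesh) {
    for (std::uint32_t index : mesh.indices) {
        assert(index < mesh.vertices.size());
    }
    DrawElementsIndirectCommand cmd;
    cmd.count = static_cast<std::uint32_t>(mesh.indices.size());
    cmd.instanceCount = 1;
    // Where this mesh's indices start in the shared index buffer.
    cmd.firstIndex = static_cast<std::uint32_t>(batch.indices.size());
    // Added to every index, so the mesh's indices can stay zero-based.
    cmd.baseVertex = static_cast<std::uint32_t>(batch.vertices.size());
    // One slot of per-instance data per command.
    cmd.baseInstance = static_cast<std::uint32_t>(batch.commands.size());

    batch.vertices.insert(batch.vertices.end(),
                          mesh.vertices.begin(), mesh.vertices.end());
    batch.indices.insert(batch.indices.end(),
                         mesh.indices.begin(), mesh.indices.end());
    batch.commands.push_back(cmd);
}
```

Because baseVertex is added to each index at draw time, the indices of every mesh can be copied unchanged no matter where its vertices land in the shared buffer.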
The descriptions do not really help you understand how to use the glMultiDrawIndirect commands, so let’s take a look at an example.
In this example I will be using glMultiDrawElementsIndirect since it models real-world applications a little better, but it should be trivial to change to array drawing if that is your thing.
The Example
In this example we will generate 50 commands to draw rectangles and 50 commands to draw triangles, each object with its own transformation matrix. We will also give each object a single-pixel texture to have some colors to show.
The rectangle is composed of 4 triangles (to make it a little more interesting) as shown below together with the layout of the triangle.
The picture below is what we are going to end up with. Nothing too exciting, but it will show how to handle different objects with different numbers of vertices and indices, each with an individual texture.
The Code
The full code can be found here.
Let us jump right into the code, starting where the action happens: the render loop.
glUseProgram(gProgram);
glBindVertexArray(gVAO);
generateDrawCommands();
//draw
glMultiDrawElementsIndirect(GL_TRIANGLES, //mode: draw triangles
GL_UNSIGNED_INT, //indices represented as unsigned ints
(GLvoid*)0, //start with the first draw command
100, //draw 100 objects
0); //no stride, the draw commands are tightly packed
As we can see we need to use the shader program and bind the vertex array where the data is located. We then generate the draw commands, more on that shortly. One thing to note is that the draw commands can be generated on another thread, this is where the hidden power of MultiDrawIndirect comes from. But for demonstration purposes we are just going to create the draw commands right before we use them, on the same thread.
void generateDrawCommands()
{
//Generate draw commands
SDrawElementsCommand vDrawCommand[100];
GLuint baseVert = 0;
for (unsigned i = 0; i<100; ++i)
{
//quad
if (i % 2 == 0)
{
vDrawCommand[i].vertexCount = 12; //4 triangles = 12 vertices
vDrawCommand[i].instanceCount = 1; //Draw 1 instance
vDrawCommand[i].firstIndex = 0; //Draw from index 0 for this instance
vDrawCommand[i].baseVertex = baseVert; //Starting from baseVert
vDrawCommand[i].baseInstance = i; //Per-object ID, read back through vertex attribute 2
baseVert += gQuad.size();
}
//triangle
else
{
vDrawCommand[i].vertexCount = 3; //1 triangle = 3 vertices
vDrawCommand[i].instanceCount = 1; //Draw 1 instance
vDrawCommand[i].firstIndex = 0; //Draw from index 0 for this instance
vDrawCommand[i].baseVertex = baseVert; //Starting from baseVert
vDrawCommand[i].baseInstance = i; //Per-object ID, read back through vertex attribute 2
baseVert += gTriangle.size();
}
}
//feed the draw command data to the gpu via the gIndirectBuffer
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, gIndirectBuffer);
glBufferData(GL_DRAW_INDIRECT_BUFFER, sizeof(vDrawCommand), vDrawCommand, GL_DYNAMIC_DRAW);
//feed the instance id to the shader.
glBindBuffer(GL_ARRAY_BUFFER, gIndirectBuffer);
glEnableVertexAttribArray(2);
glVertexAttribIPointer(
2,
1,
GL_UNSIGNED_INT,
sizeof(SDrawElementsCommand),
(void*)(offsetof(SDrawElementsCommand, baseInstance)));
glVertexAttribDivisor(2, 1); //only once per instance
}
In this particular example we are generating an individual draw command for each object, matching the layout of the objects uploaded to the GPU. The draw commands are put into a GL_DRAW_INDIRECT_BUFFER. We are also feeding baseInstance to vertex attribute 2, in order to find the correct texture for each object inside the fragment shader.
Notice glVertexAttribDivisor(2,1), which tells the GPU to advance attribute 2 once per instance instead of once per vertex. Because instanced attributes are fetched starting at baseInstance, every vertex of an object reads that object's own baseInstance value, in essence giving us a per-object ID. This is needed since gl_InstanceID starts at 0 for each new draw command and is not affected by baseInstance. Remember that glMultiDrawElementsIndirect behaves like one call to glDrawElementsInstancedBaseVertexBaseInstance per draw command. Since we are only drawing 1 instance of each object, gl_InstanceID will always be 0 in this application, making it as good as useless for telling the objects apart.
To work around this limitation we manually upload the ID to the shaders.
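The interplay between the divisor, baseInstance and gl_InstanceID is easier to see in a small CPU-side emulation. The sketch below is not OpenGL code. It only reproduces the indexing rule for instanced attributes (element = instance / divisor + baseInstance), and FetchResult and emulateInstancedFetch are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct FetchResult {
    std::uint32_t instanceId;  // what gl_InstanceID would hold
    std::uint32_t attribute;   // what the instanced attribute would hold
};

// Emulates which element of `buffer` an instanced attribute reads for each
// instance of a single draw command.
std::vector<FetchResult> emulateInstancedFetch(
        const std::vector<std::uint32_t>& buffer,
        std::uint32_t instanceCount,
        std::uint32_t baseInstance,
        std::uint32_t divisor) {
    assert(divisor > 0);  // divisor 0 would mean a regular per-vertex attribute
    std::vector<FetchResult> results;
    for (std::uint32_t instance = 0; instance < instanceCount; ++instance) {
        // baseInstance shifts the fetch, but not the instance counter itself.
        const std::uint32_t element = instance / divisor + baseInstance;
        assert(element < buffer.size());
        results.push_back({instance, buffer[element]});
    }
    return results;
}
```

For a command with instanceCount 1 and baseInstance i, the single instance has ID 0 but reads element i of the buffer, which is why the example can recover a per-object ID from attribute 2.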
void GenerateGeometry()
{
//---
// Generating and binding vertex buffer data
// In this example also created matrix data (vMatrix) here
//--
//Setup per instance matrices using Vertex attributes and the vertex attrib divisor
glGenBuffers(1, &gMatrixBuffer);
glBindBuffer(GL_ARRAY_BUFFER, gMatrixBuffer);
glBufferData(GL_ARRAY_BUFFER, sizeof(vMatrix), vMatrix, GL_STATIC_DRAW);
//A matrix is 4 vec4s
glEnableVertexAttribArray(3 + 0);
glEnableVertexAttribArray(3 + 1);
glEnableVertexAttribArray(3 + 2);
glEnableVertexAttribArray(3 + 3);
glVertexAttribPointer(3 + 0, 4, GL_FLOAT, GL_FALSE, sizeof(Matrix), (GLvoid*)(offsetof(Matrix, a0)));
glVertexAttribPointer(3 + 1, 4, GL_FLOAT, GL_FALSE, sizeof(Matrix), (GLvoid*)(offsetof(Matrix, b0)));
glVertexAttribPointer(3 + 2, 4, GL_FLOAT, GL_FALSE, sizeof(Matrix), (GLvoid*)(offsetof(Matrix, c0)));
glVertexAttribPointer(3 + 3, 4, GL_FLOAT, GL_FALSE, sizeof(Matrix), (GLvoid*)(offsetof(Matrix, d0)));
//Only apply one per instance
glVertexAttribDivisor(3 + 0, 1);
glVertexAttribDivisor(3 + 1, 1);
glVertexAttribDivisor(3 + 2, 1);
glVertexAttribDivisor(3 + 3, 1);
}
In this example we are using attribute location 3 as the start location for the transform matrix. Since it is a matrix we need to enable 4 consecutive vertex attribute arrays and upload the data as 4 vec4s. The reason for this is that glVertexAttribPointer can only handle a maximum of 4 components per vertex attribute (the second parameter). If you are wondering, Matrix is just a convenience struct that looks like this:
struct Matrix
{
float a0, a1, a2, a3;
float b0, b1, b2, b3;
float c0, c1, c2, c3;
float d0, d1, d2, d3;
};
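Since each of the four attribute locations receives one column of the mat4, it can help to check the byte offsets and to see where a translation has to go. The following is a small sketch under that assumption. columnOffset and makeTranslation are hypothetical helpers that are not part of the example's code.

```cpp
#include <cassert>
#include <cstddef>

struct Matrix
{
    float a0, a1, a2, a3;  // column 0, read by attribute location 3 + 0
    float b0, b1, b2, b3;  // column 1, read by attribute location 3 + 1
    float c0, c1, c2, c3;  // column 2, read by attribute location 3 + 2
    float d0, d1, d2, d3;  // column 3, read by attribute location 3 + 3
};

// The per-instance stride passed to glVertexAttribPointer relies on this.
static_assert(sizeof(Matrix) == 16 * sizeof(float),
              "Matrix must be 16 tightly packed floats");

// Byte offset of a column inside Matrix (column in 0..3).
std::size_t columnOffset(int column)
{
    assert(column >= 0 && column < 4);
    return static_cast<std::size_t>(column) * 4 * sizeof(float);
}

// Builds a matrix that moves an object by (x, y). With the shader computing
// instanceMatrix * vec4(position, 0.0, 1.0), the translation lives in the
// last column.
Matrix makeTranslation(float x, float y)
{
    Matrix m = { 1.0f, 0.0f, 0.0f, 0.0f,
                 0.0f, 1.0f, 0.0f, 0.0f,
                 0.0f, 0.0f, 1.0f, 0.0f,
                 x,    y,    0.0f, 1.0f };
    return m;
}
```

columnOffset(1) gives the same value as offsetof(Matrix, b0), which is the offset handed to glVertexAttribPointer for location 3 + 1.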
Generating the textures is quite simple in this example. We start by creating 3D texture storage for 100 different textures, then populate it with our generated data. What we have done here is create a basic texture array. See the fragment shader below to see how it is used.
void GenerateArrayTexture()
{
//Generate an array texture
glGenTextures(1, &gArrayTexture);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D_ARRAY, gArrayTexture);
//Create storage for the texture. (100 layers of 1x1 texels)
glTexStorage3D(GL_TEXTURE_2D_ARRAY,
1, //No mipmaps as textures are 1x1
GL_RGB8, //Internal format
1, 1, //width,height
100 //Number of layers
);
for (unsigned int i(0); i != 100; ++i)
{
//Choose a random color for the i-th image (each channel in 0..255)
GLubyte color[3] = { GLubyte(rand() % 256), GLubyte(rand() % 256), GLubyte(rand() % 256) };
//Specify the i-th image
glTexSubImage3D(GL_TEXTURE_2D_ARRAY,
0, //Mipmap number
0, 0, i, //xoffset, yoffset, zoffset
1, 1, 1, //width, height, depth
GL_RGB, //format
GL_UNSIGNED_BYTE, //type
color); //pointer to data
}
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
}
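As a variation, the color data for all layers could be generated up front in one contiguous array. The sketch below only builds that array on the CPU. makeLayerColors is a hypothetical helper, and it uses the C++ random library with a fixed seed instead of rand() so that the result is repeatable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Builds tightly packed RGB8 data: 3 bytes per layer, one 1x1 texel each.
std::vector<std::uint8_t> makeLayerColors(std::size_t layers, std::uint32_t seed)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> channel(0, 255);  // full 8-bit range
    std::vector<std::uint8_t> data;
    data.reserve(layers * 3);
    for (std::size_t layer = 0; layer < layers; ++layer)
    {
        for (int c = 0; c < 3; ++c)
        {
            data.push_back(static_cast<std::uint8_t>(channel(rng)));
        }
    }
    assert(data.size() == layers * 3);
    return data;
}
```

Such an array could then be handed to a single glTexSubImage3D call with a depth of 100, assuming GL_UNPACK_ALIGNMENT has been set to 1 first. With the default alignment of 4, OpenGL would expect every 3-byte row to be padded to 4 bytes.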
Lastly we will take a look at our shaders, starting with the vertex shader:
//Vertex Shader
#version 430 core
layout (location = 0 ) in vec2 position;
layout (location = 1 ) in vec2 texCoord;
layout (location = 2 ) in uint drawid;
layout (location = 3 ) in mat4 instanceMatrix;
//locations 4, 5 and 6 are also taken by instanceMatrix
layout (location = 0 ) out vec2 uv;
layout (location = 1 ) flat out uint drawID;
void main(void)
{
uv = texCoord;
drawID = drawid;
gl_Position = instanceMatrix * vec4(position,0.0,1.0);
}
Only one small interesting thing is happening here: we disable interpolation for drawID using the flat keyword. GLSL requires this for integer outputs, since integers cannot be interpolated.
//Fragment Shader
#version 430 core
layout (location = 0 ) in vec2 uv;
layout (location = 1 ) flat in uint drawID;
layout (location = 0) out vec4 color;
layout (binding = 0) uniform sampler2DArray textureArray;
void main(void)
{
color = texture(textureArray, vec3(uv.x, uv.y, drawID) );
}
The fragment shader is just as uninteresting as the vertex shader. Note that we use the drawID to look into the 2D texture array to find the texture we are interested in.
The End
So this is a small example of how to use glMultiDrawElementsIndirect (and, indirectly, glMultiDrawArraysIndirect).
A small challenge for the reader would be to update the transform matrices each frame (Hint: glMapBuffer or similar). Doing this would make the example almost able to work as a sprite renderer.
Another challenge would be to upload individual sprite textures with the same size. Sadly I have no idea at the moment how to work with different sized sprites.
For those interested I recommend playing around with the values in the draw command and trying to understand what each value really does.
I hope that this foray into OpenGL’s MultiDrawElementsIndirect has been helpful for you!