Is it expensive to call glBindVertexArray unnecessarily? To answer that, I wrote a loop that renders a cube with a VAO 200,000 times and tested it in different ways. Each test uses a single instance of a ModelClass (ModelCube). ModelClass owns a VAO and two virtual functions, draw() and bindAndDraw(). The cubes are rendered at random locations taken from a precalculated array of Transforms.
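The timing code itself is not shown here. As a minimal sketch, assuming a plain wall-clock measurement with std::chrono, a loop like the ones below could be timed as follows. timeLoopMs and its finish parameter are hypothetical names, not part of the original benchmark; in a real OpenGL test, finish would typically call glFinish() so that commands still queued by the driver are included in the measurement.

```cpp
#include <chrono>

// Hypothetical helper, not the original benchmark code.
// Runs body(i) for i in [0, iterations), then finish(), and returns the
// elapsed wall-clock time in milliseconds.
template <typename Body, typename Finish>
double timeLoopMs(int iterations, Body body, Finish finish) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        body(i);
    }
    finish();  // in a GL test: glFinish(), so queued work is counted
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Each test body would then be passed as the body callable, with the loop index used to pick the transform.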
1. Set only the transform to get a base time

        void renderTest() {
            for (int i = 0; i < 200000; ++i) {  // executes 200,000 times
                SetTransformationMatrixUniform(Transforms[i]);
            }
        }
2. Virtual call to ModelClass::bindAndDraw()

        void renderTest() {
            for (int i = 0; i < 200000; ++i) {  // executes 200,000 times
                SetTransformationMatrixUniform(Transforms[i]);
                ModelCube->bindAndDraw();
            }
        }
3. Bind the VAO before the loop and make a virtual call to ModelClass::draw()

        void renderTest() {
            glBindVertexArray(ModelCube->VAO);
            for (int i = 0; i < 200000; ++i) {  // executes 200,000 times
                SetTransformationMatrixUniform(Transforms[i]);
                ModelCube->draw();
            }
        }
4. Direct call to bind and draw

        void renderTest() {
            for (int i = 0; i < 200000; ++i) {  // executes 200,000 times
                SetTransformationMatrixUniform(Transforms[i]);
                glBindVertexArray(ModelCube->VAO);
                glDrawArrays(GL_TRIANGLES, 0, 36);  // assumed: non-indexed cube, 36 vertices
            }
        }
5. Bind the VAO before the loop and call draw directly

        void renderTest() {
            glBindVertexArray(ModelCube->VAO);
            for (int i = 0; i < 200000; ++i) {  // executes 200,000 times
                SetTransformationMatrixUniform(Transforms[i]);
                glDrawArrays(GL_TRIANGLES, 0, 36);  // assumed: non-indexed cube, 36 vertices
            }
        }
6. Direct call to bind (conditional) and draw. The bind condition check is based on a thread_local variable.

        void renderTest() {
            for (int i = 0; i < 200000; ++i) {  // executes 200,000 times
                SetTransformationMatrixUniform(Transforms[i]);
                if (isNotBinded(ModelCube->VAO))
                    glBindVertexArray(ModelCube->VAO);
                glDrawArrays(GL_TRIANGLES, 0, 36);  // assumed: non-indexed cube, 36 vertices
            }
        }
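The conditional bind is only described in one sentence, so here is one possible shape for it, assuming a per-thread cache of the last VAO that was bound. VaoId, boundVaoCache and bindVertexArrayCached are hypothetical names, and the bind callable stands in for glBindVertexArray so the snippet compiles without an OpenGL context.

```cpp
#include <cstdint>

using VaoId = std::uint32_t;  // stands in for GLuint

// Hypothetical per-thread record of the last VAO bound through the wrapper.
// 0 is the OpenGL name for "no vertex array bound".
inline VaoId& boundVaoCache() {
    static thread_local VaoId bound = 0;
    return bound;
}

// Calls bind(vao) only when vao differs from the cached value.
// Returns true if a real bind was issued, false if it was skipped.
template <typename BindFn>
bool bindVertexArrayCached(VaoId vao, BindFn bind) {
    if (boundVaoCache() == vao) {
        return false;          // already bound on this thread, skip the call
    }
    bind(vao);                 // in a renderer: glBindVertexArray(vao)
    boundVaoCache() = vao;     // remember it for the next check
    return true;
}
```

A cache like this is only valid if every bind on that thread goes through the wrapper; a direct glBindVertexArray call elsewhere would leave it stale.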
Times are approximate but good enough for comparison purposes:
Test | Time | Delta vs. base |
---|---|---|
Base Time (1) | 24ms | |
Virtual bind+draw (2) | 43.5ms | 19.5ms |
Virtual draw (3) | 32ms | 8ms |
Direct bind+draw (4) | 43.5ms | 19.5ms |
Direct draw (5) | 32ms | 8ms |
Direct bind(cond)+draw (6) | 44ms | 20ms |
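As expected for a high-frequency call, glBindVertexArray doesn't check whether the VAO is already bound and simply processes the request as if it were not. That is really expensive, roughly as much as the draw call itself. One thing to notice is that the cost of a virtual call in this scenario is negligible: the virtual and direct versions take the same time (tests 2 vs. 4 and 3 vs. 5), whereas test 6 with the bind condition check came out at 44ms, no faster than binding unconditionally.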
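To put the table in per-call terms, the differences can be divided by the 200,000 iterations. The snippet below does that arithmetic; perCallNs and printPerCallCosts are hypothetical helpers, and the inputs are the approximate timings from the table, so the results are rough.

```cpp
#include <cstdio>

constexpr int kCalls = 200000;  // loop iterations in every test

// Converts a total in milliseconds over `calls` calls to nanoseconds per call.
constexpr double perCallNs(double totalMs, int calls) {
    return totalMs * 1.0e6 / calls;
}

void printPerCallCosts() {
    const double drawMs = 32.0 - 24.0;  // test 5 minus base time
    const double bindMs = 43.5 - 32.0;  // test 4 minus test 5
    std::printf("draw: %.1f ns per call\n", perCallNs(drawMs, kCalls));  // 40.0
    std::printf("bind: %.1f ns per call\n", perCallNs(bindMs, kCalls));  // 57.5
}
```

With these numbers the redundant bind costs about 57.5 ns per call and the draw about 40 ns, which puts them in the same range.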
How expensive would it be to assign the buffers in a loop with one shared vertex array?
E.g., would a few calls to glVertexArrayVertexBuffer and one to glVertexArrayElementBuffer be faster or slower than using a VAO for every mesh? I know it's more OpenGL calls in general, but I'm unsure how they work under the bonnet.