4/07/2013

How to display a world of cubes, part 3.

In a world where everything is made out of cubes (that computer guys like to call voxels!), it is important to be able to display a lot of them efficiently. This post describes how the game Castle Defender does it and why it needs to be optimised!

Part 3/ Faster Graphics

Now that we know how the cubes are stored and that we understand how they are displayed, let's see how the display can be optimised.

To compare the performance of the different techniques we are going to discuss, we will look at the number of frames computed per second (FPS for short). The FPS achieved by each method will be measured on a demanding scene composed of 200 castles. The castle model is the one introduced in Part 2, so the total number of faces in the scene is 1525200.
A complex scene.
Using the "culling algorithm" from Part 2, the program achieves only 3 FPS. I won't even give a figure for the "stupid algorithm": when I use it, the program just dies!

In case you don't realise it, 3 FPS is not enough to play a game (it's 3 images per second!). For comparison, movies are commonly recorded and displayed at 24 FPS. When we play a game, we like to have at least 30 FPS, and 60 FPS is better, as it allows smoother movement.
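To put these numbers in perspective, a frame rate can be turned into a time budget per frame. The little helper below is only an illustration (the name frame_time_ms is made up for this post, it is not part of the game):

```cpp
#include <cassert>

// Time available to compute one frame, in milliseconds, for a given frame rate.
// frame_time_ms is a hypothetical helper, not part of Castle Defender.
double frame_time_ms(double fps) {
    assert(fps > 0.0);  // a frame rate of zero has no meaningful frame time
    return 1000.0 / fps;
}
```

At 3 FPS, each frame takes about 333 ms to compute, whereas a 60 FPS target leaves roughly 16.7 ms per frame.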

So what is wrong with our program?
Can its speed be improved, or do we have to stick to small scenes?

1. Plain Old OpenGL - 3 FPS


The main problem is that the code is written in "plain old OpenGL". What does that mean? Well, let's have a look at the implementation of "display top face" from Part 2.

Code snippet 1: old OpenGL
1. glColor3ub(red, green, blue);
2. glBegin(GL_QUADS);
3. glVertex3f(i  ,j  ,k+1);
4. glVertex3f(i  ,j+1,k+1);
5. glVertex3f(i+1,j+1,k+1);
6. glVertex3f(i+1,j  ,k+1);
7. glEnd();

In the above code, (i,j,k) is the position of the cube and (red,green,blue) are the color components of the cube.

Line 1 tells OpenGL the color of the vertices that follow.
Line 2 tells OpenGL that we are starting to draw a quad (a four-sided face).
Lines 3 to 6 define the coordinates of each vertex of the square.
Line 7 tells OpenGL that we have finished drawing.
The above code requires 7 OpenGL calls. On every frame, the coordinates of the 4 vertices of the face are computed and sent to the graphics card, where they are processed.

And this code is executed for each of the 1525200 faces!
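To get a feel for what this means, here is a back-of-the-envelope count, written as a small sketch. The constants are the figures quoted in this post; the function names are made up for the example, and the count assumes that every face really goes through code snippet 1 on each frame.

```cpp
#include <cstdint>

// Figures taken from the post: 200 castles, 1525200 faces in total,
// and 7 OpenGL calls per face in code snippet 1.
constexpr std::int64_t kCastles = 200;
constexpr std::int64_t kTotalFaces = 1525200;
constexpr std::int64_t kCallsPerFace = 7;

// Faces in a single castle (hypothetical helper name).
constexpr std::int64_t faces_per_castle() { return kTotalFaces / kCastles; }

// OpenGL calls issued for one frame if every face is drawn with snippet 1.
constexpr std::int64_t calls_per_frame() { return kTotalFaces * kCallsPerFace; }

static_assert(faces_per_castle() == 7626, "1525200 / 200");
static_assert(calls_per_frame() == 10676400, "1525200 * 7");
```

That is more than 10 million OpenGL calls for a single frame.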

2. Vertex Array - 13 FPS


No wonder the previous method is so slow: all the faces are recomputed on every frame. It would be better if we could store the faces in an array at the beginning of the program, and then merely ask the graphics card to render the array.

That's what vertex arrays allow us to do. Using them, this is how the previous code looks:

Code Snippet 2: Vertex Array

//beginning of the game: store the 4 vertices of the face on the CPU
colors.add(red,green,blue); //repeated 4 times, once per vertex
vertices.add(i  ,j  ,k+1);
vertices.add(i  ,j+1,k+1);
vertices.add(i+1,j+1,k+1);
vertices.add(i+1,j  ,k+1);

//inside the rendering loop
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
//3 floats per vertex, tightly packed (stride 0)
glVertexPointer(3,GL_FLOAT, 0, &vertices[0]);
glColorPointer( 3,GL_FLOAT, 0, &colors[0]);
//the last argument is the number of vertices to draw
glDrawArrays(GL_QUADS, 0, vertices.size());
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_COLOR_ARRAY);


The above code stores the coordinates (and the colors) of all the vertices in arrays kept on the CPU side at the beginning of the program. Then, in the rendering loop, OpenGL is asked to render the arrays with a call to glDrawArrays.
Using vertex arrays, the program achieves 13 FPS. This is better, but still not enough to play the game smoothly.
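Here is a minimal sketch of how the arrays of code snippet 2 could be filled at the beginning of the game. It assumes plain std::vector<float> containers holding three floats per vertex, which is not necessarily the container used in Castle Defender, and the names add_top_face and vertex_count are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Append the top face of the cube at (i, j, k) to flat CPU-side arrays.
// Assumption: 3 floats per vertex (x, y, z) and 3 floats per color (r, g, b).
void add_top_face(std::vector<float>& vertices, std::vector<float>& colors,
                  int i, int j, int k, float red, float green, float blue) {
    const float corners[4][3] = {
        {float(i),     float(j),     float(k + 1)},
        {float(i),     float(j + 1), float(k + 1)},
        {float(i + 1), float(j + 1), float(k + 1)},
        {float(i + 1), float(j),     float(k + 1)},
    };
    for (const auto& corner : corners) {
        vertices.insert(vertices.end(), corner, corner + 3);
        colors.insert(colors.end(), {red, green, blue});  // same color for all 4 vertices
    }
    // Both arrays are expected to grow in lockstep, one color per vertex.
    assert(vertices.size() == colors.size());
}

// Number of vertices stored in a flat (x, y, z) array.
std::size_t vertex_count(const std::vector<float>& vertices) {
    return vertices.size() / 3;
}
```

With this flat layout, the count to pass to glDrawArrays would be vertices.size() / 3, because its last argument is a number of vertices and not a number of floats.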

3. Vertex Buffer Object - 110 FPS


Using vertex arrays is a lot more efficient than making a lot of calls to OpenGL, but as the vertex arrays are stored in CPU memory, they still have to be transferred to the graphics card (also called GPU) memory on every frame.

By using something called vertex buffer objects, we can upload the CPU arrays to GPU memory at the beginning of the program. This way, the vertices are stored for good on the GPU, and it can render them right away, without having to wait for the memory to be copied.
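To see why the copy matters, here is a rough estimate of how much data the vertex arrays represent. It is only a sketch under stated assumptions: every one of the 1525200 faces is kept in the arrays, each vertex carries 3 floats of position and 3 floats of color (as the GL_FLOAT arrays of code snippet 2 suggest), and a float takes 4 bytes. The function names are made up for the example.

```cpp
#include <cstdint>

// Figures from the post: 1525200 quad faces, 4 vertices per face.
constexpr std::int64_t kFaces = 1525200;
constexpr std::int64_t kVerticesPerFace = 4;
// Assumption: 3 floats of position plus 3 floats of color per vertex, 4 bytes each.
static_assert(sizeof(float) == 4, "this estimate assumes 4-byte floats");
constexpr std::int64_t kBytesPerVertex = (3 + 3) * 4;

// Bytes of vertex data held in the CPU arrays, i.e. what may cross to the GPU each frame.
constexpr std::int64_t bytes_per_frame() {
    return kFaces * kVerticesPerFace * kBytesPerVertex;
}

// Resulting traffic in bytes per second at a given frame rate.
constexpr std::int64_t bytes_per_second(std::int64_t fps) {
    return bytes_per_frame() * fps;
}

static_assert(bytes_per_frame() == 146419200, "about 146 MB per frame");
static_assert(bytes_per_second(13) == 1903449600, "about 1.9 GB per second at 13 FPS");
```

Under these assumptions, the arrays weigh about 146 MB, so copying them on every frame at 13 FPS would mean moving around 1.9 GB per second.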

Code Snippet 3: Vertex Buffer Object
//beginning, store vertices on the CPU
//... similar to code snippet 2, but with vertex indices as well.
//Vertex indices tell the GPU how the vertices are linked.
//beginning, upload vertices to a GPU buffer
glGenBuffers(1, &vertex_buffer);
glBindBuffer(GL_ARRAY_BUFFER, vertex_buffer);
//size_of_array is the size of the vertex data in bytes
glBufferData(GL_ARRAY_BUFFER, size_of_array, &vertices[0],GL_STATIC_DRAW);
//upload indices
glGenBuffers(1, &index_buffer);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, index_buffer);
//here size_of_array is the size of the index data in bytes
glBufferData(GL_ELEMENT_ARRAY_BUFFER,size_of_array, &indices[0],GL_STATIC_DRAW);
//upload colors
...

//inside the rendering loop
glEnableClientState(GL_VERTEX_ARRAY);
glEnableVertexAttribArray(0);

glBindBuffer(GL_ARRAY_BUFFER, vertex_buffer);
//the last argument is an offset into the bound buffer, not a CPU pointer
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, index_buffer);
glDrawElements(GL_QUADS, indices.size(), GL_UNSIGNED_INT, 0);
glDisableVertexAttribArray(0);
glDisableClientState(GL_VERTEX_ARRAY);


As you can see, the program is becoming more and more complicated, but it is also becoming a lot more efficient. Now the same scene with its 200 castles renders at 110 FPS, more than enough frames (my screen can only do 60 FPS!).
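One detail of code snippet 3 that is easy to get wrong: glBufferData expects a size in bytes, and that size is not the same for the vertex buffer and for the index buffer. The sketch below shows one way the CPU-side sizes and indices could be prepared. It assumes std::vector containers and unsigned int indices (to match GL_UNSIGNED_INT); the helper names are made up for this example and are not taken from the game.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size in bytes of a CPU-side array, as expected by the size argument of glBufferData.
template <typename T>
std::size_t size_in_bytes(const std::vector<T>& data) {
    return data.size() * sizeof(T);
}

// Append the 4 indices of a quad whose 4 vertices were added consecutively.
// first_vertex is the position of the quad's first vertex in the vertex array.
void add_quad_indices(std::vector<unsigned int>& indices, unsigned int first_vertex) {
    assert(first_vertex + 3 >= first_vertex);  // guard against unsigned wrap-around
    for (unsigned int corner = 0; corner < 4; ++corner) {
        indices.push_back(first_vertex + corner);
    }
}
```

With these helpers, the vertex upload would pass size_in_bytes(vertices) and the index upload size_in_bytes(indices), rather than one shared size.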

In the end, vertex buffer objects are really efficient compared to the other techniques and, hopefully, all the graphics of the game will use them at some point. But for the moment, I haven't been able to make them work on Mac OS X (although they work fine on Linux), so the game is still stuck with vertex arrays.

If someone knows how to use vertex buffer objects on a Mac, please leave a comment. Any help will be greatly appreciated.

Another advantage of vertex buffer objects is that they come with the use of shader programs, which give you more control over the look of the faces.

Some examples of what shader programs can achieve will be given in a future post.

Nick
