This post is going to get a bit into the weeds, so strap in. Over the last few weeks, when working on my still unnamed game project, I noticed that framerates were getting rather sluggish as I continued to add more and more dynamic entities, especially when I turned on all the "bells and whistles" and my debug overlay, which includes showing bounding boxes for everything in the game. This was especially apparent on my Linux machine with integrated graphics, where framerates could get as low as 25-30fps while displaying the debug overlay. I decided to take some time to investigate the cause and see what could be done about it.
Investigation
The rendering of the game is done in a few stages, composed of layers of static tiles and dynamic entities. Tiles are always in the same positions and generally don't change much, so rendering them is quite efficient, whereas entities can have dynamic positions and change all the time. I focused my attention first on the entities, since I suspected their additional complexity meant their rendering logic was more likely to be coded inefficiently.
I looked over my shaders, which had been written months ago at this point, and realized that in many of the fragment shaders, I was passing in information via data textures and looping over it to effectively support rendering overlapping entities.
Fragment Shader Pseudocode
// First data texture contains tile position and sprite coordinates.
sampler entityValues1
// Second data texture contains entity pixel offset and size.
sampler entityValues2
void main()
{
    foreach (entity)
    {
        get_data_from_textures()
        calculate_entity_position()
        get_entity_sprite()
        get_entity_size()
        calculate_pos_in_sprite()
        color = get_color_at_pos()
        if (color_is_visible)
        {
            break
        }
    }
    return color
}
It should be clear that this is incredibly inefficient. A fragment shader runs for every pixel covered by the geometry being drawn, which here was every single pixel on the screen, every frame. Each of those invocations could loop over every entity, so the cost grew with the number of pixels times the number of entities, and this was clearly contributing to the slowdown. This style of shader was likely a side-effect of the way I had developed the entire rendering framework in the engine. Everything was being rendered onto the same quad that covered the window, drawn multiple times with different shaders supplied with different data. Think of it like layers of transparencies on an overhead projector--simple, and it works (albeit inefficiently). In order to use simpler shaders that don't require loops, I would have to split up the entities up front, so that each shader invocation deals with a single entity instead of the whole screen. And in order to do that, I would need to give each entity its own dynamically-positioned quad.
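To make the idea of one quad per entity a bit more concrete, here is a minimal sketch of how the four corners of such a quad could be computed on the CPU. It is not the engine's actual code (the engine is written in C#; C++ is used here only for illustration), and the `Entity` and `Vertex` layouts, the `buildEntityQuad` name, the 16-pixel tile size, and the y-down pixel coordinate system are all assumptions. Texture coordinates are kept in texels rather than normalized, on the assumption that the shader looks sprites up by integer texel position.

```cpp
#include <array>

// Hypothetical entity description; the field names are illustrative only.
struct Entity {
    int tileX, tileY;      // position in tiles
    int offsetX, offsetY;  // pixel offset from the tile origin
    int width, height;     // size in pixels
    int spriteX, spriteY;  // top-left texel of the sprite in its sheet
    int layer;             // layer of the sprite sheet texture array
};

// One vertex: position in pixels, plus texel coordinates and array layer.
struct Vertex {
    float x, y;
    float u, v, layer;
};

constexpr int kTileSize = 16;  // assumed tile size in pixels

// Builds the four corners of a quad that covers exactly one entity.
// Order: top-left, top-right, bottom-left, bottom-right (triangle strip).
std::array<Vertex, 4> buildEntityQuad(const Entity& e) {
    const float left   = static_cast<float>(e.tileX * kTileSize + e.offsetX);
    const float top    = static_cast<float>(e.tileY * kTileSize + e.offsetY);
    const float right  = left + static_cast<float>(e.width);
    const float bottom = top + static_cast<float>(e.height);

    const float u0 = static_cast<float>(e.spriteX);
    const float v0 = static_cast<float>(e.spriteY);
    const float u1 = u0 + static_cast<float>(e.width);
    const float v1 = v0 + static_cast<float>(e.height);
    const float layer = static_cast<float>(e.layer);

    return {{
        {left,  top,    u0, v0, layer},
        {right, top,    u1, v0, layer},
        {left,  bottom, u0, v1, layer},
        {right, bottom, u1, v1, layer},
    }};
}
```

With data like this uploaded as vertex attributes, the GPU only runs the fragment shader for pixels inside the entity's quad, and the interpolated texture coordinate already identifies the texel to read, so no per-pixel loop over entities is needed.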
The Process
I began the long task of rewriting the rendering engine from the ground up, which required touching almost every piece of code that dealt with rendering. It also meant writing support for a few OpenGL features I hadn't had to worry about before, such as a Vertex Array wrapper class. I rewrote the camera manager so I could more easily get the proper perspective view matrix for the new quads, simplified my texture handler class (since I didn't have to worry about data textures any more), and wrote a low-level quad handler.
Along the way, I stumbled upon something surprising. From my C++ work, I had learned to minimize branching in time-critical code wherever possible. When writing the quad handler (which converts entity quads to floating point values that OpenGL can use), my first attempt had a few if statements and a switch statement. I thought I could do better, so I restructured the code so that there was absolutely no branching, using a delegate map and calling the right function based on the value I had previously switched on. Despite my efforts, the result was slower than the version with branching. Stumped, I did a little digging and found that, unlike a direct call to a small function in C++, a call through a C# delegate is generally not inlined by the JIT compiler, and the overhead of actually making those calls turned out to be more costly than some branching! Who knew?
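The shape of the two variants can be sketched roughly as follows. This is not the engine's code: the engine is C#, and the value that was switched on is not described here, so the `Corner` enum, the `Offset` struct, and every function name below are hypothetical stand-ins. A table of function pointers plays the role of the delegate map. The sketch only shows that both variants compute the same thing and makes no timing claim.

```cpp
#include <array>
#include <cstddef>

// Hypothetical "switched" value: which corner of a quad is being built.
enum class Corner { TopLeft = 0, TopRight, BottomLeft, BottomRight };

struct Offset {
    float dx, dy;
};

// Variant 1: plain branching with a switch statement.
Offset cornerOffsetSwitch(Corner c, float w, float h) {
    switch (c) {
        case Corner::TopLeft:     return {0.0f, 0.0f};
        case Corner::TopRight:    return {w, 0.0f};
        case Corner::BottomLeft:  return {0.0f, h};
        case Corner::BottomRight: return {w, h};
    }
    return {0.0f, 0.0f};  // unreachable for valid enum values
}

// Variant 2: no branches in the caller, but one indirect call per lookup.
using OffsetFn = Offset (*)(float, float);

Offset topLeft(float, float)      { return {0.0f, 0.0f}; }
Offset topRight(float w, float)   { return {w, 0.0f}; }
Offset bottomLeft(float, float h) { return {0.0f, h}; }
Offset bottomRight(float w, float h) { return {w, h}; }

// Lookup table indexed by the numeric value of Corner.
const std::array<OffsetFn, 4> kOffsetTable = {
    topLeft, topRight, bottomLeft, bottomRight};

Offset cornerOffsetTable(Corner c, float w, float h) {
    return kOffsetTable[static_cast<std::size_t>(c)](w, h);
}
```

In the second variant the target of each call is only known at run time, so the call itself has to happen. That call overhead is what ended up costing more than the branches it replaced.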
Anyways, I eventually finished the process off by writing a much more elegant entity shader that only deals with a single entity at a time and gets all the data it needs from vertex attributes. Here's the actual GLSL code for the new fragment shader:
#version 130
uniform sampler2DArray entitySheet;
in vec4 texCoordV;
out vec4 fragcolor;
void main()
{
    fragcolor = texelFetch(entitySheet, ivec3(texCoordV.xyz), 0);
}
Really doesn't get any simpler than that!
So, what now?
Rewriting the entity shader was just the first step, but it was probably the longest and most impactful change. I used the same techniques to dynamically create quads for everything else except tiles, such as UI elements and text. It was a long learning process, but the results were immediately apparent. I now get a consistent 300fps (with v-sync disabled, of course), even on the slow integrated graphics. Now that I don't have any concerns about rendering performance, I am happy to be able to move back to working on game content and assets instead.