I’ve spent the last couple of days tidying up my code base, which had become functional rather than readable during the heady final days of preparing it for last weekend’s release. I’ve gone back to basics and am attacking it with a series of tests and refactoring approaches to get it into a happier shape.
Firstly, I attacked performance again. I probably spend too much time on performance because it bothers me when I’m certain that something I’ve just crafted could run a lot faster with an approach that is lurking on the periphery of my thinking. This time I was convinced that I could reduce the work of the visibility system because it had a couple of meaty-looking for loops sitting in it. I had an early disappointment when I found out that I had already got to one of them with my intended refactor, but I was able to recover by completely removing the last of my big three loops, at the cost of making dynamic point lights a problem to be solved again later.
Of course, these planned efforts were somewhat undermined when I inadvertently made a large improvement by slightly contracting the radius of my vision cone’s clip planes. Hours of work gained me a few half milliseconds a frame, and then a one-line change saved a couple of milliseconds all by itself.
Somewhat stubbornly, I think there is still more to do on performance. I have a to-do list item to make my hexagon grid fully visible, to help players figure out what’s going on under the hood at the minute and to allow for better tactical decisions later on. The problem is that I’m currently insisting on making each hex of the grid its own visibility node and renderable instance, and there are about four times as many of them as the rest of the scene combined. The reason I want to keep them as separate objects is that it makes it a lot easier to do fancy things with them, particularly highlighting an arbitrary selection of them whenever I choose, and perhaps more. If I didn’t want this flexibility I could make the grid a single quad with a tiling hexagon image.
I have a somewhat primitive ‘shader instancing’ system for bundling up all the visible instances of a type into a smaller number of bigger buffers. This helps a lot with the draw calls per frame, but it incurs the cost of building the larger buffers each frame and still pays the cost of checking the visibility of each hexagon instance. To improve this further I need to somehow stop refreshing the entirety of what is visible and what is in each instanced buffer every frame. I suspect there is a way to do this but I’ve put it to one side for now.
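To make the trade-off concrete, here is a minimal sketch of that kind of per-frame batching. It is not my engine’s actual code: the `Instance` and `Batch` structs and the `buildBatches` function are hypothetical names invented for illustration, and the per-instance data is reduced to a 2D position.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-instance data; a real engine would carry a full transform.
struct Instance {
    int   typeId;   // which mesh/material this instance uses
    float x, y;     // position of the instance
    bool  visible;  // result of this frame's visibility test
};

// One batch stands for one draw call: every visible instance of a single
// type, packed into one contiguous buffer of per-instance data.
struct Batch {
    int typeId;
    std::vector<float> data;  // x, y pairs, ready to upload in one go
};

// Rebuilds every batch from scratch. This is the per-frame cost: each
// instance is still visited and each buffer is still refilled.
std::vector<Batch> buildBatches(const std::vector<Instance>& scene) {
    std::map<int, std::vector<float>> byType;
    for (const Instance& inst : scene) {
        if (!inst.visible) continue;  // culled instances take no buffer space
        std::vector<float>& buf = byType[inst.typeId];
        buf.push_back(inst.x);
        buf.push_back(inst.y);
        assert(buf.size() % 2 == 0);  // always two floats per instance
    }

    std::vector<Batch> batches;
    batches.reserve(byType.size());
    for (auto& entry : byType) {
        batches.push_back(Batch{entry.first, std::move(entry.second)});
    }
    return batches;
}
```

The number of draw calls drops to the number of distinct types, but the loop still touches every instance every frame, which is exactly the part I would like to avoid for a mostly static hex grid.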
Over the past few days I’ve been looking into cleaning up my memory usage. I’m generally fairly diligent about putting my ‘news’ and deletes in the right place, but everyone misses things every now and again, and the mistakes can pile up if you’ve been mostly ignoring non-fatal memory leaks for a while. Early on in the development of my engine I took the precaution of performing all allocations through a #define’d version of new that allowed me to easily swap between alternate versions of my allocators without editing everything or affecting any libraries I was using. This approach is pretty handy for tracking down persistent memory leaks, as I can swap out my standard allocator for one that keeps track of an ID for every allocation in a giant hash map. Then when the application shuts down I can see a printed report of any objects that weren’t deallocated, along with their IDs. It works pretty well so far.
Some leaks escape this method because they come from allocations that don’t make use of my macros, so to make sure I don’t miss them I also make use of Microsoft’s memory leak detection methodology at the same time. I recommend you read through the whole section on memory leak detection here if you’ve ever had problems with leaks. One quirk they don’t mention is that to get the really useful line numbers of where an allocation with new takes place, you also need to add in a line that looks a little like this:
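For anyone who wants to try the same trick, here is a minimal sketch of the idea rather than my actual allocator. Every name in it (`LeakTracker`, `Tracked`, `MY_NEW`, `MY_DELETE`) is made up for this example, and it assumes a C++11 compiler for `std::unordered_map`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>
#include <unordered_map>

// Hypothetical tracker: maps every live pointer to the ID it was given.
class LeakTracker {
public:
    void* allocate(std::size_t size) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        live_[p] = nextId_++;  // remember the allocation under a fresh ID
        return p;
    }
    void release(void* p) {
        if (!p) return;
        live_.erase(p);        // no longer a leak candidate
        std::free(p);
    }
    std::size_t liveCount() const { return live_.size(); }
    // Call at shutdown: anything still in the map was never released.
    void report() const {
        for (const auto& e : live_) {
            std::printf("leaked allocation #%lu at %p\n", e.second, e.first);
        }
    }
private:
    std::unordered_map<void*, unsigned long> live_;
    unsigned long nextId_ = 1;
};

LeakTracker& tracker() {
    static LeakTracker t;
    return t;
}

// Tag type that selects the tracked overloads without touching global new,
// so third-party libraries keep using the normal allocator.
struct Tracked {};

void* operator new(std::size_t size, Tracked) { return tracker().allocate(size); }
// Matching delete; the compiler only calls it if a constructor throws.
void operator delete(void* p, Tracked) noexcept { tracker().release(p); }

// Runs the destructor and hands the memory back to the tracker.
template <typename T>
void trackedDelete(T* p) {
    if (!p) return;
    p->~T();
    tracker().release(p);
}

// Swapping allocators means changing only these two lines.
#define MY_NEW new (Tracked{})
#define MY_DELETE(p) trackedDelete(p)
```

Usage then looks like `int* a = MY_NEW int(5);` followed by `MY_DELETE(a);`, with `tracker().report()` called just before the application exits. Arrays would need their own pair of macros, which I have left out to keep the sketch short.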
#define new new(1, __FILE__, __LINE__)
Otherwise it will only work for malloc.
Some leaks appear to elude both of these detection methodologies, so for my final set of leaks I resorted to the old programmer standby of trial and error: commenting out code and recompiling. For this method to work with any degree of success, your code has to be robust and independent enough to still run when you comment it out piece by piece. I spent a bit of time ensuring things were initialized to zero to make this possible, but eventually I narrowed the problem down to the root cause, fixed it, and was back down to leak-free code.
If I were doing this better, of course, I would probably be making much greater use of template-based reference-counting pointers for all my heap memory usage. They’re not yet part of the STL, but I’m pretty certain they are due to be added in the next round of standardisation, and there are also many implementations available through boost and vendor implementations of the TR1. Changing everything over to use shared_ptr rather than new and delete would be a fairly large refactoring right now with only minimal short-term benefits (as I’m able to track the leaks that occur fairly reliably), so I probably won’t do it for this project, but I fully intend to start the next project with reference-counted pointers as a key technical inclusion.
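As a rough illustration of what that would buy me, here is a small sketch. It assumes a C++11 or later compiler, where shared_ptr lives in `<memory>` as `std::shared_ptr`; with boost or TR1 only the namespace and header would differ. The `Hex`, `Grid` and `selectEven` names are hypothetical and not taken from my engine.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical game object that counts how many instances are alive,
// so the effect of reference counting is easy to observe.
struct Hex {
    static int alive;
    int index;
    explicit Hex(int i) : index(i) { ++alive; }
    ~Hex() { --alive; }
};
int Hex::alive = 0;

// The grid is one owner of the hexes...
struct Grid {
    std::vector<std::shared_ptr<Hex>> hexes;
};

// ...and a selection (say, for highlighting) is a second owner.
// Copying a shared_ptr bumps the reference count instead of copying the Hex.
std::vector<std::shared_ptr<Hex>> selectEven(const Grid& grid) {
    std::vector<std::shared_ptr<Hex>> selection;
    for (const std::shared_ptr<Hex>& hex : grid.hexes) {
        assert(hex != nullptr);
        if (hex->index % 2 == 0) selection.push_back(hex);
    }
    return selection;
}
```

With this arrangement there is no delete anywhere: a hex is destroyed when the last container holding it lets go, so clearing the grid while a selection is still alive leaves only the selected hexes in memory. The price is a little overhead per pointer copy and the need to watch out for reference cycles.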
Next up will be some more general usability refactoring and code tidying. The key goals here are to make the purpose of classes clearer, to add new ones where they are needed and to make all classes more data driven and flexible for reuse.
Coding an indie game is a lot like building a multi-story shanty house. Before you build the next story you need to make sure the previous one is going to take the weight without collapsing into a pile of corrugated iron and salvaged wood.
Coming later will be a look at some of the less technical aspects of the game’s development so far. If you have anything to add to any of my warbling above then feel free to do so in the comments.