<h1 id="streaming-containers">Streaming containers</h1>
<p>This blog post is about an issue I’ve run into several times, but am still unsure how to solve in a nice way. Consider this more of an “organize-my-thoughts” type post, rather than a solution to the problem.</p>
<p>In many cases I end up in a situation where I want to pass a bunch of data from one subsystem to another. The data is organized into different types using structs or classes, and as soon as there is more than one type, the objects may have different sizes. The most common use case, at least for me, is that such objects share a common base – for example different types of constraints generated by collision detection and joints fed into the rigid body solver, or draw calls fed into the graphics system. But let’s take a traditional event system as a simplified example:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>struct Event
{
    unsigned char type;
    int flags;
};
struct SoundEvent : public Event
{
    Sound* sound;
    Vec3 worldPos;
    float volume;
};
struct CollideEvent : public Event
{
    Object* a;
    Object* b;
};
</code></pre></div></div>
<p>These events can be produced from various locations in the code and some data structure is needed to store them. At some point we need to traverse and dispatch them to listeners. In this case, one could imagine some direct callback approach, but it can get messy in a threaded environment, so let’s say we decide it’s a good idea to queue them up and dispatch them later.</p>
<p>There are multiple ways we could store these events in memory. We could, for instance, allocate each new event on the heap, store their pointers in an array, go through that when we dispatch and then delete them, but assuming this happens every frame in a game loop, that could potentially be a lot of allocations every frame, and they would all be scattered in memory, leading to a lot of cache misses.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void addEvent(Event* event)
{
    events.push_back(event);
}
void dispatch()
{
    for(int i=0; i<events.size(); i++)
    {
        dispatch(events[i]);
        T_DELETE(events[i]);
    }
}
</code></pre></div></div>
<p>Another solution is to keep a separate array for each object type, so each subclass of Event would have its own container in EventSystem. This will keep all events of the same type nicely packed in memory, but it leads to more code. Probably not a big problem if the number of different types is small, but once it gets bigger, the code will be harder to maintain:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void addEvent(Event& event)
{
    if (event.type == SOUND)
        soundEvents.push_back((SoundEvent&)event);
    if (event.type == COLLIDE)
        collideEvents.push_back((CollideEvent&)event);
}
void dispatch()
{
    for(int i=0; i<soundEvents.size(); i++)
        dispatch(soundEvents[i]);
    for(int i=0; i<collideEvents.size(); i++)
        dispatch(collideEvents[i]);
}
</code></pre></div></div>
<p>One could of course also imagine multiple addEvent methods that take different types, but the concept and the amount of code duplication is similar. This solution also has the drawback of not maintaining the order in which the events were issued, which may or may not be a problem.</p>
<p>If the order is important, we could keep a separate array of pointers into the other arrays that can be traversed when dispatching, adding an extra level of indirection, but it’s a bit awkward and sort of counteracts the whole idea of packing the objects tightly in memory.</p>
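<p>As a sketch, such an order-preserving index could store a type tag and an index per event, reusing the hypothetical soundEvents and collideEvents containers from above. Indices are used instead of pointers here, since the vectors may reallocate as they grow:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Sketch: per-type storage plus an order array preserving dispatch order
struct EventRef
{
    unsigned char type;
    int index;
};
std::vector<EventRef> order;

void addEvent(Event& event)
{
    if (event.type == SOUND)
    {
        soundEvents.push_back((SoundEvent&)event);
        order.push_back({SOUND, (int)soundEvents.size()-1});
    }
    if (event.type == COLLIDE)
    {
        collideEvents.push_back((CollideEvent&)event);
        order.push_back({COLLIDE, (int)collideEvents.size()-1});
    }
}
void dispatch()
{
    for(int i=0; i<order.size(); i++)
    {
        if (order[i].type == SOUND)
            dispatch(soundEvents[order[i].index]);
        else
            dispatch(collideEvents[order[i].index]);
    }
}
</code></pre></div></div>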
<p>A third option would be to group the different types into a single uber-type, for instance using a union of structs:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>struct Event
{
    unsigned char type;
    int flags;
    union
    {
        struct
        {
            Sound* sound;
            Vec3 worldPos;
            float volume;
        } sound;
        struct
        {
            Object* a;
            Object* b;
        } collide;
    } data;
};
</code></pre></div></div>
<p>This would allow us to easily put them all in the same vector and traverse that linearly when dispatching. But this obviously has the downside of all events now being as large as the largest subtype, wasting a lot of memory.</p>
<p>What we ideally want is actually really simple. We want to store differently sized objects in a tightly packed lump of memory that can later be traversed linearly. Think of it as a stream of data, very similar to how state is usually serialized to disk.</p>
<p>The problem is that C/C++ is ill-equipped for this access pattern, since built-in arrays and containers usually operate on one specific type. I believe the historical reason for this is that older CPUs couldn’t access certain types unless they were properly aligned in memory – unaligned reads on Alpha or SPARC CPUs (and ARM CPUs up until ARMv6) could fault or yield undefined behavior. Struct/class members in C and C++ are automatically aligned (padded with unused data) to circumvent this issue, and the size of a struct itself is also adjusted so that the largest member stays aligned when instances are stacked on top of each other. This assumes that only structs of the same type are stacked, so if we manually stack structs of different types on top of each other in memory, alignment goes out the window.</p>
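<p>To make the padding concrete, here is a small standalone program that prints the size and alignment the compiler chooses. The struct is made up for illustration, not taken from the event system above:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <cstdio>

struct Example
{
    unsigned char type; // 1 byte, followed by padding
    int flags;          // must start on a 4-byte boundary
    void* pointer;      // must start on an 8-byte boundary on 64-bit
    float volume;
};

int main()
{
    // sizeof is rounded up so that the largest member stays aligned
    // when instances are stacked back to back in an array
    printf("size %zu, align %zu\n", sizeof(Example), alignof(Example));
    return 0;
}
</code></pre></div></div>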
<p>On modern 64-bit architectures, alignment is less of an issue, if one at all. The CPU will gladly perform unaligned reads and writes (it may sound simple, but there’s a good amount of complex circuitry to make it happen), though there might be a slight performance penalty in some specific cases. I have not done enough research on this, but it seems that unaligned reads and writes only pay a penalty if the access happens to straddle two cache lines. On the other hand, performance for a lot of software running on modern CPUs is limited by memory access, and reducing memory footprint will generally increase performance, so it’s a balancing act.</p>
<p>So, if we want to put our events of different types in a contiguous lump of memory <em>and</em> still respect alignment requirements, it could be implemented with something like this (memcpy doesn’t care about alignment, and the local copies on either side of it are always properly aligned):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>template <class T> void addEvent(const T& event)
{
    memcpy(mBuffer+mOffset, &event, sizeof(T));
    mOffset += sizeof(T);
}
addEvent(mySoundEvent);
</code></pre></div></div>
<p>This solves the problem of lining up objects of different sizes in memory, but how do we get them back when it’s time to dispatch? Since the type is stored in the first member of the base type, we can peek at the first byte to determine the type and use a similar template method for retrieving the data.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>unsigned char getNextType()
{
    return mBuffer[mOffset];
}
template <class T> void getNextEvent(T& event)
{
    memcpy(&event, mBuffer+mOffset, sizeof(T));
    mOffset += sizeof(T);
}
unsigned char t = getNextType();
switch(t)
{
    case SOUND:
    {
        SoundEvent sndEvent;
        getNextEvent(sndEvent);
        dispatch(sndEvent);
        break;
    }
    ...
}
</code></pre></div></div>
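<p>Both the write and read paths assume some surrounding bookkeeping that is left implicit above. A minimal sketch of what that could look like – a fixed-capacity buffer with a single cursor that is used for writing during the frame and rewound before dispatch:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Sketch of the assumed stream state (no growth or overflow handling)
class EventStream
{
public:
    EventStream() : mOffset(0), mSize(0) {}
    void rewind() { mSize = mOffset; mOffset = 0; } // switch to reading
    void clear() { mOffset = 0; mSize = 0; }        // reuse next frame
    bool atEnd() const { return mOffset >= mSize; }
private:
    unsigned char mBuffer[65536];
    unsigned int mOffset; // current read or write position
    unsigned int mSize;   // total bytes written this frame
};
</code></pre></div></div>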
<p>First copying the event into the memory stream and then copying it again to get it back somewhat counteracts the whole idea of being really efficient, but if we disregard alignment and rework the interface, we can read and write directly into the stream instead with something like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>template <class T> T& addEvent()
{
    mOffset += sizeof(T);
    return *reinterpret_cast<T*>(mBuffer+mOffset-sizeof(T));
}
unsigned char getNextType()
{
    return mBuffer[mOffset];
}
template <class T> const T& getNextEvent()
{
    mOffset += sizeof(T);
    return *reinterpret_cast<const T*>(mBuffer+mOffset-sizeof(T));
}
//Produce events
SoundEvent& sndEvent = addEvent<SoundEvent>();
sndEvent.volume = 0.5f;
...
//Dispatch events
unsigned char t = getNextType();
switch(t)
{
    case SOUND:
        dispatch(getNextEvent<SoundEvent>());
        break;
    ...
}
</code></pre></div></div>
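<p>If alignment does turn out to matter, the same interface can round the cursor up to each type’s requirement before reserving space. A sketch – the reading side would need to do the same rounding, and the padding wastes a few bytes between events:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>template <class T> T& addEventAligned()
{
    // Round mOffset up to the alignment requirement of T
    // (alignof(T) is always a power of two)
    size_t aligned = (mOffset + alignof(T) - 1) & ~(alignof(T) - 1);
    mOffset = aligned + sizeof(T);
    return *reinterpret_cast<T*>(mBuffer + aligned);
}
</code></pre></div></div>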
<p>This would allow us to place all objects of different types linearly in a tightly packed chunk of memory and then retrieve them with no copying, no wasted memory or other overhead. Would this be faster than other methods, or would the unaligned access counteract the smaller memory footprint? Only one way to find out…</p>
<h1 id="my-journey-into-game-development">My journey into game development</h1>
<p>I often get the question how I got into game development and if I have any tips for beginners. Here’s my story and thoughts about getting into game development.</p>
<h1 id="childhood">Childhood</h1>
<p>I’ve never been particularly interested in playing games myself. I never had a gaming console as a kid, but ever since I was very young I’ve had a strong interest in engineering and technology. My early interest in computers was entirely centered around programming, and not playing games.</p>
<p>I somehow convinced my parents to get me a Commodore VIC 64, because that was what one of my friends had. I’m not sure how old I was, but I must have been eleven or twelve. Back then, the printed manual for a computer was an introduction to programming (BASIC, in this case). When turning the computer on, there was a prompt where you could start programming. Overall, the bar to enter programming was way lower than now. No choice of programming language, no game engine, no downloading and installing stuff, you just turned the computer on and could instantly start programming (like, literally instantly, the interpreter was burnt into a ROM chip).</p>
<p><a href="/assets/2021-02-22-vic64.jpg"><img src="/assets/2021-02-22-vic64.jpg" alt="" /></a></p>
<p>Programming languages sucked, performance was terrible and debuggers non-existent. If you made an error, the computer froze and you had to turn it off and back on again and start over. It was frustrating, tedious and very unintuitive, but at the same time an excellent introduction to how computers work. In order to put a sprite on the screen, you had no choice but to map out each pixel on paper, learn binary numbers, convert that to decimal and load it into a specific memory address. Since there were no tools, everything was cumbersome, but at the same time, everything also seemed within reach without having to learn that much. There was only one way to do things – the hard way.</p>
<p>A few years later I upgraded to a Commodore Amiga 1000 and a whole new world opened up. This was much more similar to computers as we know them today, with a proper desktop, multi-tasking, a file system, etc. It shipped with a programming language (AmigaBASIC), but for some reason I never really got into it. Instead, I got introduced to the AMOS programming language, which I remember as an absolutely fantastic environment for learning to make games. It had a lot of built-in functionality for doing the most basic things, like loading images, playing sounds, drawing lines, etc. It also had the ability to execute inline assembly code which made it very powerful.</p>
<p>Getting better at programming and learning the hardware, I got more and more comfortable programming directly in assembly language instead of AMOS, and finally switched over to using AsmOne as my default programming environment. In retrospect this was a terrible move, because writing everything in assembly language is overly complicated compared to using something like C and dropping down to assembly only where needed. I think this poor decision was mostly because I simply didn’t know that C existed, nor how to combine it with assembly. Remember that this was before the Internet was a thing, so the only knowledge you had access to was through your friends and a good dose of curiosity and trial-and-error.</p>
<h1 id="university">University</h1>
<p>There were no game development programs available in Sweden at the time, and I’m not sure I would have chosen one even if there were. At this point I had not decided on a career in game development, maybe because game development wasn’t really seen as a career option at all, so I went for a more traditional engineering program – Master of Science in Media Technology at Linköping University. This is where I first got in contact with object oriented programming through Java and later C++. I took classes in linear algebra, data structures, 3D rendering, physical modelling and animation, physics, acoustics, etc. It was definitely a good foundation for a game developer, even though this wasn’t a game centric education.</p>
<p><a href="/assets/2021-02-22-imp.jpg"><img src="/assets/2021-02-22-imp.jpg" alt="" /></a></p>
<p>It was at university I developed a passion for game physics. I can’t remember exactly what caught my attention, but I wrote my first rigid body simulator in 1998, inspired by the papers on impulse based dynamics by Brian Mirtich. At this time physics was rarely seen in video games. The only one I remember studying intensely was Carmageddon 2, which featured incredibly sophisticated rigid body simulation for a game at that time. My first simulator was written in Java, with collision detection in C through the JNI interface. It was later rewritten in C++ and featured a wrecking ball machine at a building site.</p>
<h1 id="game-physics-and-middleware">Game physics and middleware</h1>
<p>For the final exam project at Linköping University I decided to make a game physics SDK with Marcus Lysén. It never really reached a usable state, but was enough to encourage us to form a company around it and develop it further. We teamed up with Jonas Lindqvist and founded Meqon Research. Around the same time, other physics SDKs started popping up. The first version of Havok got released. Mathengine was already on the market, and there was Ipion (mostly known for being used in Half-Life 2), PhysX by Novodex, and the open source project ODE. Even though I wouldn’t admit it at the time, we had the weakest product, no experience and no money, but somehow we managed to release the Meqon SDK a few years later and got a couple of customers. Most notably 3D Realms licensed our technology for Duke Nukem Forever, which gave us the confidence and credibility to push forward and grow the team to about a dozen people. All in all a very fun and intense period of my career, but completely unsustainable, stressful and unhealthy.</p>
<p><a href="/assets/2021-02-22-meqon.jpg"><img src="/assets/2021-02-22-meqon.jpg" alt="" /></a></p>
<p>In 2005, Meqon was acquired by AGEIA and the whole team was integrated into the PhysX machinery. I worked as one of three software architects and got the chance to work with some incredibly talented people across the world, many of whom I’m still in contact with today. This was a fantastic journey and undoubtedly an important cornerstone of my career. The people I worked with at AGEIA also influenced my coding style in a very important way. Coming from an academic, object oriented programming background, I started to question everything when I got in contact with experienced game developers who routinely rejected most of that in favor of a more direct C-like programming style that I slowly started adopting myself and still use today.</p>
<p>I left AGEIA in 2007, just before they got acquired by NVIDIA, to work on scientific visualization. At this point I also started working on my own C++ framework to use for future projects. It wasn’t a game engine, but more of a low level framework with the functionality needed to make a game engine, such as vector math, file IO, compression, geometry, input, audio, rendering, scripting, etc. Creating your own tech was already at the time considered doomed to failure (even more so today), but doing it was a lot of fun and undoubtedly a key decision in my career. With a programming framework that I wrote from scratch, thus knowing it inside out, I could very quickly implement new ideas and projects on top of it without ever running into any limitations.</p>
<p>One of the first projects I created with the new framework was Dresscode, a game engine profiling tool that I later sold to RAD Game Tools (now reworked into a product called Telemetry). Even if the framework has been rewritten and improved upon in several iterations, I’m still using it today for almost everything I do.</p>
<h1 id="indie-game-development">Indie game development</h1>
<p>Up until this point I had never really made an actual, released game, but that changed in 2010, when I teamed up with Henrik Johansson (one of the people I hung out with in the Amiga days) and founded Mediocre. Going from game technology and middleware to making actual games was equal parts fun and frustration. I had no experience with game design, but started to appreciate it more than I thought I would. An interesting thing to note here is that both Henrik and I had very little interest in playing games. We were not gamers, which I think is quite unusual for game developers, but it is my firm belief that playing games is orthogonal to being successful at making them. There are great developers who play a lot of games and there are great developers who never play games. Playing a lot of games is not a bad thing, but it does not make you good at making them, it makes you good at playing them (and this probably applies to a lot more than game development).</p>
<p>We did our first game, Sprinkle, as a part-time project while still doing contract work on the side to sustain our living and I think this was a really wise decision which allowed us to experiment and iterate on the game design to find something unique, with no real time pressure. It also allowed us to spend that extra time polishing the game prior to release.</p>
<p><a href="/assets/2021-02-22-sprinkle.jpg"><img src="/assets/2021-02-22-sprinkle.jpg" alt="" /></a></p>
<p>I think the primary reason Sprinkle became successful was that we found something unique, but as always it’s hard to pinpoint one single factor. We had good timing, both because the App Store and mobile gaming in general were still young and not very exploited, but also because there was a general interest in indie games at the time. Previous connections from NVIDIA and Meqon also contributed to getting us introduced to Apple and Google prior to release, thus increasing our chances of getting featured.</p>
<p>There’s a lot more to the story, including the other Mediocre games and everything that led up to Teardown, but I think I’ll stop here, since at this point I’m already a full-time indie game developer.</p>
<h1 id="advice">Advice</h1>
<p>For learning programming and game development today I don’t really feel like I’m in a position to give beginner advice, because the conditions today are so different from when I started, but for programming in particular, it is my firm belief that experience is the most important factor. Write a lot of code and you’ll eventually get good at it. A good way to do this is to find a way to enjoy it rather than just trying to learn it. My career took a giant leap when I finally embraced that and focused on what I love the most – doing low level stuff and building things from scratch, but it may very well be something else for you.</p>
<p>As an indie developer you will often hear the advice to focus on marketing, otherwise you’re doomed. I don’t agree with that. Making an indie game is hard. Marketing an indie game is even harder. Marketing a <em>mediocre</em> indie game is nearly impossible. If you’re good at making games, focus your efforts on making a unique, fun and polished game instead. If your game isn’t appealing, make another iteration on the design, revisit the mechanics, the art style or whatever you’re good at until it has something that other games don’t. I think indie developers generally have a better chance of making a game that markets itself rather than trying to market a game that just isn’t very good.</p>
<p>Making something unique is probably the most important aspect. As a small developer you cannot realistically create a clone or even a variation of an existing game and expect it to be better than what’s already out there. With a tiny team you also cannot compete with large amounts of content, nor with technology. Originality is pretty much the only aspect of a game that works in your favor, so embrace it. Keep the scope as small as possible and polish until it shines in the dark.</p>
<h1 id="the-spraycan">The Spraycan</h1>
<p>Teardown uses an 8-bit color palette for voxel materials, so any voxel volume can have up to 255 different materials and the representation per voxel is then just a single byte to save memory. A material specifies not only the color, but also things such as roughness, emissiveness, reflectivity and physical material type (wood, metal, foliage, etc). Each object can have a unique palette, but a lot of them share the same one. When something breaks, all the pieces inherit the original object palette, so the number of palettes does not increase over time. I pack all palettes into a texture that is 256 in width and the number of materials in height and keep that on the GPU for rendering. This way of handling materials conflicts with one particular feature that I really wanted in the game – the spraycan.</p>
<p>If each voxel stored RGB values, recoloring them would be trivial, but doing it with a fixed palette is a whole different story, especially since I wanted the ability to paint with two different colors (yellow for spraycan and black from fire and explosions) and also allow recoloring in several shades to do gradients and antialiased edges. Here is what the end result looks like and how I solved the problem.</p>
<p><a href="/assets/2020-12-03-spraycan.png"><img src="/assets/2020-12-03-spraycan.png" alt="" /></a></p>
<p>The basic idea is to create color variations of all used materials in the palette and populate the unused areas with these variations as a precomputation step at load time. Allowing two color shades in four steps requires eight empty slots in the palette per used material, cutting the usable number of entries in the palette down from 255 to a mere 28.</p>
<p>Most objects in Teardown actually only use a handful of materials, so this is rarely a problem. A simple prop, like a chair or a table, might even use just a single material, but the more complex ones, like a large boat or a house, might use dozens of materials. On top of this, small objects are often merged into larger volumes to improve performance, and at this merge step materials from all merged objects must be combined into the same palette, so it can fill up pretty quickly.</p>
<p>If I run out of empty slots, I search for visually similar materials and try to squeeze as many as possible of them into the palette. Afterwards I create a translation table for each shade that can be used as a lookup when recoloring. (I know DOOM did a similar thing back in the day to emulate lighting with a fixed palette, using a translation table that picks darker variants of existing colors by referencing the best match out of all existing colors in the palette.) For each palette there is a translation table like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>unsigned char yellowVariant[256][STEPS];
</code></pre></div></div>
<p>So, for instance if I want to tint a voxel one step towards yellow I do this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>yellow = yellowVariant[original][0];
</code></pre></div></div>
<p>And then if I want to make the voxel even more yellow I can do the same one more time, but now with the new index:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>moreYellow = yellowVariant[yellow][0];
</code></pre></div></div>
<p>This would have been the same thing as using the second step of the table from the beginning (this is actually not always true when we run out of empty slots):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>moreYellow = yellowVariant[original][1];
</code></pre></div></div>
<p>There is a similar table for the black shades as well, and this is where it gets complex. Say you use the spraycan to paint something yellow, then you blast a bomb near that area to tint the yellow paint black. This is where palette indices start running out quickly, because we need a black variant not only for all the original materials, but also for each yellow variant of each unique material, and the same of course applies the other way around.</p>
<p>The implementation in Teardown is rather naive: it merges visually similar colors and simply stops adding new colors when the palette is full. (It actually prioritizes opaque colors and adds transparent colors only if there are available slots in the palette, since blending transparency doesn’t work well in the engine anyway.)</p>
<p>The translation table is done last in a separate pass when all new colors have been added. It can choose freely from all available materials and pick the best match, so it is totally possible that one original material gets translated to another original material if it happens to be a yellow or black variant.</p>
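<p>A sketch of what that final pass could look like – for every palette entry and shade, compute an ideal tinted color and then pick the closest color that actually exists in the palette. Here mixTowardsYellow and colorDistance are assumed helper functions for illustration, not engine code:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for (int i = 0; i < 256; i++)
{
    for (int s = 0; s < STEPS; s++)
    {
        // The color we would ideally want for this shade
        Color ideal = mixTowardsYellow(palette[i], (s+1)/(float)STEPS);
        // Pick the best match among the colors we actually have
        int best = 0;
        float bestDist = FLT_MAX;
        for (int j = 0; j < 256; j++)
        {
            float d = colorDistance(ideal, palette[j]);
            if (d < bestDist)
            {
                bestDist = d;
                best = j;
            }
        }
        yellowVariant[i][s] = (unsigned char)best;
    }
}
</code></pre></div></div>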
<p>At some point I would like to improve the generation of new materials and use some kind of optimization algorithm to select the materials that generate the best translation table given the constraints, but it’s a non-trivial task that might be quite hard to pull off.</p>
<p><a href="/assets/2020-12-03-castle.png"><img src="/assets/2020-12-03-castle.png" alt="" /></a></p>
<p>If you want to see the limitations of the current approach, open the Castle example we ship in Create mode and bring out the spray can. The castle level is built as one huge scene in MagicaVoxel and therefore uses a single palette for the entire level. You’ll notice that some materials get a brown tint instead of yellow (probably because there already are a lot of brown shades in the palette) and that a couple of materials won’t even change at all, most likely because the palette filled up before reaching that index, forcing the best yellow variant of that color to simply become the original color itself.</p>
<h1 id="teardown-quicksave">Teardown quicksave</h1>
<p>Saving the complete state of a game at any time and then restoring to that state is hard for any game, but in a fully dynamic voxel world that constantly changes, controlled by dozens of lua scripts, all with their own internal state, implementing this was quite a challenge.</p>
<p>The quicksave feature in Teardown is central to gameplay and needs to be extremely robust for the game to be playable, so I knew early on this had to work flawlessly. It also had to be relatively fast. A long delay would be annoying and cause players to use it less often, limiting creativity and experimentation. Furthermore, it is one of those features that’s quite unrewarding to work on, because no matter how good it is, it doesn’t really add anything to the game, other than working as expected, while even the smallest error instantly results in corrupt state and most likely a crash.</p>
<p>Let’s start with the world itself. It consists of thousands of individual voxel volumes, each with anything from a couple of hundred up to millions of voxels. These volumes are altered dynamically as the player causes destruction. Both the voxel content and the size of the volumes change, new volumes are added and others are removed. In theory, it would probably be possible to keep a diff for the world and use that for tracking state, but for robustness purposes I wanted to save the entire state of the world for each save. The larger levels contain roughly half a billion voxels, so I first thought it would be unrealistic to save all that state, but since the voxel data compresses very well it turned out to actually be a viable option. Proper entropy coding like zlib can easily get the size down to a few percent of the original size, but compressing half a gigabyte of data takes a while even on a fast machine. Instead I’m using simple <a href="https://en.wikipedia.org/wiki/Run-length_encoding">run-length encoding</a> which has almost zero cost, both on the compressing and the decompressing end. Using this gets the size down to 15–20% of the original size, making quicksave files on the larger levels around 80 Mb in size. It’s still a lot, but acceptable and very quick to load. The binary mission content files that we ship with the game are actually just an initial quicksave snapshot of each level, but I run these through zlib at the bake step, cutting the size down to about 20 Mb per mission.</p>
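<p>Run-length encoding can be as simple as emitting (count, value) pairs – a minimal sketch of the idea, not the actual format used in Teardown. It works so well here because empty and identical voxels produce long runs of the same byte:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Minimal byte-wise RLE: each run becomes a (count, value) pair
void rleEncode(const unsigned char* src, int n, std::vector<unsigned char>& out)
{
    int i = 0;
    while (i < n)
    {
        unsigned char value = src[i];
        int run = 1;
        while (i+run < n && src[i+run] == value && run < 255)
            run++;
        out.push_back((unsigned char)run);
        out.push_back(value);
        i += run;
    }
}
</code></pre></div></div>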
<p>Compared to the voxel data, all other game state is tiny, but equally important. I’m using an explicit form of serialization, where each object has callbacks for saving and loading state. This means each object can choose freely what to save, and might leave out cached state or temporary acceleration structures. The voxel objects, for instance, have a separate physics representation, which is also voxel data, but in a different format. This is 100% reproducible from the main voxel data, so instead of saving it, it’s generated at load time. The same goes for spatial acceleration structures for physics, culling, rendering and lighting. I’m using a serialization context that gets passed around to each object and keeps track of pointer serialization by iterating over the objects twice at load time.</p>
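<p>The pointer handling could look something like the sketch below – pointers are written as stable ids, objects are created in a first pass over the saved data, and pointer members are patched in a second pass once every object exists. All names here are assumptions for illustration, not the actual engine API:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>struct SerializeContext
{
    // Saving: each object registers itself and gets a stable id (0 = null)
    std::unordered_map<const void*, unsigned int> pointerToId;
    // Loading, pass 1: objects are created and stored by id
    std::vector<void*> idToPointer;

    unsigned int idFor(const void* p) const
    {
        auto it = pointerToId.find(p);
        return it == pointerToId.end() ? 0 : it->second;
    }
    // Loading, pass 2: patch pointer members once all objects exist
    template <class T> T* resolve(unsigned int id) const
    {
        return id ? (T*)idToPointer[id] : nullptr;
    }
};
</code></pre></div></div>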
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>//Code example for saving the state of a physical body
void Body::saveState(TSerializeOutput& ser)
{
    Entity::saveState(ser);
    ser.writeTransform(mTransform);
    ser.writeVec3(mLinVel);
    ser.writeVec3(mAngVel);
    ser.writeBool(mDynamic);
    ser.writeUInt8(mActive);
}
</code></pre></div></div>
<p>This explicit form of serialization is flexible, but it also makes it error prone. I tend to group class members in stateful and stateless sections in the header. This makes it a bit more explicit what needs to be saved and easier to check if everything is being saved, but I wish I had a better system in place to verify that all state is indeed saved.</p>
<p>Scripting was by far the hardest part to serialize. All gameplay logic in Teardown is implemented in lua scripts, and since scripts can be written by anyone, even outside the development team (there is already an active modding community), I wanted state serialization for scripts to be fully automated. Hence, in contrast to the engine-side serialization, there should be no need for callbacks or explicit state serialization. One would have hoped that the lua library offered some way to serialize global state, but unfortunately there is not much in there to help (at least not in lua version 5.1 that I’m using, correct me if I’m wrong). Fortunately, traversing the global state of a lua context is relatively easy. All global variables show up in the globals table (_G), so if all handles and engine interaction are handled properly, serializing that table is enough. There is still a lot of non-trivial code needed to untangle table references and types correctly, and the end result is not perfect, but it works for all our own scripts. There are corner cases, like multiple tables referencing other tables and circular dependencies, that will not serialize correctly, so I’ll have to go over that at some point, but for the most part it works really well.</p>
<h1 id="teardown-design-notes">Teardown design notes</h1>
<p>Teardown started as a technology experiment and it’s one of those games where gameplay was designed to fit the technology, rather than the other way around. It’s not the first time I’ve been involved in such projects (Sprinkle, Smash Hit), and probably not the last, but Teardown was by far the most frustrating experience yet.</p>
<p>The idea of a fully destructible environment is compelling for the player but a nightmare for the game designer. Walls can no longer be used as obstacles, key objects that the player might need to complete an objective can break, and the designer is no longer in control of a player’s path through the game, potentially breaking the intended progression. Not to mention all the technical hurdles a fully destructible environment implies when it comes to physics, lighting, scripting, etc, but more on that in a future blog post.</p>
<p>Destruction is often used in games as a decorative special effect, but for Teardown the intention was always to use destruction as the key element in gameplay and with a limited amount of action, allowing the player to do detailed precision work rather than total mayhem.</p>
<p><a href="/assets/2020-11-05-teardown.png"><img src="/assets/2020-11-05-teardown.png" alt="" /></a></p>
<p>After nearly a full year of experimentation and many failed prototypes, the idea of a two-phase heist setting was born. It’s compatible with all the limitations (or lack thereof) that a fully destructible environment imposes, while still offering an interesting challenge. It allows the player to move around freely in a fully accessible environment, carefully planning the heist and creating shortcuts using destruction, vehicles and objects from the environment in a creative way. The player chooses when – and I think it’s important that this is the player’s decision – to go into action mode and try out the created path.</p>
<h1 id="level-design">Level design</h1>
<p>Allowing the player to destroy everything has a huge impact on level design. Since any wall can be torn down, the only true obstacles at our disposal are elevation, distance, water and unbreakable objects. We could use unbreakable objects more, but it would make the environment harder to read and imply a failure to deliver on the promise of a fully destructible environment. Therefore unbreakable objects are only used for rock formations and the ground you’re standing on.</p>
<p>The relatively small level size started as a technical limitation, but I don’t think the game would benefit from larger levels even if it was technically possible. Villa Gordon is currently the largest level in the game, and it can already be a bit tedious to walk around during the preparation phase. Personally I think the game shines in a more compact and cluttered environment like Hollowrock Island, with some verticality to allow for more interesting shortcuts.</p>
<p>The only place we found the level size to be a limitation was the end chase on Frustrum level. We originally anticipated it to be twice as long, but due to a 3D texture size limitation on AMD graphics cards, we had to restrict it to 400 meters. We could have made it twice as long using a U-shaped level, but we also wanted to keep the level straight to have the goal direction consistently aligned with the sun.</p>
<h1 id="the-timer-and-the-chopper">The timer and the chopper</h1>
<p>Nobody likes a timer, and in previous iterations of the game idea there was no timer. Since the game offers so much player freedom, the only viable option to impose any form of challenge would be resource limitations, and for a sandbox game where destruction plays a central role, adding restrictive resource limitations just doesn’t make the game fun. The goal with the alarm timer has always been to offer a challenge even with a generous amount of tools and resources. While I can agree that a timer is usually a bad idea in game design, I’m really happy with the way it turned out in Teardown.</p>
<p>Along the way we’ve mixed up the timed missions with other types where the challenge comes more from moving heavy objects, demolishing buildings or putting out fires, but I’m not convinced that alone could support a whole game. In several missions there are alarmed targets attached to something heavy, allowing it to be moved around to some extent, which I think is a good mix, letting the player choose whether to tinker with the environment or just make a run for it.</p>
<p><a href="/assets/2020-11-05-chopper.png"><img src="/assets/2020-11-05-chopper.png" alt="" /></a></p>
<p>A popular suggestion has been to have the security chopper chase the player after arriving at the scene instead of causing instant failure, but as a general solution I don’t think it’s a good idea. It would introduce an element of randomness that would discourage the strategic thinking and careful planning that this game is all about, in favor of just replaying the mission until reaching the escape vehicle before dying. So instead we added a separate mission type that still allows the player to make preparations, but the chopper shows up shortly after clearing the first target, effectively replacing the timer with an enemy. I think both mission types work well, but that doesn’t necessarily mean it’s a good idea to combine them.</p>
<h1 id="trial-and-error">Trial and error</h1>
<p>Quicksave can be a sensitive topic in game design. For linear games it’s often a tough decision whether to offer quicksave at any time or save progression only at certain times or locations. Some players refuse to use a generous quicksave feature, as it could be considered cheating.</p>
<p>This is something I think turned out particularly well in Teardown - allowing just one save slot, freely available at any time during preparation, but disabled as soon as the alarm goes off. It encourages player experimentation during the preparation phase, but since there is only one slot, it must still be used wisely. Even with quicksave available, a major change of plans often requires a full restart anyway due to resource limitations, vehicle condition or broken objects.</p>
<p>Trying out a route and then going back to the planning phase for improvement is a key part of the core loop, and so intrinsic to the game that we actually enforce it in the third mission to communicate that this is the intended way to play the game.</p>
<h1 id="replayability">Replayability</h1>
<p>Since any mission can be played in an infinite number of ways, there is already a natural incentive for replayability, but there are a couple of things in the game specifically designed to increase it further. Most missions have optional targets that will increase the score. These optional targets are often placed in strategic locations that break up the most efficient path through the required targets, encouraging the player to use a different strategy and/or starting location.</p>
<p>New tools and upgrades introduced later in the game make all earlier missions easier to complete. This gives a natural incentive to go back and replay missions with better tools, clearing more optional targets, which increases score and gives even better tools, forming an outer game loop that can be quite rewarding. Admittedly, for this to have a strong impact on the game, there would need to be more optional targets. However, introducing a lot of optional targets early in the game can be quite overwhelming, so the whole thing might need to be redesigned a bit to work as intended.</p>
<h1 id="story">Story</h1>
<p>Let’s be honest – no one plays Teardown for the story, but I think it serves an important role in framing the missions and as an incentive for progression. It was an early decision to deliver the story in the form of one-way e-mail communication and I’m quite happy with the way it turned out. Since the player can go back and read old e-mails, it’s possible to catch up on the story when coming back after taking a break from the game. This is something I miss in a lot of other games – the ability to recap the story when coming back to them.</p>
<p>The reason e-mails cannot be replied to is part of the bigger goal of making the player fully anonymous. The main character in Teardown intentionally lacks name, age, gender and personality traits to fully leave that up to the player’s imagination.</p>
<p><a href="/assets/2020-11-05-mail.png"><img src="/assets/2020-11-05-mail.png" alt="" /></a></p>
<p>The story is also told through the environments – how they progress, descriptions of objects in them, themed valuables and, last but not least, the television. There’s a lot of room for improvement here. I originally envisioned many more environmental changes when coming back to the same environment (also involving procedural changes based on the player’s actions), but for several reasons we had to cut back on that.</p>
<p>Missions are kept separate from the e-mails on the Missions tab to give the player an overview of available missions for a particular location. This is to further incentivize replayability and make it clear where improvements are possible to increase score and rank.</p>
<h1 id="progression">Progression</h1>
<p>Whether to have sandbox levels directly accessible or tied to campaign progression has been a long internal discussion. Knowing that a lot of people would want to play Teardown just for the sandbox experience, it may seem a bit inconsiderate to enforce a complete playthrough to make everything accessible. On the other hand, keeping all environments and tools available in sandbox mode from the beginning would ruin the experience for campaign players.</p>
<p>The route we chose was to keep them locked, but introduce new environments and tools relatively early in the campaign. The first three environments can be unlocked after completing just five missions while the fourth one requires a bit more work. While not suiting everyone, I think it turned out quite well, and I hope more people play and enjoy the campaign because of this decision.</p>
<p>Tool upgrades also carry over to the sandbox mode, which gives a stronger incentive to scavenge valuables and upgrade tools in the campaign.</p>
<h1 id="the-importance-of-good-noise">The importance of good noise</h1>
<p>There are many articles to read about noise functions in computer graphics, especially now that a lot of people have recently become interested in ray tracing, but it took me a long time to fully understand <em>why</em> noise characteristics are so important, and I didn’t find a good resource on the Internet explaining it, so I’ll give it a shot.</p>
<h1 id="why-noise-is-needed">Why noise is needed</h1>
<p>Noise is used to generate sequences of semi-random numbers. I use these random numbers at several places in the rendering system, but here are a few examples:</p>
<ul>
<li>
<p>Soft shadows from light sources that are not a single point. In this case, the light source is a sphere, so for each ray I trace towards a random point on that sphere.</p>
</li>
<li>
<p>Blurry reflections. For materials that are not perfect mirrors, I alter the surface normal a small amount for each ray. This gives the appearance of a rough surface.</p>
</li>
<li>
<p>Ambient occlusion, which darkens concave areas that are blocked from the incoming environment light. I shoot a number of rays on the hemisphere for each surface point and randomize the direction.</p>
</li>
<li>
<p>Volumetric fog, or god rays. In order to approximate lit fog I shoot rays along the line of sight towards each light source. Both the sample points along the line of sight and the direction towards the light source (if it’s not a point light) need noise.</p>
</li>
</ul>
<p>I use noise in other places as well, but these are probably the easiest to explain. In some cases, several rays are shot for each pixel and the result is just the average of all samples. In other cases, there is just a single sample. In either case, the visual result will be more or less noisy. I denoise by spatially blurring and temporally accumulating the result over time.</p>
<h1 id="noise-characteristics">Noise characteristics</h1>
<p>In white noise, each sample is just a random number, without any consideration of the sequence as a whole, very much like rolling a die. Imagine the following sequence of random numbers between 0 and 9:</p>
<p>2 9 7 1 3 5 6 1 0 1 8 9 2 4 4</p>
<p>Since each sample has no “memory” of what has already been generated, it can generate similar numbers several times in a row, like the “1 0 1” found in the middle.</p>
<p>Now instead consider the following sequence, which are the exact same numbers but in a different order:</p>
<p>2 9 3 7 1 5 1 4 0 8 4 1 9 2 6</p>
<p>Since it’s the same numbers, they have the same average, but I swizzled them around so that adjacent numbers are always reasonably far apart. You can think of this as a signal with higher frequency. This is roughly what blue noise is trying to achieve, and that’s why noise functions are so intimately related to frequency spectrums.</p>
<h1 id="why-blue-noise-is-desirable">Why blue noise is desirable</h1>
<p>When working with computer graphics, blue noise is desirable, because we don’t want the same (or similar) result in two adjacent pixels on the screen, because that will make the spatial denoising filter less efficient.</p>
<p>Even without filtering, blue noise gives a smoother characteristic and more visually pleasing image, but that’s because there is a certain amount of filtering going on in our eyes and brains. Apparently, our retinal cells are arranged in a blue noise-like pattern. Pretty cool!</p>
<p><a href="/assets/2018-12-07-the-importance-of-good-noise-1.png"><img src="/assets/2018-12-07-the-importance-of-good-noise-1.png" alt="" /></a></p>
<p>Unfiltered ambient occlusion with white noise</p>
<p><a href="/assets/2018-12-07-the-importance-of-good-noise-2.png"><img src="/assets/2018-12-07-the-importance-of-good-noise-2.png" alt="" /></a></p>
<p>Same ambient occlusion with blue noise</p>
<p>To put it in simpler words, we want semi-random numbers that are as “spread out” as possible both vertically and horizontally. In one dimension, like the numbers above, it’s fairly easy, but doing it in two dimensions is actually much harder, because you need to consider not only what’s before and after each sample, but also what’s above and below. This is what two-dimensional blue noise does.</p>
<h1 id="temporal-aspect">Temporal aspect</h1>
<p>So far we didn’t consider time. Now imagine we have a good distribution of random numbers, so that the number for a specific pixel is not similar to any of its neighbors. Let’s also add time.</p>
<p>Just like with pixel neighbors, we don’t want the random number for a specific pixel to be the same for two consecutive frames, because that will make the temporal filtering less efficient. You can think of this as just another dimension, so what we want is really three dimensional noise that is spread out both horizontally, vertically <em>and over time</em>.</p>
<p>This is where it gets complicated, because apparently, when generating blue noise in three dimensions it loses some of its nice properties in two dimensions, so a 2D slice of 3D blue noise will <em>not</em> be as good as pure 2D blue noise. I’m not good enough at math to fully understand why, but there is an in-depth article about it <a href="http://momentsingraphics.de/?p=148">here</a></p>
<p>To overcome this issue, I use a trick based on the golden ratio, which is the most irrational number there is. The golden ratio is very useful for many things, and if you haven’t added that to your bag of tricks, you should. There is a really cool video explaining why it is so irrational <a href="https://www.youtube.com/watch?v=sj8Sg8qnjOg">here</a></p>
<p>Irrational numbers in general, and the golden ratio in particular, have the property that if you add one to a number between zero and one and take the fraction of that, you get a new number that is far from the old one, yet the sequence never repeats itself when you do it over and over. This is exactly what we want! So, instead of using 3D blue noise, I use 2D blue noise and animate it using the golden ratio. Note that this is not a novel idea, a lot of people have been doing this before, for instance <a href="https://blog.demofox.org/2017/10/31/animating-noise-for-integration-over-time/">here</a></p>
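<p>To see the effect, here is a tiny standalone program that prints the first few values of the sequence – each new value lands far away from the previous ones, without the sequence ever repeating exactly:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <cstdio>
#include <cmath>

int main()
{
    const double GOLDEN_RATIO = 0.61803398875; // fractional part of phi
    double x = 0.0;
    for (int i = 0; i < 8; i++)
    {
        x = fmod(x + GOLDEN_RATIO, 1.0);
        printf("%.3f ", x); // 0.618 0.236 0.854 0.472 0.090 ...
    }
    return 0;
}
</code></pre></div></div>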
<h1 id="how-to-use-it">How to use it</h1>
<p>The most efficient way of using blue noise is probably to use a precomputed texture with blue noise samples in it. It doesn’t have to be the same resolution as the render target – it’s fine to repeat it a few times before it gets visually noticeable. I use a 512x512 blue noise texture. In order to get all the good blue noise properties, it is important to line up the blue noise texture <em>exactly</em> with the fragments in the render target and not use any filtering when sampling it. You need a pixel-perfect mapping for it to work efficiently.</p>
<p>To animate the noise over time, just do <code class="highlighter-rouge">noise=mod(blueNoise+GOLDEN_RATIO*frameNumber, 1.0)</code> and you’re done. Make sure that frameNumber doesn’t get too big over time, or you’ll lose floating point precision, so <code class="highlighter-rouge">noise=mod(blueNoise+GOLDEN_RATIO*(frameNumber%100), 1.0)</code> or similar will work fine.</p>
<p><strong>Update:</strong> There has been a discussion on <a href="https://twitter.com/tuxedolabs/status/1070987893970223104">twitter</a> where <a href="https://twitter.com/TastyTexel">@tastytexel</a> pointed out that as the blue noise sample wraps around, it can introduce low frequency components. Imagine two consecutive samples that are 0.0 and 0.9 (a good distribution). When adding the same offset 0.1 to both with fraction wrapping, the result will be 0.1 and 0.0 (a bad distribution). The suggested fix is to reshape the noise through a triangular filter so that it wraps around nicely. I will update with more findings once I have tried this.</p>
<h1 id="more-dimensions">More dimensions</h1>
<p>In many cases you want not just a single random number, but a 2D or 3D random vector. I use an RGB blue noise texture for that, so each pixel actually has three random numbers, where each channel has blue noise properties. You could just add the golden ratio to animate those, but I came to think of this <a href="http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/">blog post</a> about the R2 sequence, which is a multi-dimensional generalization of the golden ratio.</p>
<p>So instead of adding the golden ratio to each component when animating 3D noise, I add three <em>different</em> irrational numbers, one to each channel. In my experience this gives a better distribution of 2D and 3D vectors over time, since all components aren’t shifting the same amount.</p>
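<p>The increments can be computed numerically instead of hardcoding them. Following the linked post, the generalized golden ratio g for d dimensions is the positive root of x^(d+1) = x + 1, and the per-channel increments are 1/g, 1/g^2, …, 1/g^d. A sketch:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <cstdio>
#include <cmath>

int main()
{
    const int d = 3; // one irrational increment per noise channel
    // Fixed-point iteration for the positive root of x^(d+1) = x + 1.
    // For d=1 this converges to the golden ratio.
    double g = 2.0;
    for (int i = 0; i < 64; i++)
        g = pow(1.0 + g, 1.0/(d+1));
    for (int i = 1; i <= d; i++)
        printf("alpha%d = %.10f\n", i, pow(1.0/g, (double)i));
    return 0;
}
</code></pre></div></div>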
<p>Finally, <a href="http://momentsingraphics.de/?p=127">here</a> is a link to a very useful <a href="http://momentsingraphics.de/Media/BlueNoise/FreeBlueNoiseTextures.zip">database</a> with free, precomputed blue noise textures of different sizes.</p>
<h1 id="from-screen-space-to-voxel-space">From screen space to voxel space</h1>
<p>There have been quite a few changes to my rendering pipeline over the last couple of months, the biggest being that I now do full raytracing in voxel space instead of the screen space counterpart.</p>
<p>This may sound like a major rewrite, but a lot of the pipeline actually stays the same. I simply trace rays in a huge 3D texture instead of screen space. This obviously has a number of benefits, like real ambient occlusion, long shadows, specular occlusion and no screen space artefacts, but it also comes with a number of drawbacks. The biggest one is probably memory consumption. I chose a texture resolution of 5 cm, and combined with a world size of 100x100x25 meters this gives two billion voxels, or two gigabytes if storing one voxel per byte.</p>
<p>Since each object has its own transform and can be freely moved and rotated, I have to rasterize each object into the big world texture continuously. This is done on the CPU and the relevant parts of the texture are updated with <code class="highlighter-rouge">glTexSubImage3D</code>. For large objects, this can be rather slow, so the technique is not for everyone. I’ve been surprised by how well it works in practice though, since most dynamic objects are usually rather small. If there are several objects moving at the same time, I cluster them and send updates in larger chunks where they are needed.</p>
<h1 id="voxel-storage">Voxel storage</h1>
<p>Note that the big world space texture, the <em>shadow texture</em>, only requires one bit per voxel. Using a whole byte means wasting eight times the memory we really need. Therefore I store eight neighboring voxels per byte in an octree fashion, so each bit represents one octant of a 10 cm cube. If the byte is zero, it means there are no voxels in any octant, and this can be exploited later to speed up raytracing. In addition to the base level 10 cm resolution (or 5 cm if you count the octants) I also store two mip levels, one for 20 cm and one for 40 cm, giving a total of four mip levels including the octant bits. That adds up to 256 + 32 + 4 = 292 Mb for the shadow texture instead of two gigabytes, including the two mip levels which can be used to speed up raytracing.</p>
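<p>Looking up a single 5 cm voxel then becomes a byte fetch plus a bit test. A sketch of the idea – the exact bit order is a guess, and sx/sy are assumed dimensions of the byte grid:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// x, y, z are coordinates in 5 cm voxel units
bool isSolid(const unsigned char* shadow, int sx, int sy, int x, int y, int z)
{
    // Byte position in the 10 cm grid
    int bx = x >> 1, by = y >> 1, bz = z >> 1;
    unsigned char bits = shadow[(bz*sy + by)*sx + bx];
    if (bits == 0)
        return false; // the whole 10 cm cube is empty
    // One bit per octant, from the low bit of each coordinate
    int bit = (x & 1) | ((y & 1) << 1) | ((z & 1) << 2);
    return (bits >> bit) & 1;
}
</code></pre></div></div>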
<h1 id="raytracing-the-shadow-texture">Raytracing the shadow texture</h1>
<p>Raytracing in voxel space is actually much simpler than in screen space. Just start at the camera and walk along the ray direction in voxel space until hitting something that isn’t zero. Note that walking the voxel space in fixed steps will <strong>not</strong> give a watertight result. Light can leak through voxels in certain scenarios, as described in <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.42.3443&rep=rep1&type=pdf">this paper</a>, making the implementation a little trickier. The paper also presents a solution to the problem, an algorithm sometimes referred to as “supercover” traversal.</p>
<p>I actually use both supercover and the cheaper fixed step tracing depending on the use case. Ambient occlusion, for instance, or volumetric fog, doesn’t require exact tracing. This is where voxel raytracing really shines. In triangle raytracing each ray has a fixed cost, while in voxel raytracing you can choose which fidelity you need for each particular ray, and even change fidelity while walking the ray.</p>
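<p>The cheap fixed step variant is just a loop – a sketch, reusing the isSolid lookup from above (VOXEL_SIZE, shadow, sx and sy are assumed constants and globals). Remember that this version is not watertight, which is fine for ambient occlusion and fog but not for sharp shadows:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bool traceFixedStep(Vec3 origin, Vec3 dir, float maxDist, float step)
{
    for (float t = 0.0f; t < maxDist; t += step)
    {
        Vec3 p = origin + dir*t;
        int x = (int)(p.x / VOXEL_SIZE);
        int y = (int)(p.y / VOXEL_SIZE);
        int z = (int)(p.z / VOXEL_SIZE);
        if (isSolid(shadow, sx, sy, x, y, z))
            return true; // occluded
    }
    return false;
}
</code></pre></div></div>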
<p>To further speed up raytracing I also utilize the other mip levels, starting at the coarsest level and progressively switching to the finer ones when there is a potential hit. The base mip level requires some bit masking to find out if the ray really hits, but the general algorithm is the same.</p>
<p>Here is a test scene without shadows and ambient occlusion. It’s a bit unfortunate that the test objects are voxel objects themselves – that’s not really necessary, since the shadow volume rasterization would work on any closed mesh, but I don’t have that in my code yet, since I’m currently working on a game with voxel graphics.</p>
<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-1.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-1.png" alt="" /></a></p>
<p>Let’s add ambient occlusion with five rays per pixel in the most naive way possible. Just walk from the pixel in a semi-random direction on the hemisphere.</p>
<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-2.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-2.png" alt="" /></a></p>
<p>As seen in the image, this won’t work very well, because the shadow volume is not perfectly aligned with the object surface. You can think of this as an extreme version of <a href="https://digitalrune.github.io/DigitalRune-Documentation/html/3f4d959e-9c98-4a97-8d85-7a73c26145d7.htm">shadow acne</a> found in regular shadow mapping. To overcome this problem, I don’t start tracing at the pixel position, but offset the ray origin a safe distance, based on the normal and ray direction. This prevents the ray from hitting the shadow voxels that come from the pixel’s own surface.</p>
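<p>One possible way to combine the normal and the ray direction into such an offset – the scale factor here is an arbitrary example value that would need tuning:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Push the ray origin out of the voxels generated by the surface
// itself. The 1.5 factor is an example value, not a magic constant.
vec3 offsetOrigin(vec3 worldPos, vec3 normal, vec3 rayDir, float voxelSize)
{
return worldPos + (normal + rayDir) * voxelSize * 1.5;
}
</code></pre></div></div>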
<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-3.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-3.png" alt="" /></a></p>
<p>While this captures the overall occlusion very well, you might notice that it lacks fine detail. This is an artefact from the ray origin offset.</p>
<h1 id="combining-with-screen-space">Combining with screen space</h1>
<p>To capture finer detail, we need a backup method for the distance from the pixel out to the new ray origin. Fortunately there is another technique that works really well at short distances – screen space raytracing! It turns out all my previous work in screen space raytracing now comes in handy. Here is the contribution from a short distance of screen space raytracing:</p>
<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-4.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-4.png" alt="" /></a></p>
<p>And finally, combining the two: start in screen space, trace the safe distance, and unless the ray hits something, switch over to voxel raytracing and continue through the world:</p>
<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-5.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-5.png" alt="" /></a></p>
<h1 id="light-sources">Light sources</h1>
<p>The exact same technique can also be used for light sources. Here is the scene with a spotlight of zero radius. One ray per pixel is shot from the pixel towards the light source, starting in screen space and then moving over to voxel space.</p>
<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-6.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-6.png" alt="" /></a></p>
<p>As you might notice, the voxel grid becomes pretty noticeable with sharp shadows, particularly so with light coming in at a sharp angle. However, since we are now raytracing the shadows, it’s really easy to make soft shadows by just jittering the light position. This will effectively hide artefacts from the voxel grid, while at the same time producing accurate soft shadows “for free”. Here is the same light but with a 30 cm radius:</p>
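<p>The jittering itself is trivial – something along these lines, where <code class="highlighter-rouge">rand3()</code> is an assumed helper returning a random vec3 in [0,1] that varies per pixel and frame:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Jitter the light position inside its radius before shooting the
// shadow ray. Jittering within a cube instead of a proper sphere
// is a crude shortcut, but good enough for a sketch.
vec3 jitteredLightPos(vec3 lightPos, float lightRadius)
{
return lightPos + (rand3() * 2.0 - 1.0) * lightRadius;
}
</code></pre></div></div>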
<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-7.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-7.png" alt="" /></a></p>
<p>I also use the same raytracing technique for reflections and volumetric fog (god rays). For reflections, I use screen space reflections for the reflected image where available. Where it’s not available I fade to black, simply because I don’t have any information about the hit surface. The shadow volume is binary and knows nothing about materials. However, and more importantly, since I do the ray tracing in voxel space, I get <em>specular occlusion</em> for everything, not just what’s visible on screen. Blurry reflections generally look better for the same reason that soft shadows look better than sharp – because they hide blocky artefacts better.</p>
<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-8.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-8.png" alt="" /></a></p>
<h1 id="performance">Performance</h1>
<p>Voxel raytracing performance depends largely on the length of the ray, the resolution and the desired quality (step length, mip level, etc.), but it is generally really, <em>really</em> fast compared to polygon ray tracing. I don’t have any exact measurements of how many rays per pixel I shoot, but for comparison I do all ambient occlusion, lighting, fog and reflections in this scene in about 9 ms, including denoising. The resolution is full HD and the scene contains about ten light sources, all with volumetric fog, soft shadows and no precomputed lighting. Timings taken on a GTX 1080.</p>
<p><a href="/assets/2018-10-17-from-screen-space-to-voxel-space-9.png"><img src="/assets/2018-10-17-from-screen-space-to-voxel-space-9.png" alt="" /></a></p>Dennis GustafssonThere has been quite a few changes to my rendering pipeline over the last couple of months. The biggest being that I now do full raytracing in voxel space instead of the screen space counterpart.Undo for lazy programmers2018-07-10T23:29:00+02:002018-07-10T23:29:00+02:00https://blog.voxagon.se/2018/07/10/undo-for-lazy-programmers<p>I often see people recommend the command pattern for implementing undo/redo in, say, a level editor. While it sure works, it’s <em>a lot</em> of code and <em>a lot</em> of work. Some ten years ago I came across an idea that I have used ever since, that is super easy to implement and has worked like a charm for all my projects so far.</p>
<!--more-->
<p>Every level editor already has the functionality to serialize the level state (and save it to disk). It also has the ability to load a previously saved state, and the idea is to simply use those to implement undo/redo. I create a stack of memory buffers and serialize the entire level onto it after each completed action. Undo is implemented by walking one step up the stack and loading that state. Redo works the same way: walk a step down the stack and load.</p>
<p>This obviously doesn’t work for something like Photoshop unless you have terabytes of memory lying around, but in my experience the level information is usually relatively compact and serializes fast, since heavy objects like textures and models are only referenced. If you are worried about memory consumption, you can also just save each serialized state to a temporary folder on disk instead.</p>
<p>The one problem I came across with this approach is that editor-specific state, like which object is selected, might be forgotten after undo if you use pointers, but I have solved that by handling selection with object ids rather than pointers.</p>Dennis GustafssonI often see people recommend the command pattern for implementing undo/redo in, say, a level editor. While it sure works, it’s a lot of code and a lot of work. Some ten years ago I came across an idea that I have used ever since, that is super easy to implement and has worked like a charm for all my projects so far.Bokeh depth of field in a single pass2018-05-04T11:38:00+02:002018-05-04T11:38:00+02:00https://blog.voxagon.se/2018/05/04/bokeh-depth-of-field-in-single-pass<p>When I implemented bokeh depth of field I stumbled upon a neat blending trick almost by accident. In my opinion, the quality of depth of field is more related to how objects of different depths blend together, rather than the blur itself. Sure, bokeh is nicer than gaussian, but if the blending is off the whole thing falls flat. There seem to be many different approaches to this out there, most of them requiring multiple passes and sometimes separation of what’s behind and in front of the focal plane. I experimented a bit and stumbled upon a nice trick.</p>
<p>I’m not going to get into technical details about lenses, circle of confusion, etc. It has been described very well many times before, so I’m just going to assume you know the basics. I can try to summarize what we want to do in one sentence – render each pixel as a disc whose radius is determined by how out of focus it is, also taking depth into consideration “somehow”.</p>
<p>Taking depth into consideration is the hard part. Before we even start thinking about how that can be done, let’s remind ourselves that there is no correct solution to this problem. We simply do not have enough information. Real depth of field needs to know what’s behind objects. In a post processing pass we don’t know that, so whatever we come up with will be an approximation.</p>
<p>I use a technique sometimes referred to as <em>scatter-as-gather</em>. I like that term, because it’s very descriptive. GPUs are excellent at reading data from multiple inputs, but hopelessly miserable at writing to multiple outputs. So instead of writing out a disc for each pixel, which every sane programmer would do in a classic programming model, we have to do the whole operation backwards and compute the sum of all contributing discs surrounding a pixel instead. This requires us to look as far out as there can be any contributions, and the safest such distance is the maximum disc radius. Hence, every pixel becomes the worst case. Sounds expensive? It is! There are ways to optimize it, but more on that later. Here is the test scene before and after blur with a fixed disc size for all pixels:</p>
<p><a href="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-1.png"><img src="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-1.png" alt="" /></a></p>
<p>Now to the hard part. When gathering discs, each one of them has a different size. That in itself is not very hard – just compare the distance to the disc radius; if it’s smaller the disc contributes, otherwise not. The hard part is how it should contribute and when. Taking a stab at a “correct” solution, I compute how much a sample from each disc contributes. A larger disc means that the pixel intensity will get spread out over a larger area, so each individual sample gives a smaller contribution. Doing the math right, summing up everything in the end should yield the same overall intensity in the image. This is what it looks like:</p>
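<p>In code, such an area-based weight could be written like this – a sketch, with the clamp there to avoid blow-up as the disc size approaches zero:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// A larger disc spreads the same intensity over more pixels, so
// each individual sample of it should contribute less.
float discWeight(float sampleSize)
{
return 1.0 / max(sampleSize * sampleSize, 1.0);
}
</code></pre></div></div>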
<p><a href="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-2.png"><img src="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-2.png" alt="" /></a></p>
<p>It’s not totally wrong, but it’s definitely not right. Foreground and background objects do get blurred, but foreground objects get darkened around edges. You might also notice bright streaks across the focus areas. The streaks are artefacts from working with discrete samples in something that should be continuous. It’s hard to do the correct thing when the disc size approaches zero, since you have to take at least one sample. What about the dark edges? If we did this the right way, we would actually blend in each disc, allowing some of the background to shine through. Since that information is occluded this is what we get.</p>
<p>Instead of trying to compute the correct intensity from each disc sample, let’s just sum them up and remember the total, then divide with the total afterwards. This is pretty classic conditional blur and always gives the correct intensity:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for each sample
if sample contributes then
color += sample * contribution
total += contribution
end
end
color /= total
</code></pre></div></div>
<p><a href="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-3.png"><img src="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-3.png" alt="" /></a></p>
<p>Definitely better. Both the streaks and the dark corners are gone, but foreground objects don’t blur correctly over a sharp background. This is because the contribution from in-focus objects has a stronger weight than the large discs in the foreground. There are several ways to attack this issue. One that I found to look relatively good is to compute the contribution using the distance to the sample instead of the disc size. This is far from correct, but gives a nicer fall-off:</p>
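<p>That is, replacing the hypothetical <code class="highlighter-rouge">discWeight</code> above with something distance-based, roughly:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Distance-based fall-off: closer samples contribute more, which
// softens the edges of out-of-focus foreground objects. Not
// physically motivated, it just looks nicer.
float distanceWeight(float dist)
{
return 1.0 / max(dist * dist, 1.0);
}
</code></pre></div></div>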
<p><a href="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-4.png"><img src="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-4.png" alt="" /></a></p>
<p>Now to the trick. Instead of computing a contribution based on disc size or distance, I use a fixed weight for all contributing samples no matter the distance or disc size, and blend in the current average if there is no contribution:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>color = centerColor
total = 1.0
for each sample
if sample contributes then
color += sample
else
color += color/total
end
total += 1.0
end
color /= total
</code></pre></div></div>
<p>Huh? What this does is basically to give each contributing sample equal weight, but when there is no contribution, the current average is allowed to grow stronger and get more impact on the result. This is what it looks like:</p>
<p><a href="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-5.png"><img src="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-5.png" alt="" /></a></p>
<p>Not bad. The blur is continuous and nice for both foreground and background objects. Note that this only works when sampling in a spiral, starting at the center pixel being shaded, because the first sample needs to be the pixel itself. I’m not going to try and explain details on why this algorithm works, simply because I’m not sure. I found it by accident, and I have spent days trying other methods but nothing works as well as this one.</p>
<p>There is one small thing to add before this is usable. The background shouldn’t blur over foreground objects that are in focus. Note though that if the foreground is out of focus, we want some of the background to blur in with it. What I do is to clamp the background circle of confusion (disc size) between zero and the circle of confusion of the center pixel. This means that the background can contribute only up to the amount of blurriness of the pixel being shaded. The scatter-as-gather way of thinking requires a lot of strong coffee.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if sample depth > center depth then
sample size = clamp(sample size, 0, center size)
end
</code></pre></div></div>
<p>Here is what the final image looks like:</p>
<p><a href="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-6.png"><img src="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-6.png" alt="" /></a></p>
<p><a href="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-7.png"><img src="/assets/2018-05-04-bokeh-depth-of-field-in-single-pass-7.png" alt="" /></a></p>
<p>A couple of notes on my implementation. After some experimentation I changed the clamping of sample size to <code class="highlighter-rouge">clamp(sample size, 0, center size*2.0)</code>. For larger max radius I increase it even more. This controls how much of the background gets blended into a blurry foreground. It is totally unphysical and is only there to approximate the occluded information behind the foreground object.<br /><br />The following code is written for clarity rather than speed. My actual implementation is in two passes, calculating the circle of confusion for each pixel and storing it in the alpha component, while at the same time downscaling the image to quarter resolution (half width, half height). When computing the depth of field I also track the average sample size, store it in the alpha channel, and then use this to determine whether the original or the downscaled blurred version should be used when compositing. More on performance optimizations in a future post.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uniform sampler2D uTexture; //Image to be processed
uniform sampler2D uDepth; //Linear depth, where 1.0 == far plane
uniform vec2 uPixelSize; //The size of a pixel: vec2(1.0/width, 1.0/height)
uniform float uFar; // Far plane
const float GOLDEN_ANGLE = 2.39996323;
const float MAX_BLUR_SIZE = 20.0;
const float RAD_SCALE = 0.5; // Smaller = nicer blur, larger = faster
float getBlurSize(float depth, float focusPoint, float focusScale)
{
float coc = clamp((1.0 / focusPoint - 1.0 / depth)*focusScale, -1.0, 1.0);
return abs(coc) * MAX_BLUR_SIZE;
}
vec3 depthOfField(vec2 texCoord, float focusPoint, float focusScale)
{
float centerDepth = texture(uDepth, texCoord).r * uFar;
float centerSize = getBlurSize(centerDepth, focusPoint, focusScale);
vec3 color = texture(uTexture, texCoord).rgb;
float tot = 1.0;
float radius = RAD_SCALE;
for (float ang = 0.0; radius<MAX_BLUR_SIZE; ang += GOLDEN_ANGLE)
{
vec2 tc = texCoord + vec2(cos(ang), sin(ang)) * uPixelSize * radius;
vec3 sampleColor = texture(uTexture, tc).rgb;
float sampleDepth = texture(uDepth, tc).r * uFar;
float sampleSize = getBlurSize(sampleDepth, focusPoint, focusScale);
if (sampleDepth > centerDepth)
sampleSize = clamp(sampleSize, 0.0, centerSize*2.0);
float m = smoothstep(radius-0.5, radius+0.5, sampleSize);
color += mix(color/tot, sampleColor, m);
tot += 1.0; radius += RAD_SCALE/radius;
}
return color / tot;
}
</code></pre></div></div>Dennis GustafssonWhen I implemented bokeh depth of field I stumbled upon a neat blending trick almost by accident. In my opinion, the quality of depth of field is more related to how objects of different depths blend together, rather than the blur itself. Sure, bokeh is nicer than gaussian, but if the blending is off the whole thing falls flat. There seem to be many different approaches to this out there, most of them requiring multiple passes and sometimes separation of what’s behind and in front of the focal plane. I experimented a bit and stumbled upon a nice trick.Stratified sampling2018-04-24T08:22:00+02:002018-04-24T08:22:00+02:00https://blog.voxagon.se/2018/04/24/stratified-sampling<p>After finishing my framework overhaul I’m now back on hybrid rendering and screen space raytracing. My first plan was to just port the old renderer to the new framework but I ended up rewriting all of it instead, finally trying out a few things that have been on my mind for a while.</p>
<p>I’ve been wanting to try stratified sampling for a long time as a way to reduce noise in the diffuse light. The idea is to sample the hemisphere within a certain set of fixed strata instead of completely randomly, to give a more uniform distribution. The direction within each stratum is still random, so it would still cover the whole hemisphere and converge to the same result, just in a slightly more predictable way. I won’t go into more detail, but a full explanation is all over the Internet, for instance <a href="https://blog.yiningkarlli.com/2013/03/stratified-versus-uniform-sampling.html">here</a>.</p>
<p>Let’s look at the difference between stratified and uniform sampling. To make a fair comparison there is no lighting in these images, just ambient occlusion and an emissive object.</p>
<p><a href="/assets/2018-04-23-stratified-sampling-1.png"><img src="/assets/2018-04-23-stratified-sampling-1.png" alt="" /></a></p>
<p>They may look similar at first, but when zooming in a little one can easily see that the noise in the stratified version is quite different. Less chaotic and more predictable. The light bleeding onto the sphere here is a good example. In the stratified version, the orange pixels are more evenly distributed.</p>
<p><a href="/assets/2018-04-23-stratified-sampling-2.png"><img src="/assets/2018-04-23-stratified-sampling-2.png" alt="" /></a></p>
<p>For environment lighting, I use fixed, precomputed light per stratum, sampled from a low resolution version of the environment cube map. The strata are fixed in world space and shared for all fragments. More accurately, my scene is lit with a number of uniformly distributed area lights. The reason I want these lights fixed in world space is because the position and area of each light can be adapted to the environment. An overcast sky might for instance have a uniform distribution of lights around the upper hemisphere, while a clear sky will have one very narrow and powerful stratum directed towards the sun and a number of less powerful wider ones at an even distribution. The way I represent environment lighting has therefore changed from a cubemap to an array of the following:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>struct Light
{
vec3 direction; // Normalized direction towards center of light
vec3 perp0; // Vector from center of light to one edge
vec3 perp1; // Vector from center of light to other edge
vec3 rgb; // Color and intensity
};
</code></pre></div></div>
<p>Each pixel shoots one ray towards every light in the direction of <code class="highlighter-rouge">direction + perp0*a + perp1*b</code>, where <code class="highlighter-rouge">a</code> and <code class="highlighter-rouge">b</code> are random numbers from -1 to 1. If the ray is a miss, the light contributes to that pixel’s lighting. If it’s a hit, I use radiance from the hit point, using a downscaled and reprojected version of the lighting from the previous frame.</p>
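<p>In shader terms the loop is roughly the sketch below, where <code class="highlighter-rouge">trace</code>, <code class="highlighter-rouge">rand</code> and <code class="highlighter-rouge">radianceAtHit</code> are placeholders for the raymarcher, the per-pixel random source and the reprojected previous-frame lighting:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uniform Light uLights[16]; // the struct above, one per stratum

vec3 directLight(vec3 pos, vec3 normal)
{
vec3 sum = vec3(0.0);
for (int i = 0; i < 16; i++)
{
float a = rand() * 2.0 - 1.0;
float b = rand() * 2.0 - 1.0;
vec3 dir = normalize(uLights[i].direction + uLights[i].perp0 * a + uLights[i].perp1 * b);
float ndotl = dot(normal, dir);
if (ndotl <= 0.0)
continue;
if (trace(pos, dir)) // hit: take bounce light from the hit point
sum += radianceAtHit(pos, dir) * ndotl;
else // miss: the area light is visible
sum += uLights[i].rgb * ndotl;
}
return sum;
}
</code></pre></div></div>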
<p>The key to reducing noise here is that each pixel is getting exactly the same incoming light every frame, so an unshadowed surface will always get the same color. Here is an example using 16 of these area lights.</p>
<p><a href="/assets/2018-04-23-stratified-sampling-3.png"><img src="/assets/2018-04-23-stratified-sampling-3.png" alt="" /></a></p>
<p><a href="https://computergraphics.stackexchange.com/questions/1666/what-is-the-difference-between-importance-sampling-and-mutiple-importance-sampli">Importance sampling</a> is another popular method for reducing noise, but that requires a unique distribution of rays per pixel. Since my area lights are fixed in world space that isn’t really an option. But, one thing I <em>can</em> change is the quality of each ray. Since this is raymarching, rather than raytracing, lower probability samples (those that deviate a lot from the surface normal) can be of lower quality (smaller step count) without affecting the result very much. This makes a huge difference for performance with very little visual difference. I’m now marching a ray between four and sixteen steps depending on the probability, almost cutting the diffuse lighting time in half.</p>
<p>While I’m at it, the stepping in my raymarching is done in world space, not in screen space like you would typically do for screen space reflections. The reason for this is that each ray is sampled very sparsely with incremental step size. I start with a small step size and increase every iteration in a geometric series. This allows for fine detail near geometry and contact shadows, while still catching larger obstacles further away at a reasonable speed. World space stepping also gives a more consistent (well, as far as screen space methods go…) result as the camera is moving.</p>
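<p>The stepping itself is a simple geometric series – a sketch, with all constants being example values rather than the ones I actually use, and <code class="highlighter-rouge">occluded</code> standing in for the depth buffer comparison:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// World space marching with geometrically growing step size: dense
// near the ray origin for contact detail, sparse further away.
bool traceGeometric(vec3 origin, vec3 dir, int numSteps)
{
float stepLen = 0.02; // first step in meters (example value)
float t = stepLen;
for (int i = 0; i < numSteps; i++)
{
if (occluded(origin + dir * t))
return true;
stepLen *= 1.5; // geometric growth (example factor)
t += stepLen;
}
return false;
}
</code></pre></div></div>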
<p><a href="/assets/2018-04-23-stratified-sampling-4.png"><img src="/assets/2018-04-23-stratified-sampling-4.png" alt="" /></a></p>
<p>Since lighting in the stratified version is more evenly distributed it is also easier to denoise. I still use a combination of spatial and temporal filters, but I’ve abandoned the smoothing group id and am now using depth and normals again. When blurring, I use a perspective correct depth comparison, meaning that when the depths of two pixels are compared, one is projected onto the plane formed by the other one and the corresponding normal. Doing that is quite expensive, but since the stratified sampling looks good already when blurred with a 4x4 kernel I found it to be worth the effort.</p>
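<p>The comparison then becomes a point-to-plane distance rather than a straight depth difference – in essence something like this, with all names being illustrative:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Perspective correct comparison: project the neighbour onto the
// plane formed by the center pixel's view space position and
// normal, and accept it only if it lies close to that plane.
bool onSamePlane(vec3 centerPos, vec3 centerNormal, vec3 samplePos, float tolerance)
{
float d = dot(samplePos - centerPos, centerNormal);
return abs(d) < tolerance;
}
</code></pre></div></div>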
<p><a href="/assets/2018-04-23-stratified-sampling-5.png"><img src="/assets/2018-04-23-stratified-sampling-5.png" alt="" /></a></p>
<p>Reflections were rewritten to do stepping in screen space. I feel like there is an opportunity to use stratified sampling also for rough reflections, but I haven’t tried it yet. As a side note, materials with higher roughness shoot rays with larger step size. There is actually a lot more to say about denoising reflections (especially rough ones) and hierarchical raymarching, but I’ll stop here and might come back to that in another post. If there is an area you would like to hear more detail about, don’t hesitate to contact me on twitter: <a href="https://www.twitter.com/tuxedolabs">@tuxedolabs</a></p>Dennis GustafssonAfter finishing my framework overhaul I’m now back on hybrid rendering and screen space raytracing. My first plan was to just port the old renderer to the new framework but I ended up rewriting all of it instead, finally trying out a few things that have been on my mind for a while.