Thread-Safe Random Access to Zip Archives
Many games choose to store their resources in packages instead of shipping the potentially thousands of individual files directly. This is sometimes an attempt at tamper-proofing, but mostly it is about performance: try copying a thousand 1 KiB files from one drive to another, then copy a single 1 MiB file the same way – the former operation will take many times longer.
A good choice for a package format is the well known .zip archive. It’s not necessarily well-designed, but it decompresses fast and you definitely won’t have any problems finding tools to create, modify and extract .zip archives. Thus, when I started work on a file system abstraction layer for my future games, seamless .zip support was one of my main goals (I may also add 7-Zip at a later time just for fun).
Here is the design I came up with after exploring the file system APIs available on Windows, WinRT and Linux:
You may notice some rather radical design choices in my File class: there are no Open() and Close() methods, and there is no Seek() method either – each read or write specifies the absolute position it targets, without requiring the file to be opened first.
File Manager Design
The decision to leave out Open() and Close() had many motivations:
- It is a source of errors: sometimes you can use RAII, sometimes a file needs to stay open for a long time, which is when you forget to close it.
- When dealing with files inside .zip archives, “opening” and “closing” them is an artificial concept; a handle would be completely pointless.
- It adds management overhead to code using it and bloat to the interface providing it.
- You can no longer read data from a const File since you need a state-changing operation (Open()) to do so.
Leaving out the Seek() operation also had many things speaking for it:
- It adds state to the File – now if multiple threads want to access the file at the same time, they need to synchronize their accesses because the file cursor can only be at one location at a time.
- On systems that support file access without file cursors (e.g. pwrite() on Linux), the less efficient file cursor still needs to be used or at least emulated.
- You can no longer read data from a const File since you need state-changing operations (Seek() and Read()) to do so.
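To make the idea concrete, here is a minimal sketch of what such an interface could look like – the names and the toy in-memory implementation are my own assumptions for illustration, not the article's actual code:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch: a file abstraction without Open(), Close() or Seek().
// Every access names an absolute offset, so there is no per-thread cursor
// state and reading works on a const File.
class File {
  public:
    virtual ~File() = default;

    // Reads up to 'count' bytes starting at 'offset'; returns bytes read.
    // const: reading does not mutate any observable file state.
    virtual std::size_t ReadAt(
      std::uint64_t offset, void *buffer, std::size_t count
    ) const = 0;

    // Writes 'count' bytes starting at 'offset'; returns bytes written.
    virtual std::size_t WriteAt(
      std::uint64_t offset, const void *buffer, std::size_t count
    ) = 0;
};

// Trivial in-memory implementation, just to demonstrate the contract.
class MemoryFile : public File {
  public:
    std::size_t ReadAt(
      std::uint64_t offset, void *buffer, std::size_t count
    ) const override {
      if(offset >= this->contents.size()) { return 0; }
      std::size_t available = this->contents.size() - static_cast<std::size_t>(offset);
      if(count > available) { count = available; }
      std::memcpy(buffer, this->contents.data() + offset, count);
      return count;
    }

    std::size_t WriteAt(
      std::uint64_t offset, const void *buffer, std::size_t count
    ) override {
      std::size_t end = static_cast<std::size_t>(offset) + count;
      if(end > this->contents.size()) { this->contents.resize(end); }
      std::memcpy(this->contents.data() + offset, buffer, count);
      return count;
    }

  private:
    std::vector<std::uint8_t> contents;
};
```

Because no call changes the file's state, two threads can issue ReadAt() calls against the same const File concurrently without any synchronization on the caller's side.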
Of course, this design set up some challenges for my .zip archive reading code: it now had to support random access (which is a problem in any case since jumping around in compressed data isn’t easily possible – the extractor is always stateful and forward-only) and accesses from multiple threads (which requires multiple extractors since ZLib’s extractors can only be used from one thread at a time).
To ensure I came up with the most efficient solution possible, I wrote down the common file usage patterns I expected from a game:
- Use case 1: Game wants the whole file decompressed in one go because it needs an in-memory copy of it (scripts, height maps, etc.).
- Use case 2: Game is reading small bits from a file incrementally (decoding a structured file such as a BSP tree or a model).
- Use case 3: Game is re-reading the beginning of the file over and over, then suddenly reads all of it (chain of responsibility used to search for a suitable file decoder, e.g. libpng, then OpenJPEG, then LibTiff).
- Use case 4: Game is skipping large parts of the file (terrain paging, nested archives, different models for large scale LOD).
For use case 1, buffering would be a waste of time: the ZLib extractor would decode into the buffer, then the buffer would be copied over into the memory provided by the user. It would be much more efficient if the ZLib extractor decoded directly into the memory provided by the user.
For use cases 2 and 3, read-ahead and buffering are vital since otherwise we would fire up ZLib’s extractor again and again, decompressing the same chunk of data over and over.
For use case 4, a fast way to decompress without actually using the data needs to be found.
After some thinking, I found a great solution that provides optimal efficiency on all of these use cases:
I designed a generic Cache class that manages a number of “cache slots”. Each slot stores a buffer, the absolute offset of the data the buffer holds and some implementation-defined things, like a ZLib extractor instance, for example.
When a new read request comes in, the cache searches the slots for the closest slot on or before the requested location in the file. This slot is then assigned to the thread exclusively for the duration of the read operation, allowing the cache to serve other requests while avoiding multiple threads accessing the same ZLib extractor:
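The slot-selection policy can be sketched as follows – the CacheSlot record and the function name are assumptions of mine, not the article's code; the key point is that a slot positioned at or before the request can decompress forward to reach it, while a slot past the request would have to restart from the beginning:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slot record: where the slot's extractor currently stands in
// the uncompressed stream, and whether a thread currently owns it.
struct CacheSlot {
  std::uint64_t offset; // absolute position the slot has decompressed up to
  bool inUse;           // exclusively assigned to a thread right now?
};

// Picks the free slot closest to, but not past, the requested offset.
// Returns the slot index or -1 if no usable slot is free.
int findBestSlot(const std::vector<CacheSlot> &slots, std::uint64_t requested) {
  int best = -1;
  for(std::size_t i = 0; i < slots.size(); ++i) {
    if(slots[i].inUse || slots[i].offset > requested) { continue; }
    if(best == -1 || slots[i].offset > slots[static_cast<std::size_t>(best)].offset) {
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

Picking the closest preceding slot minimizes the amount of compressed data that has to be decompressed and thrown away before the requested region is reached.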
How the cache slot handles the read request is completely up to the cache slot implementation. My .zip archive reader implementation will first check if the read intersects with the data already in its buffer (a new slot has nothing buffered, of course) and provide that part to the caller. If the remaining data is larger than the buffer size, it will extract directly into the caller-provided memory; otherwise, it will extract until the buffer is at capacity and provide the region the caller was interested in:
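That decision logic could look roughly like this – a simplified sketch under my own assumptions (the 'extract' callback stands in for the real forward-only ZLib extractor, and end-of-file handling is omitted for brevity):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical sketch of a cache slot's read path. 'extract' appends the
// next 'count' uncompressed bytes into 'out' and advances the slot's
// forward-only stream position (ZLib stand-in).
struct Slot {
  std::uint64_t bufferStart = 0;     // absolute offset of buffer[0]
  std::vector<std::uint8_t> buffer;  // currently buffered decompressed data
  std::size_t bufferCapacity = 0;
  std::function<void(std::uint8_t *out, std::size_t count)> extract;

  // Precondition: the request starts inside the buffered region, or the
  // extractor is already positioned at 'offset'.
  void readAt(std::uint64_t offset, std::uint8_t *out, std::size_t count) {
    // 1. Serve the part that intersects the already-buffered data.
    if((offset >= bufferStart) && (offset < bufferStart + buffer.size())) {
      std::size_t skip = static_cast<std::size_t>(offset - bufferStart);
      std::size_t fromBuffer = std::min(count, buffer.size() - skip);
      std::memcpy(out, buffer.data() + skip, fromBuffer);
      out += fromBuffer;
      offset += fromBuffer;
      count -= fromBuffer;
    }
    if(count == 0) { return; }

    if(count > bufferCapacity) {
      // 2a. Big remainder: decompress straight into the caller's memory,
      //     avoiding a pointless buffer-to-caller copy.
      extract(out, count);
      buffer.clear();
      bufferStart = offset + count;
    } else {
      // 2b. Small remainder: fill the buffer to capacity (read-ahead),
      //     then hand out just the region the caller asked for.
      buffer.assign(bufferCapacity, 0);
      extract(buffer.data(), bufferCapacity);
      bufferStart = offset;
      std::memcpy(out, buffer.data(), count);
    }
  }
};
```

The read-ahead in step 2b is what makes repeated small reads cheap: subsequent requests into the same region are served entirely from memory in step 1.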
This solves all of the use cases nicely:
- If the game wants the whole file in one go, the new cache slot will start out as empty and, since more data is requested than the buffer can hold, extract everything directly into the caller-provided memory area.
- If the game reads data in small bits, the requested bits will be smaller than the buffer size, causing the cache to extract data in chunks of one buffer size at a time and return data from there.
- If the game reads just the file header over and over, it will be smaller than the buffer size, causing the cache to extract one buffer worth of data and keep returning that to the caller.
- If the game skips a region of the file, data will be extracted into the buffer (since ZLib cannot extract-and-discard) over and over again until the required data enters the buffer. Then the normal logic is executed as before.
In addition to solving all the use cases in the most efficient way possible (at this point, you couldn’t do anything better by hand-coding!), it also works very well with multi-threaded accesses.
Because the Cache class releases its mutex again after assigning a slot to a thread, if multiple threads happen to read from the same file, they can extract data in parallel instead of one thread having to wait for the other.
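The essential locking pattern – my own illustrative sketch, not the article's code – is that the mutex only guards the slot table, never the slow decompression work itself:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch: the mutex protects only slot bookkeeping. Once a
// slot is marked as taken, the lock is released and the owning thread can
// run its (slow) ZLib extraction without blocking other readers.
class SlotPool {
  public:
    explicit SlotPool(std::size_t slotCount) : inUse(slotCount, false) {}

    // Returns the index of a free slot, or -1 if all are busy.
    int acquire() {
      std::lock_guard<std::mutex> lock(this->tableMutex); // held briefly
      for(std::size_t i = 0; i < this->inUse.size(); ++i) {
        if(!this->inUse[i]) {
          this->inUse[i] = true;
          return static_cast<int>(i);
        }
      }
      return -1;
    }

    void release(int index) {
      std::lock_guard<std::mutex> lock(this->tableMutex);
      this->inUse[static_cast<std::size_t>(index)] = false;
    }

  private:
    std::mutex tableMutex;
    std::vector<bool> inUse;
};
```

Two threads that each acquire a slot this way end up with their own extractor instance, so their decompression runs fully in parallel.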
Further, one cache is responsible for the whole .zip archive, so with 8 slots and a buffer size of 4 KiB, it uses up to 32 KiB in total, not 32 KiB per file. If you stop accessing a file, it will eventually be evicted from the cache’s slots (on a least-recently-used basis), so we do not need a silly File.ImDoneWithYou() method to release its resources.
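The eviction itself is the standard least-recently-used policy; a minimal sketch (the record and function are assumptions for illustration) would be:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: each slot remembers when it was last used (a monotonically
// increasing use counter). When a request needs a fresh slot and all slots
// are taken, the least-recently-used one is repurposed – files "evict
// themselves" without an explicit close call.
struct SlotUsage {
  std::uint64_t lastUsed;
};

std::size_t pickVictim(const std::vector<SlotUsage> &slots) {
  std::size_t victim = 0;
  for(std::size_t i = 1; i < slots.size(); ++i) {
    if(slots[i].lastUsed < slots[victim].lastUsed) { victim = i; }
  }
  return victim;
}
```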
I’m very happy with how my file system layer turned out. It is very simple to use, completely thread-safe and seamlessly lets me access files in .zip archives in random access patterns, offering the same efficiency you could achieve if you hand-coded the zip extraction code tailored for every use case.
The interface also remains bloat-free (just ReadAt() and WriteAt() – you can’t trim it any further than that!) and it’s a pleasure to write file access code now.