progress report – Tearnote's Teapot

A month since the first one! 31 days, on the dot! One of these days I won’t start a progress report shocked about how long it’s been, but not today. It’s been a time of massive redesigns, stability / performance improvements, and creation of the most robust song importer the world of BMS has ever seen. Let’s dive into each change, from the foundational to the flashy.

No images to wetten up the dry text this time, sorry. I’ll make it up to you in the next one. Actually, here’s my cat, why not.

Thread rebuild

With the move from mutex-based channels to lock-free queues, every role of the audio thread became redundant, and so it got the axe. There is now only input and render. The input thread, aside from being the initial thread, mostly just spins collecting and sending out user input events. render, on the other hand, handles pretty much everything else – it sets up audio, video, file ops. A lot more threads than that are implicit though, and managed by their respective libraries; the logging library has a backend thread for asynchronous I/O, and audio is generated by running callbacks in the OS-managed realtime thread.

These audio callbacks are where the magic happens. They own a lock-free queue that collects user inputs from the input thread. All audio pushed to the sound device is generated ad-hoc by advancing the playing chart by one sample at a time (typically 1/44100th of a second), giving it any relevant inputs in the process, and receiving back requests to play audio files. The chart itself generates note hit events into its own queue which the render thread handles in its own time, judging the hits and accumulating score without the pressure of having to meet a CPU deadline much tighter than the length of a frame.
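To make the data flow concrete, here is a minimal sketch of a single-producer/single-consumer ring buffer feeding an audio callback. It is not Playnote's actual code: `SpscQueue`, `InputEvent`, `Chart` and `audio_callback` are hypothetical names, the chart is a stand-in that only counts key presses, and inputs are drained once per buffer rather than matched to individual samples.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Single-producer/single-consumer ring buffer: one thread may call push(),
// one other thread may call pop(). Holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "one slot is always kept empty");

public:
    // Producer side (input thread). Returns false instead of blocking when full.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) return false;
        slots_[tail] = value;
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side (audio callback). Returns nullopt when empty.
    std::optional<T> pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;
        T value = slots_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

struct InputEvent {
    int lane;
    bool pressed;
};

// Hypothetical stand-in for a playing chart; it only counts key presses.
struct Chart {
    int presses = 0;
    void on_input(const InputEvent& event) {
        if (event.pressed) ++presses;
    }
    float advance_one_sample() { return 0.0f; }  // a real chart would mix its sounds here
};

using InputQueue = SpscQueue<InputEvent, 64>;

// Called by the OS audio thread: no locks and no allocations on this path.
void audio_callback(InputQueue& inputs, Chart& chart, float* out, std::size_t frames) {
    assert(out != nullptr);
    while (auto event = inputs.pop()) chart.on_input(*event);
    for (std::size_t i = 0; i < frames; ++i) out[i] = chart.advance_one_sample();
}
```

The acquire/release pairs are what make the hand-off safe without a mutex: the consumer only sees a slot after the producer has finished writing it, and the producer only reuses a slot after the consumer has finished reading it.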

To put it more simply, a chart here is essentially a sampler VST. The new architecture skips some of the middleman steps that were previously present, and guarantees even lower latencies and jitter.

Coroutines

Some of the planned features involve code running in the background, plugging away at their job while you play the game. Loading a BMS chart takes longer than a frame, and it would be best if the game didn’t freeze up in the meantime. Some tasks are parallelizable, while others are long-running; some are even both. Playnote needed a centralized solution for this to make sure the OS doesn’t get overwhelmed with a mass of threads if many of these tasks happen to be running at once, which would result in slowdowns and stutters.

Initially, I played around with a hand-written thread pool that takes tasks from a queue, each thread pool worker running its task to completion before taking the next one. This approach had a major fault, which surfaced as soon as tasks started spawning and waiting on other tasks to complete. If there were more jobs waiting for another job’s completion than there were workers, it was possible for all workers to get stuck with them, and there would be no further progress. A deadlock.

This would be resolved if, instead of waiting around for others, idle workers could pick up one of these jobs they depend on. (If you want something done, do it yourself!) This is exactly what coroutines implement. A coroutine is a function that is able to suspend itself, giving the executor a chance to switch to other work. This operation is called yielding. A special kind of yield is an await, which spawns another coroutine and then yields until it’s complete, returning the result value back to the awaiter; this is coroutines’ replacement for promise/future message passing, and feels just like calling a function, making for more readable code.

The coroutine runtime can be single-threaded, which is most often used to “fill in” the CPU time during long operations, like storage access or network requests. This is what allows JavaScript code on a website to work on multiple tasks concurrently, even though a page’s scripts all run on a single thread. A multi-threaded one, a pool of coroutine runners, is best for parallelization of complex work. Coroutines implement a model called cooperative multitasking, where tasks relinquish control voluntarily, as opposed to preemptive multitasking, where a scheduler can switch between threads at any point. Because suspend points are known in advance, the switching operation can be lighter and faster.

It sounds very elegant, but the support for coroutines, added in C++20, provides only the fundamental building blocks. A separate library needs to build on top of them, implementing the executor and other abstractions. I went with libcoro, which is a somewhat arbitrary choice, but it’s working well enough.

Playnote now uses coroutines for every operation that can be expected to take more than one frame. The task pool workers are set to low priority, ensuring they don’t eat into CPU time of the main threads. As a result, the game is silky smooth and responsive even during loading or background operations.

Chart database

Frankly, providing the chart to play as a command-line argument doesn’t make for the best user experience. To do better than this, information about a chart needs to be saved somewhere, so that it can be recalled in the future. This is also an opportunity to cache the results of some of the more expensive calculations, like volume normalization or generating the note density graph, so that every subsequent loading is much faster.

A database is the perfect tool for the job, and Playnote’s choice is, unsurprisingly, sqlite. It might be the single most widespread library in the world. It’s insanely fast, and so robust it doesn’t get corrupted even if you pull the power plug mid-transaction. One incredibly overengineered wrapper API later, and we’ve got the charts saving. To uniquely identify a chart, Playnote uses its MD5 checksum. It’s an old algorithm, but one that has been used by BMS internet rankings and difficulty tables since forever, and it’s good enough for this purpose.

Song format

Caching calculations helps a lot, but there’s more that can be done to speed up loading. Chart metadata is now known in advance, but the audio files still need to be loaded from disk every time. This can be improved by optimizing the way the files are stored.

Videogames routinely pack their assets into proprietary containers, so that reading them is faster than filesystem lookups. Some obfuscate or even encrypt them to prevent reverse-engineering. I decided against this approach for Playnote, though. Many users of BMS already have massive song libraries, and keeping Playnote’s optimized version around would double the space usage. I wouldn’t expect anyone to delete their original library, since the optimized version would be unusable with any other BMS client, should they choose to go back. Can we get the benefits of an asset pack without “monopolizing” the data?

After much thought, I decided on the following format, which I call the songzip:

A valid zip archive,
All filenames use UTF-8,
BMS files are stored at the top level (not in a subfolder),
All files use the STORE method (uncompressed),
“Wasteful” formats, like WAV or BMP, are re-encoded into OGG Vorbis and PNG, respectively.

Let’s go through these points in order.

A valid zip archive + all filenames use UTF-8

With these requirements met, a songzip is just an archive that can be easily opened and extracted on any computer running any OS. If a user wishes to switch away from Playnote, all they need to do is extract all the songs from their zips into folders, and the resulting library will work with older clients.

BMS files are stored at the top level

This speeds up finding of dependent files (audio, images, etc.) by eliminating any folder structures. On songzip load, an in-memory database of all files inside is created, with an index on the filename column to accelerate lookups. For media files, the extension is removed. If a BMS chart requires foo.wav, a SQL query for a file of type audio called foo finds it rapidly and efficiently.
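The lookup rule can be sketched as follows. Note the swap: this uses an ordered `std::map` instead of the in-memory SQL table described above, purely to keep the example self-contained. `FileType`, `registry_name` and `FileRegistry` are hypothetical names, lowercasing is ASCII-only, and keeping the first of two colliding names is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

enum class FileType { Audio, Image };

// Normalize a filename the way a lookup needs it: drop the last extension
// and lowercase it (ASCII only in this sketch).
std::string registry_name(std::string_view filename) {
    const std::size_t dot = filename.rfind('.');
    if (dot != std::string_view::npos && dot != 0) filename = filename.substr(0, dot);
    std::string name(filename);
    for (char& c : name) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return name;
}

class FileRegistry {
public:
    // entry_index identifies the file inside the archive. If two files
    // normalize to the same name, the first one registered is kept.
    void add(FileType type, std::string_view filename, std::size_t entry_index) {
        assert(!filename.empty());
        entries_.emplace(std::make_pair(type, registry_name(filename)), entry_index);
    }

    // Look up whatever the chart asked for, e.g. "foo.wav" finds "Foo.ogg".
    std::optional<std::size_t> find(FileType type, std::string_view requested) const {
        const auto it = entries_.find(std::make_pair(type, registry_name(requested)));
        if (it == entries_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::pair<FileType, std::string>, std::size_t> entries_;
};
```

Keying on the type as well as the name is what lets an audio file and an image share the same base name without shadowing each other.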

All files use the STORE method

Decompression is fast, but the decompressed data still needs to be stored in memory. If the files are uncompressed, this can be avoided entirely by memory-mapping the archive. The zip format stores individual files contiguously, so when the files are uncompressed, the complete contents of each file are a single memory section that can be stored as a pointer+size pair into the memory-mapped zone. There are no copies or allocations involved, from start to end. It’s very fast.
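As an illustration of the pointer+size idea, the sketch below turns one STORE entry into a view over an already-mapped archive by reading its zip local file header. The memory-mapping call itself is platform-specific and left out; `archive` simply stands in for the mapped region. `FileView` and `stored_entry_view` are hypothetical names, and the sketch assumes the header offset is already known and that the sizes are present in the local header (no data descriptor, no ZIP64). A real reader would take both from the central directory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// A file's contents as a window into the mapped archive: no copy is made.
struct FileView {
    const std::uint8_t* data;
    std::size_t size;
};

// Zip stores all integers little-endian.
inline std::uint16_t read_u16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t read_u32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns a view of the entry whose local file header starts at header_offset,
// or nullopt if the header is malformed, truncated, or the entry is compressed.
std::optional<FileView> stored_entry_view(const std::uint8_t* archive,
                                          std::size_t archive_size,
                                          std::size_t header_offset) {
    assert(archive != nullptr);
    constexpr std::size_t kFixedHeaderSize = 30;
    if (header_offset > archive_size || archive_size - header_offset < kFixedHeaderSize)
        return std::nullopt;

    const std::uint8_t* header = archive + header_offset;
    if (read_u32(header) != 0x04034b50u) return std::nullopt;  // "PK\3\4" signature
    if (read_u16(header + 8) != 0) return std::nullopt;        // method 0 means STORE

    const std::size_t size = read_u32(header + 18);      // compressed == uncompressed
    const std::size_t name_len = read_u16(header + 26);
    const std::size_t extra_len = read_u16(header + 28);

    const std::size_t data_offset = header_offset + kFixedHeaderSize + name_len + extra_len;
    if (data_offset > archive_size || archive_size - data_offset < size) return std::nullopt;
    return FileView{archive + data_offset, size};
}
```

The returned view stays valid for as long as the mapping does, so it can be handed directly to a decoder.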

“Wasteful” formats are re-encoded

Many BMS songs, especially older ones, use uncompressed media. At the time the compressed formats were still in their infancy, and the CPU requirements for decoding them were not negligible. Later, a WAV version was offered as the “high quality” one while the OGG Vorbis version using very low encoding settings was offered to save space at the cost of audio quality. CPUs are now much faster, and the encoders themselves have improved as well; massive space savings are now possible with no perceivable loss in sound quality.

If WAV audio is found, Playnote re-encodes it on-the-fly to an OGG Vorbis file at the q5 quality level, which is the level at which most people agree Vorbis achieves transparency. Better formats are around, most notably Opus, but they don’t work with older clients, defeating the goal of backward compatibility. I suppose this feature also makes Playnote a useful mass-optimizer of BMS libraries.

Song importer

But when and how are these songzips created? With this smooth segue, we’re at the meat of the update. First, though, a note on how various rhythm games handle this.

When you want to add a new song to your BMS client, the process is as follows:

Find the song somewhere, typically either the creator’s website or some sort of song pack,
Unzip the song into your BMS client’s “songs” folder (hoping your archive program doesn’t mangle the CJK filenames),
Start the BMS client (if it’s already open, you have to close it first),
Push the button to refresh the song database,
Go have a cup of your hot beverage of choice,
Once it looks like the window finally unfroze, check if the song was actually added.

On the other hand, this is the process of adding a new song in osu!:

Find the song on the game’s website or the in-game downloader,
Click it or drag the file onto the game window,
Wait up to a few seconds for the song to be imported and available.

It’s no surprise which one of the two has a larger player base.

There is no technical reason why BMS can’t be as friendly to use. The format’s various historical warts provide some extra engineering challenges, but solving them should be the job of the programmer rather than an extra burden on the user.

The process of adding a new song in Playnote is currently this:

Find the song somewhere, typically either the creator’s website or some sort of song pack,
Drag the archive (or the folder where you unpacked it) onto the game window,
Wait until it’s finished, or play the game in the meantime.

Here’s what it looks like in practice:

It has been implied throughout this post, but Playnote does its best to maintain the grouping of charts into songs. In a vacuum there is nothing connecting all the charts of the same song; any metadata field can be, and often is, different between each of them. They are only connected by the fact that they are in the same folder, and require most of the same files to play. By reflecting this relationship in the database, it will be possible to implement a better search, better navigation, and features such as 2-player battle with each player on a different difficulty of the same song.

But I digress. The chart importer takes as its input a list of files dropped into the game window, and its ultimate purpose is to create songzips from any BMS songs detected within the list and then import any charts inside into the database; ideally with as much parallelization as possible. I’ll go through all the steps it takes, in order and in more detail.

Locating sources

The user might want to import a single song, a song pack, or their entire BMS library. Any of the songs might be an archive, or a directory. To handle all of these possibilities, these scenarios are expected:

The import location is an archive,
The import location is a song directory,
The import location is a folder structure containing song archives or directories.

The last point is handled by recursive traversal; if the dropped path is a directory, and it doesn’t directly contain any files with a known BMS extension, an import task is spawned for every directory inside it.

Preparing the source

Once we have either an archive or a folder directly containing BMS files, this path is used to construct a source abstraction. From this point on, both directories and archives are iterated with the same syntax, so the rest of the importer doesn’t have to care about the distinction.

In the archive case, a little more work needs to be done behind the scenes. To avoid issues with filename encodings, the list of all filenames in the archive is fed through a heuristic encoding detector (the same one used for the contents of BMS files themselves). Archives can also have subdirectories, so the prefix is detected – the shortest path that needs to be taken to reach a BMS file.
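Prefix detection could look something like the sketch below, which picks the directory of the BMS file with the fewest path components. The names `has_bms_extension` and `detect_prefix` are hypothetical, the extension list (.bms, .bme, .bml, .pms) is an assumption about what counts as “known”, and filenames are assumed to be already converted to UTF-8 with `/` as the separator.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Case-insensitive check against an assumed list of BMS chart extensions.
bool has_bms_extension(std::string_view path) {
    const std::size_t dot = path.rfind('.');
    if (dot == std::string_view::npos) return false;
    std::string ext(path.substr(dot));
    for (char& c : ext) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return ext == ".bms" || ext == ".bme" || ext == ".bml" || ext == ".pms";
}

// Returns the directory part (with trailing '/', or "" for the top level) of
// the shallowest BMS file in the archive, or nullopt if there is no BMS file.
std::optional<std::string> detect_prefix(const std::vector<std::string>& filenames) {
    std::optional<std::string> best;
    std::size_t best_depth = 0;
    for (const std::string& name : filenames) {
        if (!has_bms_extension(name)) continue;
        const std::size_t slash = name.rfind('/');
        const std::string dir =
            (slash == std::string::npos) ? std::string() : name.substr(0, slash + 1);
        const auto depth = static_cast<std::size_t>(std::count(dir.begin(), dir.end(), '/'));
        if (!best || depth < best_depth) {
            best = dir;
            best_depth = depth;
        }
    }
    return best;
}
```

With the prefix known, every other path in the archive can be interpreted relative to it, so a song zipped inside a wrapper folder behaves the same as one zipped flat.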

Creating the songzip

With the source abstraction in place, the songzip can be created. The archive is created in the game’s “library” folder, with the same filename as the original folder / archive (with a number appended if that’s already taken). As the files are processed, format conversion is performed as needed. Once finished, it’s time to create the in-memory database that serves as a file registry for fast lookup that follows the BMS rules of case-insensitivity and ignoring the extension.

Complication: What about duplicates?

Automatically maintaining a connection between charts of the same song comes with many benefits, but adds complexity as well. What if the player is importing a song that’s already in the library? The MD5s will be duplicates, but we don’t know that we shouldn’t create a new songzip yet. To resolve this, the charts are checksummed and checked against the database directly from the source. If even one of these checksums already exists in the database, the entire song is assumed to be a duplicate. The songzip is still created, but by extending the existing file with whatever contents it’s missing, in case the new import is a superset of the existing one. This is how a user is able to add extra difficulties, known as “sabuns”.

Spawning chart imports

The songzip is created and ready for use, so now we can enumerate charts and import each one. As part of the import process they will need to render out their audio, and for that they will need the song’s audio files. Most of the charts will need the exact same files, so to avoid redundant work, the songzip allows for an optional preload step. The audio files are loaded and decoded from their original audio format to raw samples, and stored in a cache. Later, if a chart loads a file but it already exists in the cache, the cached version is served.
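The preload idea boils down to a decode-once cache. Here is a minimal sketch with hypothetical names (`AudioCache`, `Decoder`, `Samples`); the decoder is injected as a callback so the example stays self-contained, and the class is not thread-safe, so a version shared between parallel chart imports would need synchronization on top.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Samples = std::vector<float>;
// Hypothetical decoder: turns a file name inside the songzip into raw samples.
using Decoder = std::function<Samples(const std::string&)>;

class AudioCache {
public:
    explicit AudioCache(Decoder decoder) : decoder_(std::move(decoder)) {}

    // Optional preload step: decode everything the song's charts will need.
    void preload(const std::vector<std::string>& names) {
        for (const std::string& name : names) get(name);
    }

    // Serve the cached samples if present, otherwise decode and remember them.
    std::shared_ptr<const Samples> get(const std::string& name) {
        const auto it = cache_.find(name);
        if (it != cache_.end()) return it->second;
        auto decoded = std::make_shared<const Samples>(decoder_(name));
        cache_.emplace(name, decoded);
        return decoded;
    }

private:
    Decoder decoder_;
    std::unordered_map<std::string, std::shared_ptr<const Samples>> cache_;
};
```

Handing out `shared_ptr<const Samples>` means every chart reads the same decoded buffer, so the memory cost is paid once per file rather than once per chart.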

Chart imports

This is the part where metrics, statistics and other expensive one-time operations are precalculated so that they can be cached in the database. All logs from this part of the import process are captured and saved in the database as well. In the future these import logs will be available for reference, in case the chart doesn’t seem to be playing correctly and the user wants to know if it’s because Playnote didn’t like something in the BMS file.

The preview audio is generated in this phase as well, a 15-second snippet from 20 seconds into the song. This will be encoded as a 64 kbps Opus file, and saved into the database. Previews are generated per-chart rather than per-song, because different charts of the same song do sometimes have slightly (or majorly!) different audio.
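In sample terms, picking the preview window is a small calculation. The sketch below uses hypothetical names (`SampleRange`, `preview_range`), and its handling of songs shorter than 35 seconds, where the window is moved earlier and shortened if needed, is an assumption rather than something described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Half-open range of sample indices: [begin, end).
struct SampleRange {
    std::int64_t begin;
    std::int64_t end;
};

// 15 seconds starting 20 seconds in. For short songs (assumed behavior) the
// window slides toward the start, and shrinks if the song is under 15 seconds.
SampleRange preview_range(std::int64_t total_samples, std::int64_t sample_rate) {
    assert(sample_rate > 0 && total_samples >= 0);
    const std::int64_t length = std::min(15 * sample_rate, total_samples);
    const std::int64_t begin = std::min(20 * sample_rate, total_samples - length);
    return {begin, begin + length};
}
```

For a 60-second render at 44100 Hz this selects samples 882000 up to (but not including) 1543500.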

Finalization

Once all chart imports are complete, we’re just about done. If none of the charts were actually successful, there’s no point in keeping the songzip around, so it is deleted from both the database and the “library” folder. Otherwise, an additional preview deduplication step is done, saving space by pointing all charts whose previews are nearly identical at the same preview file via a many-to-one relationship in the database.

Parallelism

We’re still not done! While each song does follow the process outlined above, there are more things to mind once we allow multiple songs and charts to be imported at the same time. While sqlite is thread-safe by default on the statement level, the individual statements from different threads might interleave, which might prove catastrophic if any of them are currently performing atomic transactions. A mutex must be used to make sure that only one transaction is being prepared at a time.

Error handling

The import can fail at any point, both during song processing and chart processing. Any failure is a thrown exception that is cleanly caught and reported in the logs, without affecting other charts of the song or the rest of the import. Thanks to duplicate handling, you can retry a failed import anytime in the future by dragging the song over again.

Complication complication: No ethical duplication under parallelism

There’s still one big edge case to handle. Duplicates are handled correctly if the song existed before this import job, but what if two of the same song are being imported at the same time? If the timing lines up just right, the two imports of the same song might run in parallel. They will both miss each other’s existence and happily continue, thinking they are the first one.

Resolving this requires that the tasks are aware of what the other tasks are planning to do, but haven’t done yet. To achieve this, a mutex-protected staging area must be consistently used, augmenting the entire import process outlined earlier.

When enumerating the BMS files in the source, the MD5s are checked against the staging area. If any of them exist there, the import knows that it’s a duplicate of a still-ongoing import. It assumes the same song ID as the other import and waits for the ID to become available via a per-ID mutex. If the MD5s aren’t present, the import assumes it’s the first to handle this song, registers its charts with the staging area, and grabs the song ID mutex so that any duplicates that come after are able to wait for its completion.
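The claim step of that staging area can be sketched like this. `StagingArea`, `Claim` and `SongId` are hypothetical names; the per-ID mutex that duplicates wait on is left out, and registering a duplicate's extra charts under the existing ID is an assumption made so that later imports of those charts also resolve to the same song.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

using SongId = int;

class StagingArea {
public:
    struct Claim {
        SongId song;    // ID this import should use
        bool is_first;  // false: another in-flight import already owns the song
    };

    // Check-and-register happens under one lock, so two imports of the same
    // song can never both conclude that they are the first.
    Claim claim(const std::vector<std::string>& chart_md5s) {
        std::lock_guard<std::mutex> lock(mutex_);
        SongId id = 0;
        bool is_first = true;
        for (const std::string& md5 : chart_md5s) {
            const auto it = staged_.find(md5);
            if (it != staged_.end()) {
                id = it->second;
                is_first = false;
                break;
            }
        }
        if (is_first) id = next_id_++;
        // emplace leaves already-staged checksums untouched.
        for (const std::string& md5 : chart_md5s) staged_.emplace(md5, id);
        return {id, is_first};
    }

    // Called once the song's import has finished and its charts are in the database.
    void release(SongId song) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto it = staged_.begin(); it != staged_.end();) {
            if (it->second == song) it = staged_.erase(it);
            else ++it;
        }
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::string, SongId> staged_;
    SongId next_id_ = 1;
};
```

An import that gets `is_first == false` would then block on the owning song's mutex before extending the songzip, which is the waiting step not shown here.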

Lead-out

This milestone alone felt like a whole journey of its own, full of surprise discoveries, unexpected challenges, and unusual destinations – many of the solutions I arrived at were quite different from what I thought they would be. But this is now done and we’re at 0.0.4, with the importer in particular being pretty much ready for the big one-point-zero release someday in the future.

If you’re interested in giving the game a try in its current WIP state, head to the repository to build it, or grab the Windows CI build. Here is how you can find some BMS songs to play, for the time being!

The next milestone, 0.0.5, will replace the current temporary renderer with the proper one that will be used for all game graphics. Stay tuned.


Playnote Progress Report #2