Playnote Progress Report #2

A month since the first one! 31 days, on the dot! One of these days I won’t start a progress report shocked about how long it’s been, but not today. It’s been a time of massive redesigns, stability / performance improvements, and creation of the most robust song importer the world of BMS has ever seen. Let’s dive into each change, from the foundational to the flashy.

No images to liven up the dry text this time, sorry. I’ll make it up to you in the next one. Actually, here’s my cat, why not.

Mood.

Thread rebuild

With the move from mutex-based channels to lock-free queues, every role of the audio thread became redundant, and so it got the axe. There is now only input and render. The input thread, aside from being the initial thread, mostly just spins collecting and sending out user input events. render, on the other hand, handles pretty much everything else – audio setup, video, file operations. Plenty more threads exist implicitly, managed by their respective libraries; the logging library has a backend thread for asynchronous I/O, and audio is generated by running callbacks on the OS-managed realtime thread.

These audio callbacks are where the magic happens. They own a lock-free queue that collects user inputs from the input thread. All audio pushed to the sound device is generated ad-hoc by advancing the playing chart one sample at a time (typically 1/44100th of a second), giving it any relevant inputs in the process and receiving back requests to play audio files. The chart itself generates note hit events into its own queue, which the render thread handles in its own time, judging the hits and accumulating score without the pressure of having to meet a CPU deadline much tighter than the length of a frame.

To put it more simply, a chart here is essentially a sampler VST. The new architecture skips some of the middleman steps that were previously present, and guarantees even lower latencies and jitter.
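
To make the flow concrete, here’s a minimal sketch of the callback loop described above. Every name in it (InputEvent, SpscQueue, Chart) is a hypothetical stand-in rather than Playnote’s actual interface, and the queue is reduced to a stub.

#include <optional>

// Hypothetical stand-ins only; Playnote's real types and names differ.
struct InputEvent { int lane; bool pressed; };

template<typename T>
struct SpscQueue {
    // Stub for the lock-free ring buffer fed by the input thread.
    std::optional<T> try_pop() { return std::nullopt; }
};

struct Chart {
    void submit_input(InputEvent) { /* remember the hit for later judging */ }
    float advance_one_sample() { return 0.0f; /* mix currently playing keysounds */ }
};

// Runs on the OS-managed realtime thread; must finish before the buffer deadline.
void audio_callback(float* out, int frame_count, Chart& chart, SpscQueue<InputEvent>& inputs) {
    for (int i = 0; i < frame_count; ++i) {
        // Hand any pending user inputs to the chart before this sample.
        while (auto ev = inputs.try_pop())
            chart.submit_input(*ev);
        // One sample is 1/44100th of a second at 44.1 kHz.
        out[i] = chart.advance_one_sample();
    }
}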

Coroutines

Some of the planned features involve code running in the background, plugging away at their job while you play the game. Loading a BMS chart takes longer than a frame, and it would be best if the game didn’t freeze up in the meantime. Some tasks are parallelizable, while others are long-running; some are even both. Playnote needed a centralized solution for this to make sure the OS doesn’t get overwhelmed with a mass of threads if many of these tasks happen to be running at once, which would result in slowdowns and stutters.

Initially, I played around with a hand-written thread pool that takes tasks from a queue, each thread pool worker running its task to completion before taking the next one. This approach had a major fault, which surfaced as soon as tasks started spawning and waiting on other tasks to complete. If there were more jobs waiting for another job’s completion than there were workers, it was possible for all workers to get stuck with them, and there would be no further progress. A deadlock.

This would be resolved if, instead of waiting around for others, idle workers could pick up one of these jobs they depend on. (If you want something done, do it yourself!) This is exactly what coroutines implement. A coroutine is a function that is able to suspend itself, giving the executor a chance to switch to other work. This operation is called yielding. A special kind of yield is an await, which spawns another coroutine and then yields until it’s complete, returning the result value back to the awaiter; this is coroutines’ replacement for promise/future message passing, and feels just like calling a function, making for more readable code.

The coroutine runtime can be single-threaded, which is most often used to “fill in” the CPU time during long operations, like storage access or network requests. This is what allows JavaScript code on a website to work on multiple tasks concurrently, even though each tab is a single OS thread. A multi-threaded one, a pool of coroutine runners, is best for parallelization of complex work. Coroutines implement a model called cooperative multitasking, where tasks relinquish control voluntarily, as opposed to preemptive multitasking, where a scheduler can switch between threads at any point. Because suspend points are known in advance, the switching operation can be lighter and faster.

It sounds very elegant, but the support for coroutines, added in C++20, provides only the fundamental building blocks. A separate library needs to build on top of them, implementing the executor and other abstractions. I went with libcoro, which is a somewhat arbitrary choice, but it’s working well enough.
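
For a feel of the awaiting style, here’s a tiny sketch using libcoro-flavored names (coro::task, coro::sync_wait) – treat the exact API as approximate, since Playnote wraps it in its own abstractions anyway, and the chart-loading task itself is entirely made up.

#include <coro/coro.hpp>
#include <cstddef>
#include <string>

// Hypothetical child task; a real one would suspend while the data is fetched.
auto read_file(std::string path) -> coro::task<std::string> {
    co_return "contents of " + path;
}

// Awaiting spawns the child task and suspends until it completes. An idle
// worker is free to pick up read_file() itself, so workers never deadlock
// waiting on each other the way the hand-written thread pool did.
auto load_chart(std::string path) -> coro::task<std::size_t> {
    auto data = co_await read_file(path);
    co_return data.size();
}

int main() {
    // Drive the top-level coroutine to completion from non-coroutine code.
    auto size = coro::sync_wait(load_chart("song/chart.bms"));
    return size > 0 ? 0 : 1;
}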

Playnote now uses coroutines for every operation that can be expected to take more than one frame. The task pool workers are set to low priority, ensuring they don’t eat into the CPU time of the main threads. As a result, the game is silky smooth and responsive even during loading or background operations.

Chart database

Frankly, providing the chart to play as a command-line argument doesn’t make for the best user experience. To do better than this, information about a chart needs to be saved somewhere, so that it can be recalled in the future. This is also an opportunity to cache the results of some of the more expensive calculations, like volume normalization or generating the note density graph, so that every subsequent load is much faster.

A database is the perfect tool for the job, and Playnote’s choice is, unsurprisingly, sqlite. It might be the single most widespread library in the world. It’s insanely fast, and so robust it doesn’t get corrupted even if you pull the power plug mid-transaction. One incredibly overengineered wrapper API later, and we’ve got the charts saving. To uniquely identify a chart, Playnote uses its MD5 checksum. It’s an old algorithm, but one that has been used by BMS internet rankings and difficulty tables since forever, and it’s good enough for this purpose.
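
For a taste of what that looks like against the raw sqlite C API (the actual wrapper is, as mentioned, far more elaborate), inserting a chart row keyed by its MD5 might look roughly like this – the schema is invented for illustration and is not Playnote’s real one:

#include <sqlite3.h>
#include <string>

// Invented schema, purely for illustration.
void save_chart(sqlite3* db, std::string const& md5, std::string const& title, double loudness) {
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS charts ("
        "  md5 TEXT PRIMARY KEY, title TEXT, loudness REAL);",
        nullptr, nullptr, nullptr);

    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db,
        "INSERT OR REPLACE INTO charts (md5, title, loudness) VALUES (?1, ?2, ?3);",
        -1, &stmt, nullptr);
    sqlite3_bind_text(stmt, 1, md5.c_str(), -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 2, title.c_str(), -1, SQLITE_TRANSIENT);
    sqlite3_bind_double(stmt, 3, loudness);
    sqlite3_step(stmt);      // error handling omitted for brevity
    sqlite3_finalize(stmt);
}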

Song format

Caching calculations helps a lot, but there’s more that can be done to speed up loading. Chart metadata is now known in advance, but the audio files still need to be loaded from disk every time. This can be improved by optimizing the way the files are stored.

Videogames routinely pack their assets into proprietary containers, so that reading them is faster than filesystem lookups. Some obfuscate or even encrypt them to prevent reverse-engineering. I decided against this approach for Playnote, though. Many users of BMS already have massive song libraries, and keeping Playnote’s optimized version around would double the space usage. I wouldn’t expect anyone to delete their original library, since the optimized version would be unusable with any other BMS client, should they choose to go back. Can we get the benefits of an asset pack without “monopolizing” the data?

After much thought, I decided on the following format, which I call the songzip:

  1. A valid zip archive,
  2. All filenames use UTF-8,
  3. BMS files are stored at the top level (not in a subfolder),
  4. All files use the STORE method (uncompressed),
  5. “Wasteful” formats, like WAV or BMP, are re-encoded into OGG Vorbis and PNG, respectively.

Let’s go through these points in order.

A valid zip archive + all filenames use UTF-8

With these requirements met, a songzip is just an archive that can be easily opened and extracted on any computer running any OS. If a user wishes to switch away from Playnote, all they need to do is extract all the songs from their zips into folders, and the resulting library will work with older clients.

BMS files are stored at the top level

This speeds up lookups of dependent files (audio, images, etc.) by eliminating any folder structure. On songzip load, an in-memory database of all files inside is created, with an index on the filename column to accelerate lookups. For media files, the extension is removed. If a BMS chart requires foo.wav, a SQL query for a file of type audio called foo finds it rapidly and efficiently.
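
As a sketch of the registry idea (schema invented for illustration, not Playnote’s actual one), an in-memory sqlite database with an index on the name column makes the lookup a single indexed query; COLLATE NOCASE also covers the case-insensitive matching BMS expects:

#include <sqlite3.h>

// Invented schema, purely for illustration.
void build_file_registry(sqlite3** db) {
    sqlite3_open(":memory:", db);   // in-memory database; never touches disk
    sqlite3_exec(*db,
        "CREATE TABLE files (name TEXT COLLATE NOCASE, type TEXT,"
        "                    offset INTEGER, size INTEGER);"
        "CREATE INDEX files_by_name ON files (name, type);",
        nullptr, nullptr, nullptr);
}

// #WAV foo.wav -> look up an audio file called "foo", extension already stripped.
constexpr char const* lookup_sql =
    "SELECT offset, size FROM files WHERE name = ?1 AND type = 'audio' LIMIT 1;";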

All files use the STORE method

Decompression is fast, but the decompressed data still needs to be stored in memory. If the files are uncompressed, this can be avoided entirely by memory-mapping the archive. The zip format stores individual files contiguously, so when the files are uncompressed, the complete contents of each file are a single memory section that can be stored as a pointer+size pair into the memory-mapped zone. There are no copies or allocations involved, from start to end. It’s very fast.
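
In other words, once the archive is memory-mapped, “loading” a stored file is nothing but pointer arithmetic. A rough sketch, assuming the offset and size have already been read from the zip’s central directory:

#include <cstddef>
#include <span>

// Filled in while parsing the zip central directory (not shown).
struct StoredEntry {
    std::size_t data_offset;  // where the file's bytes begin inside the archive
    std::size_t size;         // for STORE, compressed size == uncompressed size
};

// base points at the start of the memory-mapped songzip.
// The "loaded" file is just a view into the mapping: no copy, no allocation.
std::span<std::byte const> file_contents(std::byte const* base, StoredEntry entry) {
    return {base + entry.data_offset, entry.size};
}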

“Wasteful” formats are re-encoded

Many BMS songs, especially older ones, use uncompressed media. At the time, compressed formats were still in their infancy, and the CPU requirements for decoding them were not negligible. Later, the WAV version would be offered as the “high quality” one, while an OGG Vorbis version at very low encoding settings was offered to save space at the cost of audio quality. CPUs are now much faster, and the encoders themselves have improved as well; massive space savings are now possible with no perceivable loss in sound quality.

If WAV audio is found, Playnote re-encodes it on-the-fly to an OGG Vorbis file at the q5 quality level, which is the level at which most people agree Vorbis achieves transparency. Better formats are around, most notably Opus, but they don’t work with older clients, defeating the goal of reverse-compatibility. I suppose this feature also makes Playnote a useful mass-optimizer of BMS libraries.

Song importer

But when and how are these songzips created? With this smooth segue, we’re at the meat of the update. First, though, a note on how various rhythm games handle this.

When you want to add a new song to your BMS client, the process is as follows:

  1. Find the song somewhere, typically either the creator’s website or some sort of song pack,
  2. Unzip the song into your BMS client’s “songs” folder (hoping your archive program doesn’t mangle the CJK filenames),
  3. Start the BMS client (if it’s already open, you have to close it first),
  4. Push the button to refresh the song database,
  5. Go have a cup of your hot beverage of choice,
  6. Once it looks like the window finally unfroze, check if the song was actually added.

On the other hand, this is the process of adding a new song in osu!:

  1. Find the song on the game’s website or the in-game downloader,
  2. Click it or drag the file onto the game window,
  3. Wait up to a few seconds for the song to be imported and available.

It’s no surprise which one of the two has a larger player base.

There is no technical reason why BMS can’t be as friendly to use. The format’s various historical warts provide some extra engineering challenges, but solving them should be the job of the programmer rather than an extra burden on the user.

The process of adding a new song in Playnote is currently this:

  1. Find the song somewhere, typically either the creator’s website or some sort of song pack,
  2. Drag the archive (or the folder where you unpacked it) onto the game window,
  3. Wait until it’s finished, or play the game in the meantime.

Here’s what it looks like in practice:

It has been implied throughout this post, but Playnote does its best to maintain the grouping of charts into songs. In a vacuum there is nothing connecting all the charts of the same song; any metadata field can be, and often is, different between each of them. They are only connected by the fact that they are in the same folder, and require most of the same files to play. By reflecting this relationship in the database, it will be possible to implement better search, better navigation, and features such as 2-player battle with each player on a different difficulty of the same song.

But I digress. The chart importer takes as its input a list of files dropped onto the game window, and its ultimate purpose is to create songzips from any BMS songs detected within the list and then import the charts inside them into the database; ideally with as much parallelization as possible. I’ll go through all the steps it takes, in order.

Locating sources

The user might want to import a single song, a song pack, or their entire BMS library. Any of the songs might be an archive, or a directory. To cover all of these possibilities, three scenarios are expected:

  1. The import location is an archive,
  2. The import location is a song directory,
  3. The import location is a folder structure containing song archives or directories.

The last point is handled by recursive traversal; if the dropped path is a directory, and it doesn’t directly contain any files with a known BMS extension, an import task is spawned for every directory inside it.
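
Sketched out with hypothetical helpers (the real importer spawns coroutine tasks on the pool rather than recursing inline), the dispatch looks roughly like this:

#include <filesystem>
namespace fs = std::filesystem;

// Hypothetical helpers, simplified for the sketch.
bool is_archive(fs::path const& p) { return p.extension() == ".zip"; }
bool is_bms_file(fs::path const& p) {
    auto ext = p.extension();
    return ext == ".bms" || ext == ".bme" || ext == ".bml" || ext == ".pms";
}

void import(fs::path const& path) {
    if (is_archive(path))
        return; // scenario 1: an archive; hand it straight to the importer
    for (auto const& entry : fs::directory_iterator{path})
        if (entry.is_regular_file() && is_bms_file(entry.path()))
            return; // scenario 2: a song directory; hand it straight to the importer
    // Scenario 3: a folder structure; recurse into every subdirectory.
    // (The real thing spawns an import task per subdirectory instead.)
    for (auto const& entry : fs::directory_iterator{path})
        if (entry.is_directory())
            import(entry.path());
}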

Preparing the source

Once we have either an archive or a folder directly containing BMS files, this path is used to construct a source abstraction. From this point on, both directories and archives are iterated with the same syntax, so the rest of the importer doesn’t have to care about the distinction.

In the archive case, a little more work needs to be done behind the scenes. To avoid issues with filename encodings, the list of all filenames in the archive is fed through a heuristic encoding detector (the same one used for the contents of BMS files themselves). Archives can also have subdirectories, so the prefix is detected – the shortest path that needs to be taken to reach a BMS file.
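
Conceptually, the abstraction is nothing more than a small interface both backends satisfy. The version below is purely illustrative, with stubbed-out bodies, and not necessarily how Playnote structures it:

#include <cstddef>
#include <string>
#include <vector>

// Illustrative interface only.
struct Source {
    virtual ~Source() = default;
    virtual std::vector<std::string> filenames() const = 0;                // already decoded to UTF-8
    virtual std::vector<std::byte> read(std::string const& name) const = 0;
};

struct DirectorySource final : Source {
    std::vector<std::string> filenames() const override { return {}; }     // walks the folder
    std::vector<std::byte> read(std::string const&) const override { return {}; }
};

struct ArchiveSource final : Source {
    std::vector<std::string> filenames() const override { return {}; }     // reads the zip index, prefix stripped
    std::vector<std::byte> read(std::string const&) const override { return {}; }
};

// Everything downstream iterates files without caring which backend it got.
void import_from(Source const& src) {
    for (auto const& name : src.filenames())
        if (name.ends_with(".bms")) { auto bytes = src.read(name); /* parse the chart... */ }
}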

Creating the songzip

With the source abstraction in place, the songzip can be created. The archive is created in the game’s “library” folder, with the same filename as the original folder or archive (with a number appended if that name is already taken). As the files are processed, format conversion is performed as needed. Once finished, it’s time to create the in-memory database that serves as a file registry for fast lookups, following the BMS rules of case-insensitive matching and ignoring the extension.

Complication: What about duplicates?

Automatically maintaining a connection between charts of the same song comes with many benefits, but adds complexity as well. What if the player is importing a song that’s already in the library? The MD5s will be duplicates, but we don’t know that yet at the point where we’d create a new songzip. To resolve this, the charts are checksummed and checked against the database directly from the source. If even one of these checksums already exists in the database, the entire song is assumed to be a duplicate. The songzip is still created, but by extending the existing file with whatever contents it’s missing, in case the new import is a superset of the existing one. This is how a user is able to add extra difficulties, known as “sabuns”.

Spawning chart imports

The songzip is created and ready for use, so now we can enumerate charts and import each one. As part of the import process they will need to render out their audio, and for that they will need the song’s audio files. Most of the charts will need the exact same files, so to avoid redundant work, the songzip allows for an optional preload step. The audio files are loaded and decoded from their original audio format to raw samples, and stored in a cache. Later, if a chart loads a file but it already exists in the cache, the cached version is served.
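
A rough sketch of that cache (names invented; the real one lives inside the songzip object, is filled by the preload step, and needs synchronization since chart imports run in parallel):

#include <string>
#include <unordered_map>
#include <vector>

using Samples = std::vector<float>;   // decoded keysound, ready for rendering

class AudioCache {
    std::unordered_map<std::string, Samples> decoded;

    Samples decode_from_songzip(std::string const&) { return {}; }  // stub

public:
    // Serve the cached decode if present; otherwise decode once and remember it.
    Samples const& get(std::string const& name) {
        if (auto it = decoded.find(name); it != decoded.end())
            return it->second;
        return decoded.emplace(name, decode_from_songzip(name)).first->second;
    }
};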

Chart imports

This is the part where metrics, statistics and other expensive one-time operations are precalculated so that they can be cached in the database. All logs from this part of the import process are captured and saved in the database as well. In the future these import logs will be available for reference, in case the chart doesn’t seem to be playing correctly and the user wants to know if it’s because Playnote didn’t like something in the BMS file.

The preview audio is generated in this phase as well: a 15-second snippet starting 20 seconds into the song. This is encoded as a 64 kbps Opus file and saved into the database. Previews are generated per-chart rather than per-song, because different charts of the same song do sometimes have slightly (or majorly!) different audio.

Finalization

Once all chart imports are complete, we’re just about done. If none of the charts were actually successful, there’s no point in keeping the songzip around, so it is deleted from both the database and the “library” folder. Otherwise, an additional preview deduplication step is done, saving space by pointing all charts whose previews are nearly identical at the same preview file via a many-to-one relationship in the database.

Parallelism

We’re still not done! While each song does follow the process outlined above, there are more things to mind once we allow multiple songs and charts to be importing at the same time. While sqlite is thread-safe by default on the statement level, the individual statements from different threads might interleave, which might prove catastrophic if any of them are currently performing atomic transactions. A mutex must be used to make sure that only one transaction is being prepared at a time.
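
Conceptually it’s just a lock held for the duration of each transaction. A minimal sketch against the raw sqlite API (the real wrapper does this internally, and also handles rollback on failure):

#include <mutex>
#include <sqlite3.h>

std::mutex transaction_mutex;   // one transaction at a time across all import tasks

template<typename Fn>
void with_transaction(sqlite3* db, Fn&& body) {
    std::lock_guard lock{transaction_mutex};
    sqlite3_exec(db, "BEGIN;", nullptr, nullptr, nullptr);
    body();   // the statements that must not interleave with other threads'
    sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);
}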

Error handling

The import can fail at any point, both during song processing and chart processing. Any failure is a thrown exception that is cleanly caught and reported in the logs, without affecting other charts of the song or the rest of the import. Thanks to duplicate handling, you can retry a failed import anytime in the future by dragging the song over again.

Complication complication: No ethical duplication under parallelism

There’s still one big edge case to handle. Duplicates are handled correctly if the song existed before this import job, but what if two of the same song are being imported at the same time? If the timing lines up just right, the two imports of the same song might run in parallel. They will both miss each other’s existence and happily continue, thinking they are the first one.

Resolving this requires that the tasks are aware of what the other tasks are planning to do, but haven’t done yet. To achieve this, a mutex-protected staging area must be consistently used, augmenting the entire import process outlined earlier.

When enumerating the BMS files in the source, the MD5s are checked against the staging area. If any of them exist there, the import knows that it’s a duplicate of a still-ongoing import. It assumes the same song ID as the other import and waits for the ID to become available via a per-ID mutex. If the MD5s aren’t present, the import assumes it’s the first to handle this song, registers its charts with the staging area, and grabs the song ID mutex so that any duplicates that come after are able to wait for its completion.
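
Stripped down to its essence, the staging area is a map from chart MD5 to the song ID that claimed it, plus a mutex per ID that stays locked until that import finishes. A hypothetical sketch, with names and types invented for illustration:

#include <memory>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

struct Staging {
    std::mutex guard;                                      // protects both maps below
    std::unordered_map<std::string, int> md5_to_song;      // charts claimed by ongoing imports
    std::unordered_map<int, std::shared_ptr<std::mutex>> song_locks;

    // If any MD5 is already claimed, returns the song ID to wait on (the caller then
    // locks that song's mutex, blocking until the first import releases it on completion).
    // Otherwise claims every MD5 under new_id and returns nullopt; the caller locks
    // song_locks[new_id] for the whole import so later duplicates can wait on it.
    std::optional<int> claim_or_find(std::vector<std::string> const& md5s, int new_id) {
        std::lock_guard lock{guard};
        for (auto const& md5 : md5s)
            if (auto it = md5_to_song.find(md5); it != md5_to_song.end())
                return it->second;                         // duplicate of a still-ongoing import
        for (auto const& md5 : md5s)
            md5_to_song.emplace(md5, new_id);
        song_locks.emplace(new_id, std::make_shared<std::mutex>());
        return std::nullopt;                               // we're the first; proceed
    }
};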

Lead-out

This milestone alone felt like a whole journey of its own, full of surprise discoveries, unexpected challenges, and unusual destinations – many of the solutions I arrived at were quite different from how I thought they would be. But this is now done and we’re at 0.0.4, with the importer in particular being pretty much ready for the big one-point-zero release someday in the future.

If you’re interested in giving the game a try in its current WIP state, head to the repository to build it, or grab the Windows CI build. Here is how you can find some BMS songs to play, for the time being!

The next milestone, 0.1, will replace the current temporary renderer with the proper one that will be used for all game graphics. Stay tuned.

Playnote: The Big Idea

A wise man once said,

making your vision clear ahead of time not only helps you stay on track but attracts potential users interested in what you’re going for, and early feedback is extremely important

Alright, it wasn’t a wise man, it was me, on Discord, last week. That’s all the more reason though to write up a clear statement of scope – what Playnote is and isn’t going to be. No more witty intros, let’s go.

Reliable and performant by default

Starting on the guts end of things – a big area where Playnote can improve over the competition is technical soundness. No crashes, no memory leaks, no stutters, no lock-ups regardless of how heavy the operation, clear error reporting when something goes wrong. Performance should be smooth on any reasonably modern device, the latencies (both audio and video) as low as the platform allows, and independent of rendering framerate. There’s no valid reason for a rhythm game to have to run at 480 FPS to feel responsive on a 120 Hz display – that’s a waste of 75% of the frames. To this end, the code is written to be exception-safe from day one, and all failures are tested to be handled as expected, with critical ones always triggering a clean shutdown.

Defaults have power. All this should require no upfront configuration; settings are good, but they should default to the most performant options, only to be changed to improve compatibility when the user encounters an issue.

As low-level as needed, but no lower

Paraphrasing a quote that Einstein may or may not have said, Playnote isn’t a slave to any particular level of abstraction, sometimes to the horror of fans of any particular one. A project that aims for technical excellence in a niche area is going to need to reinvent some wheels, but the majority of components can be off-the-shelf. The biggest challenge of any hobby project is finishing before you die, and any choice needs to have this in mind.

The unspoken value of libraries and abstractions is a reduction in the number of lines of my own code. Code I wrote has been used once (by me), while code others wrote has been used by thousands of people; so, the more of my own game has been written by someone else, the better it’s going to be. If in doubt, get data – seek benchmarks, or just go for it and then profile the outcome.

Fix the design of BMS invisibly

BMS is not a good format. It’s dragged down by decades of bad decisions and non-standard extensions, and this is much of the reason why BMS clients can be hard to work with. Playnote should go out of its way to hide as much of that as possible and pretend BMS is reliable and amazing, with whatever algorithms and heuristics it takes to achieve that. Text encodings are detected, typo’d header commands are fixed, oversized WAV samples are compressed. It should be up to the programmer to deal with these challenges, not up to the user.

Best-effort compatibility

There’s a lot of BMS files out there, and the older they are the more broken they tend to be, in ways I surely can’t even begin to imagine right now. hitkey’s write-up lists some of the horrors he encountered. Playnote will strive to support all valid BMS files, at least initially. Some leniency is already baked into the parser, and any further fixes for compatibility with invalid BMS files will be done on a case-by-case basis. I would love to aim for 100% compatibility, but that can only be a goal once all more important things are done first. Please understand.

One aspect where compatibility must be maintained is with older BMS clients. When importing a song and converting it into a format that’s as quick to load as possible, it must use only formats that still work with LR2 and beatoraja. This is so that it’s possible to revert from Playnote to an older client, should one be unhappy with Playnote and decide to go back. With the currently planned design, a library is a collection of optimized zip files, which can be simply unpacked into old-school song folders.

Effortless user experience

As much as I disagree with its gameplay design choices, osu! is a masterclass in UX. A single-exe installer that unpacks into %APPDATA%, built-in song downloader, reasonably preconfigured for getting started right away; to get more songs, all you need to do is drag a downloaded song file onto the game window to import it. There are no technical reasons why a BMS client couldn’t be just as easy to use. Features that optimize the newcomer experience are first-class. BMS power users must of course be satisfied as well, but hiding away features that are useful less often into “Advanced” submenus will improve the UI flows for everyone.

Unique visual style

The game will stick with 2D graphics of course, but that’s no reason why it shouldn’t try to stand out – an attractive screenshot will always sell a game better than a list of features. I’m no traditional artist, but with strong past graphics programming experience, I can make something interesting from a tech art perspective. The current plan is for all graphics to be built out of SDFs, including text; this will allow for highly dynamic forms, with perfect analytical antialiasing at any scale. This concept has already been successfully attempted and used in production for editor tooling at Guerrilla Games.

The visual style will be clean, minimal and stylish; taking inspiration from N++, NieR:Automata, and the work of The Designers Republic. I like to believe I have some sense of aesthetics, so it might take a lot of trial and error, but I should at least realize when I arrive at something visually pleasing. I’m dreading the process, but looking forward to the results.

Break bad traditions

This one might be controversial, but I believe that some parts of the BMS standard should be actively ignored and redesigned to better match how the modern audience plays rhythm games. For example, the #RANK header allows a chart to decide between (typically) four sizes of judgment windows. This might’ve sounded like a good idea at the time, but rhythm games these days don’t tend to vary their timing between songs or between difficulties. Most recent BMS charts go for the easiest rank, which over time gave BMS the reputation of being loose and good for training physical speed. #RANK will be ignored, but to respect BMS’s place, the windows will be slightly looser than in IIDX.

This is also the reason why Playnote will not allow importing scores from a previous BMS client, or connecting to an existing internet ranking service. LR2 and beatoraja already have different timing windows for the same rank (a fact that spawned the LR2oraja fork), and even identical values can feel different on different engines and with different latencies. There’s simply no way to make scores directly comparable in a way that would satisfy everyone. Even re-simulating a play with the same inputs would be questionable, since it doesn’t take into account the player’s mental state.

Another example is the #PLAYLEVEL header, which specifies a difficulty number to be shown in song select. These are infamous for not being very useful, since the value is entirely at the chart creator’s discretion, and there’s no way to specify the scale (normal/insane/etc.) Third-party difficulty tables have been created to fix that by independently ranking charts’ difficulty, often via a community voting process. Playnote will ignore #PLAYLEVEL entirely, displaying only its calculated average notes-per-second unless a table is available for the chart.

The little things are good!

BMS clients are full of minor features people love, and the lack of any one of them can be a dealbreaker for someone out there. 2-Player battle, lamps above Full Combo, practice mode, gauge auto-shift, etc. If it makes sense at all, it will be implemented.

Customization over skinning

Due to the planned nature of the renderer, skins would be very difficult to implement, and so are not part of the plans for the 1.0 release. Customization is part of the very soul of BMS though, so many pre-made layouts and components will be available to choose from, with arbitrary image display available as well for shareable areas like the result screen.

Afterword

That was a lot! Still, I think it’s important to have the general roadmap out there, even if just to keep myself accountable to it in the future. Please keep in mind that this is the vision for 1.0, and doesn’t mean the end of development. Other major features, like an internet ranking, online profile or a chart recommendation system, are not out of the question, just out of scope for the initial release.

If you feel particularly strongly about any of these items, please let me know via a GitHub issue.

Playnote Progress Report #1

Whew, has it really been half a year? Back then I didn’t expect I would still be working on it after all this time, let alone see it become a Serious Project. But, I’m getting ahead of myself. This is the first Progress Report, so proper introductions are in order.

The story so far

Skip this section if you’re familiar with BMS. If you’re not – it’s a rhythm game that looks roughly like this:

DP BMS gameplay in beatoraja, by yours truly

Rectangles are falling, hit the lane’s assigned key to play the missing sounds in the music and complete the song. The gameplay is the same as beatmania IIDX, a commercial arcade rhythm game by Konami’s Bemani division; BMS is the fan-made version, with all music and charts made by members of the community (yes, all the music is original!) The video shows the 14-key mode (Double Play), my personal preference, but 7-key and others are available as well and are arguably far more popular.

Many clients were made that all play charts in the BMS format, but probably the most legendary of them all was Lunatic Rave 2 by lavalse. It was built largely to resemble and replicate the functionality of the PS2 releases of IIDX, and enjoyed great popularity due to its level of polish and performance. During its time, the BMS community grew by orders of magnitude. Sadly, the game’s development was cut short in 2010, allegedly due to the developer losing access to the source code. By 2015, the official website went offline. Projects exist to add features and fix bugs in the most recent version, but since all they have is a compiled executable, it takes painstaking decompilation effort and making larger changes is difficult.

A screenshot of Lunatic Rave 2 song select screen
LR2 song select, in all its 640×480 glory

LR2 was followed up by beatoraja, a next-gen BMS client by exch. Written in Java and hosted on GitHub, it expands on LR2’s feature set, adding higher resolutions, more QoL and better stability. While still clunky in some aspects, it’s an incredibly impressive effort, and the best we have right now. The language barrier proved to make it difficult to contribute improvements to the original repository, so western development has continued in forks like Endless Dream.

Motivation

One fine afternoon, I was setting up beatoraja on a new Linux install. Right away, it took half an hour just to get it to start, since I didn’t have Java installed, let alone the required JavaFX library runtime. Importing my massive BMS library took hours, and then… none of the songs were actually imported. The game didn’t tell me why. All I could do was split the library into multiple groups, then try importing each one. Most of them worked, a few didn’t. A few more bisects later, and I identified the three songs that broke the import process if they were present. It was already evening, and I didn’t get to play that day. The next day, I discovered that the game has more latency on Linux, due to the bindings of PortAudio it was using. It also dropped inputs when both Shifts were pressed on the keyboard at the same time, which is apparently an inherent limitation of LWJGL.

I walked away from this experience feeling that we deserve better than this. LR2 is great but dead and buried, and beatoraja is open-source but held back by the choice of programming language and its own architecture. If I wanted my perfect BMS client, I was going to have to do it myself.

A Windows folder properties window, showing a large BMS song library
6.8 million tiny files, 27 gigabytes’ worth of sector padding

I spent the next few months toying with the idea. Why do we need to store each song as folders of 2000 files that waste disk space and take ages to copy, even on an SSD? Why does updating the library mean I don’t get to play until it’s finished? Why does the game only feel responsive at ~500fps, even though my screen is 144 Hz? Why can’t I play against a friend locally with each of us picking a different difficulty of the same song? None of these problems are inherent limitations of the BMS format, they just need someone to go out there and solve them. I’ve already been walking around with half-designed solutions in my head, and I was confident I’d be able to do it if I put my mind to it.

Finally, the time was right. I lost my job, and needed something to pass the time while waiting for a new one to come along. I pushed the initial commit, and Playnote was born. Soon after, my favorite C++ IDE became free for non-commercial use, further bolstering my motivation. I decided on a goal of making at least one commit per day, no matter how small, which was a great idea in retrospect – it’s really satisfying to watch that daily streak number go up.

Progress

Fast-forward to the present – what do we have so far? Well, the game looks like this:

Screenshot of Playnote in its current state
Don’t judge, the graphics are temporary

It’s being developed in C++23 for Windows and Linux, with a custom audio engine and Vulkan for graphics. Currently, it can load one BMS chart at a time via the command line, and allows you to play it yourself (with a keyboard or a controller) or watch a perfect autoplay. Most charts work, though there are edge cases, and a few uncommon BMS features, like mines and control flow, aren’t yet supported.

The interesting parts, and the most difficult ones, are under the hood. The total input-to-keysound latency is 6ms on Linux and 8ms on Windows (2 or 3ms of audio buffer, +1ms limiter look-ahead, times two for round-trip latency.) The game is heavily threaded, inputs always polled at the highest possible rate, and engine ticks completely independent from the renderer. And it loads songs directly from archives!

There are a few smaller things it already does that no other BMS client has done so far:

  • All songs play at the same volume by analyzing them with a ReplayGain-like algorithm.
  • A heuristic detects the BMS file’s text encoding, fixing mojibake in both Japanese (Shift-JIS) and Korean (EUC-KR) charts with no need to mess with system locale.
  • Note density graph is a sum of Gaussians, and the average notes-per-second value uses a root-mean-square based algorithm to be a more useful measure of overall chart density than an arithmetic mean of time-bins (see the sketch below the graph).

Screenshot of Playnote note density graph
Blue line shows average NPS, purple shows peak NPS
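
The RMS averaging mentioned in the list above boils down to very little code. A sketch, assuming the chart has already been binned into per-second note counts (the real implementation derives its values from the Gaussian density curve and differs in detail):

#include <cmath>
#include <vector>

// Average NPS as a root-mean-square over per-second note counts. A plain mean
// gets dragged down by breaks and quiet intros; RMS weighs the dense sections
// more heavily, which better matches how demanding the chart actually feels.
double average_nps_rms(std::vector<double> const& nps_bins) {
    if (nps_bins.empty()) return 0.0;
    double sum_of_squares = 0.0;
    for (double nps : nps_bins)
        sum_of_squares += nps * nps;
    return std::sqrt(sum_of_squares / nps_bins.size());
}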

The future

The next milestone, version 0.0.4, focuses on the chart database and song import process. In other words, it’ll be the first version that’s usable without messing with the command line.

Stay tuned for the next Progress Report! Between them, I might also post some deep-dives into the design process, and stories from development.

If you are interested in giving Playnote a try, you can find the repository here, and the automated Windows build is available in the latest GitHub Actions artifact.

C++23: deducing “this” into Rust traits

C++23 is here, and compiler developers are slowly metabolizing the standardese into something we can use. One of the main highlights of this release is a language feature called “explicit object parameters,” which not only cleans up the syntax of the typical uses of the this keyword, but allows for brand new constructs that help tie up loose ends introduced by earlier language features, deduplicate code, and help untangle template spaghetti. A creative use of the “explicit object parameter” can even implement Rust-style traits with no boilerplate! Most of the sections below will build up to why explicit object parameters (just “EOP” going forward) are useful, in case you have only a passing familiarity with C++ – experts are advised to skip ahead.

Stating the warts

C++ is not an elegant language.

class Ghost {
    int peekaboo = 5;
public:
    void setToEight(int peekaboo) {
        peekaboo = 8; // (1)
    }

    void setToEightAgain() {
        peekaboo = 8;  // (2)
    }
};

Lines (1) and (2) are identical, but (1) changes the value of the function parameter, and (2) changes the value of the class member. This is because when an identifier doesn’t refer to anything in its current scope, the compiler checks if it matches a class member. This makes code ambiguous and dependent on context. To disambiguate, you can prepend this->:

this->peekaboo = 8;

Or add a prefix to class members:

class Ghost {
    int m_peekaboo = 5;
public:
    void setToEight(int peekaboo) {
        m_peekaboo = 8;
    }
};

Prefixes are aesthetically unpleasant. Using the this keyword works, but it’s still entirely optional, many people don’t like including it when it’s not required, and since the rule of implicit this-> hasn’t been eliminated, we’re still susceptible to bugs caused by accidentally referring to the wrong variable.

Public survey

How do other languages handle this? Let’s go back in time, and have a look at how OOP can be emulated in C:

struct Ghost {
    int peekaboo;
};
void Ghost__setToEight(struct Ghost* this) {
    this->peekaboo = 8;
}

Self-explanatory, the name lookup is trivial. If we wanted to emulate function members as well, we could keep a pointer to the setToEight() function inside the struct. Let’s see about Python:

class Ghost:
  peekaboo = 5
  def setToEight(self):
    self.peekaboo = 8

Here, if a member function wants to modify the instance, it needs that self parameter. And in Rust?

struct Ghost {
    peekaboo: i32,
}
impl Ghost {
    fn setToEight(&mut self) {
        self.peekaboo = 8;
    }
}

More or less the same as Python here, we just need mut because in Rust everything is const by default unless specified otherwise.

The majority of programming languages settled on making the this/self parameter explicit. It’s a little more typing, but name lookup rules become simpler and less magical, making code more readable at a glance.

You cannot escape this

C++ tries to hide the existence of this parameter, but with some more advanced features you need to be aware of its existence and reason about it anyway. One example is member function ref-qualifiers:

class Ghostbuster {
public:
    void bust() &;
    void bust() &&;
    void bust() const &;
    void bust() const &&;
};

This is a famously confusing syntax. How can a function itself be const, or a reference? Why are multiple functions with the same name and argument list allowed?

The answer is that these are the qualifiers of the implicit this. If the instance itself is const or an r-value (for example, it’s a temporary that was just returned from another function by value), the name resolution chooses the function with the best matching qualifier. If you assume that this is a real parameter, it makes perfect sense why the overloads are allowed – the parameter lists are all different.

One more example. In order to make C++ more functional, you can #include <functional>. You then get a bunch of great features, like partial application:

#include <functional>

void frobnicate(int x, int y) {}

struct Spline {
    void reticulate(int z, int w) {}
};

int main() {
    auto frobnicateWithXOfEight = std::bind_front(&frobnicate, /* x = */ 8);
    frobnicateWithXOfEight(42); // x = 8, y = 42

    Spline spline;
    auto reticulateWithXOfEight = std::bind_front( // *record scratch* what do we do here?

std::bind_front takes an existing function and creates a functor (an object that acts like a function, in that it can be called with brackets), “prefilling” some of the arguments starting from the left. Can we bind a member function, though? Do we use spline.reticulate or Spline::reticulate? How does it know the instance to be called on, can we provide it later?

Spline spline;
auto reticulateWithXOfEight = std::bind_front(&Spline::reticulate, spline, 8);
reticulateWithXOfEight(42); // this = spline, z = 8, w = 42?

We bind the first parameter – which is actually the implicit this. C++ hides it most of the time, but it’s still there. With EOP, we can make this far less confusing.

Check your self

Fast-forward to 2023 – the future is now, and we can do this:

class Ghost {
    int peekaboo;
public:
    void setToEight(this Ghost& self) {
        self.peekaboo = 8;
        // peekaboo = 8; // (1)
        // this->peekaboo = 8; // (2)
    }
};

The syntax is remarkably simple – insert this before the first parameter, and it becomes the instance. You can make it Ghost const&, Ghost&&, or any other allowed qualifiers. Implicit member access like in (1) is now a compile-time error in any function where the explicit this is present, and this itself cannot be used like in (2), either.

It’s rather redundant to put in Ghost as the type every time though, it’s the only type that’s valid there anyway. Can we make it a template?

class Ghost {
    int peekaboo;
public:
    template<typename T>
    void setToEight(this T&& self) {
        self.peekaboo = 8;
    }
};

When the method is called, the type T gets correctly deduced to Ghost, and everything works the same. (The && qualifier is there so that the function works for any reference type, due to C++’s reference collapsing rules. The details aren’t important here.)

If you prefer, you can also use the shorthand for templated function parameters:

class Ghost {
    int peekaboo;
public:
    void setToEight(this auto&& self) {
        self.peekaboo = 8;
    }
};

This is the form you’ll see most often in code that uses this feature.

With the type of this being explicit and controllable with a template, we can remove duplication of const/non-const versions of member functions, create recursive lambdas, or even pass this by value. More usage examples are available in Microsoft’s writeup of the feature. I recommend having a look at it before continuing to get a better grasp on EOP.

Superpowers

First, a quick primer on Rust traits. They are an interface that a class can implement – a named set of functionality. A trait called Meowable might require anything that implements it to have a meow() method. Traits can also provide a default implementation, which is used if the class doesn’t override it. Any function can now require its arguments to have a specific trait instead of locking down the type entirely. It’s a clean way of achieving polymorphism without the complexity cost of full-blown OOP. You can have a look at some examples in Rust docs.

At a glance, this seems similar to C++ inheritance – the trait is a base class, the required method is pure virtual, and the function argument’s type is a reference to the base class. There’s one major difference, though – Rust traits are entirely static dispatch. There are no vtables, no virtual calls, it’s all resolved at compile-time. Like templates, without the templates.

This could be achieved in C++ before with enough template magic, but the trick is in making it readable enough that it’s a net positive for your code. Thanks to EOP, this is now possible. Here’s a C++ trait:

class Doglike {
public:
    auto bark(this auto&& self) -> std::string requires false;
    auto wag(this auto&& self) -> std::string { return "wagwag"; }
};

A trait is just a struct that uses EOP for its function members. bark() is a function that every trait user must implement, and wag() has a default implementation so implementing it is optional. The only new bit is requires false – we’ll explain that later. It seems like we’re setting up an abstract class, but virtual is nowhere to be seen.

Now, let’s “implement” this “trait”:

class Fido:
    public Doglike
{
public:
    auto bark(this auto&& self) -> std::string { return "bark!"; }
};

We’re just inheriting from it, and implementing the required method by copypasting the signature. Let’s analyze the usage code:

Fido fido;
std::print("{}\n", fido.bark());
std::print("{}\n", fido.wag());

It shouldn’t be too difficult to see why we can call wag(). The name is not found within the class but it’s found within the base class, so that one is used. With bark(), the name is found in both, so just like with normal inheritance, name resolution prefers the deriving class’s version. So far, so good. Nothing is different from standard inheritance yet, but now it’s time to invoke static polymorphism:

void barkTwice(std::derived_from<Doglike> auto dog) {
    std::print("{} {}\n", dog.bark(), dog.bark());
}

int main() {
    Fido fido;
    barkTwice(fido);
}

The X auto param syntax is a shorthand for constraining a template argument. auto dog would match any type, std::derived_from<Doglike> auto dog matches only types T where the concept std::derived_from<T, Doglike> is true – so, only classes that “implemented” the “trait”.

With inheritance, this function would accept a reference to Doglike instead. Then, to find Fido’s overridden definition of bark(), the call would be a virtual call – the vtable of the class would be used to call the correct function. However, because this is instead a template, the argument dog is always its real type, in this case Fido. The lookup of bark() in the function is then the same as in the earlier usage code. We could change auto to auto& to accept whatever type is passed by reference instead of copying it in.

Let’s imagine what would happen if Fido never implemented bark(). At the call site, the compiler gathers a set of all possible overloads and templates that could be used, which in this case is only the declaration in Doglike. The SFINAE process starts – all variations that result in invalid code are removed from the set. requires false is a constraint that simply fails the SFINAE test immediately, so when the declaration in Doglike is removed from the set, the set is now empty. We get a compiler error – out of all the functions that can be called, none of them are valid.

If requires false was removed, the code would actually compile! The call site would be referring to a nonexistent specialization of the template, hoping for it to be implemented somewhere else. It’s not, so the linker gives us an undefined symbol error. It would be preferable to catch this at compile-time instead, so requires false is the equivalent of = 0 to make the function “pure virtual”.

Finally, you might want to wrap the concept as syntactic sugar:

template<typename T, typename U>
concept implements = std::derived_from<T, U>;

void barkTwice(implements<Doglike> auto dog) {

There are always caveats

As often as C++ features come out of the oven half-baked, EOP comes surprisingly complete. It’s a shame that it can’t be used in constructors or destructors, but that doesn’t affect the “trait” construct. A small issue is that implementing a trait but not all of the required methods is not an error by itself – it only becomes one when the unimplemented method is called.

Another problem, which it shares with a lot of modern C++ features, is that every function involved here – both the trait member functions and the functions that require a trait – is now a template; thus, their definitions can’t be separated into a .cpp file. This problem is resolved by modules, where non-template functions and template functions behave largely the same, and the distinction disappears. But… there is a problem. As of the time of writing:

  • The only compiler that supports EOP is MSVC.
  • The only compiler that supports modules (in a usable state) is MSVC.
  • MSVC doesn’t support both EOP and modules at the same time.

😔

Hello world! Glitch showcase

Spending long, frustrating hours writing complex graphics code and then finally seeing the results of your work come to life in front of you is the second best feeling in the world. The best feeling is, of course, admiring the flashy accidental art that happens when the code doesn’t quite work correctly. After every “mainline” post I’ll be sharing the wildest bugs I managed to capture around the same time as the events of the post. Enjoy! (Consider all videos of this series to have an epilepsy warning.)

Surprise edge detection
Whenever making any graphical change, make sure you have bloom enabled
A classic case of vertexplosion
Oh. Okay
Mind your layouts, kids
They’re trying to summon Batman
Gameplay in the shadow realm
Do not dip your blocks in acid
Almost correct tetris
Glitch-as-you-gather
I don’t know what this texture is
Every vertex’s gone to the rapture
It’s at the bottom of the well, and it’s angry
I recommend not clearing your color buffers once in a while
Qbert says Hello world!
Bloom taken to the limits of floating-point
Party in the shadow realm
The prettiest image I have ever saved

Hello world! The story so far

Ever since my code started producing pretty cool bugs and glitches, I have been repeatedly asked to start documenting my misadventures in computer graphics. And finally, here it is – a freshly deployed blog, a blank slate, yet to be filled with the broken hopes and dashed dreams of software development. How exciting.

Post written to the introspective ambient dub of Idlefon

The plan is to produce a write-up of every interesting topic that I research while working on the Minote project. If I keep at it, hopefully I will catch up to the present before it runs too far ahead again. So, without further ado, let’s start with the part that all things start with.

Shapely beginnings

Minote was initially created with one specific goal in mind: reimplement the arcade game Tetris: The Grand Master with an easily accessible coat of modern UX design and attractive graphics. TGM is a videogame which should absolutely not be overlooked by anyone with interest in game design, as it’s one that could legitimately be called a sequel to Tetris. The purity and elegance of the original ruleset is preserved, while refining the controls with frame-precision and expanding it with movement features that, past a certain speed, transform the classic puzzle into a fresh new challenge that extends all the way to human limits and beyond. If at any point in your life you, through no action of your own, find yourself with a copy of MAME and the ROM of the game, make sure to check it out. A community-produced version called Shiromino is also a great option.

This is still relatively beginner-level gameplay!

The community around the game was quite small – it’s hard to get people to play a game which is not only notoriously difficult, but requires non-trivial effort to even start playing. Even after dropping a few plays, it can be discouraging to hit the initial wall of rapidly rising gravity. As is standard for any arcade game, no guidance is ever provided to help the player advance further. Instead, players are expected to talk among themselves to find the techniques and strategies that allow one to eventually reach the end. This seemed to me like a problem, a fixable one – we just needed a version of the game which makes it easy to get started, and provides the encouragement and progress tracking which is expected by players of modern games. The core of the game is timeless, and only needed an updated presentation. With this in mind, I opened my IDE for the first time in years.

At that time I had done a small bit of hobbyist programming, but had next to no knowledge about modern graphics, and the game absolutely needed to make use of the full power of the hardware to have the visuals that would interest a casual gamer. I chose to go with OpenGL, mostly because of this brilliant tutorial, which explains not only the API but the basic graphics programming techniques and the theory behind them. After a few weeks, which extended into months, a prototype was ready.

Prototype release. Shiny!

By the end, the game was a nearly frame-perfect recreation of the original, with a slick presentation, particles, and glorious bloom. The road seemed clear to add more features, polish it up, and prepare for full release. If you like what you’re seeing, feel free to give it a try!

So why not?

Around this time, it started feeling like the project was taking a bit too much effort for a game that’s just a copy of something someone else made. TGM is very well designed but not perfect, and during hours of testing (and not just playing it repeatedly because it’s so addictive), I came up with many ideas for potential changes which could make the game more intuitive, or at the very least shake things up a little. Implementing them would bring the game out of the questionably legal territory of being a clone. Considering The Tetris Company’s history of being litigation-happy and sending takedown requests to fan projects, this was not a comfortable position to be in. A new version was released which implemented some of these changes, like a unique randomizer, a more intuitive and symmetrical rotation system, and further improved animations and effects. Some of the more outlandish ideas however would require changing the game on a very fundamental level.

Caption test.

Another source of issues was the technical side. OpenGL is an API with a very long history, and despite best efforts to keep it updated with changes to GPU hardware, it still has a lot of ambiguity in its specification, which results in code producing different results on different graphics drivers. Cross-vendor testing, and fixing all the bugs caused by a loose API specification and GLSL miscompilations, was starting to eat up a lot of development time. Something had to change, or I would never get out of bugfixing hell.

2nd attempt (ongoing) (actually 4th)

I’m no stranger to rewrites, and I realized they’re never as rosy as they appear. Old problems go out, new problems come in. Instead of throwing out all code, I decided to replace the renderer first, with one that could handle all the wild ideas of marrying visuals and gameplay. A properly modern one, in Vulkan, with PBR stuff and GPU-driven. It turned out that I quite enjoy graphics programming, perhaps more than I ever enjoyed making the game… It’s been almost 2 years, and the renderer is still in progress, and so many weird and interesting things came out of it. Friends in the wonderful Graphics Programming community, finding out which C++ features aren’t all that bad, computer locking up with colorful stripes flashing on the screen. Modern graphics programming feels like the comeback of writing ultra optimized code to extract the full potential of the raw hardware; there is minimal OS supervision, just you versus the GPU, a horrifying beast which can unleash incredible power should you manage to tame it. A completely different paradigm of thinking about problems, where even basic constructs like the “if” statement are executed differently on a fundamental level. It’s fascinating, it changes your thinking, and it made me rediscover the joy of programming.

But I’m rambling, and you don’t care about this. Minote is a work-in-progress renderer with some unique goals of fully procedural shading (no textures), having no visible artifacts that we typically associate with the “videogame” look of CG, and handling very high instance counts; to be eventually used for videogame and digital art projects. Now that I think about it, I could’ve started with that and spared you the barely relevant leadup. Well, I hope you like reading about tetris.