Shinya 1986
A downloadable Amiga AGA Demo
Demo details
- format: Amiga 1200 AGA single file demo.
- min. requirements: stock Amiga 1200, 2 MB of Chip RAM, no accelerator needed.
- credits: Akiko (code, music), Tsukamoto (gfx)
- download link: SEE BELOW! (the archive contains both the broken party version and the final cut)
- released at Revision 2025 (20.04.2025); fixed version released on 08.06.2025.
- soundtrack: Shinya 1986
In 2024, at the Revision demo party, I made a firm resolution for the following year: to come to the next Revision carrying a demo production. It had been several years (5, to be precise) since we finished our last demo, Driftwood, an AGA production that required an accelerated Amiga (*0). Ever since then I have been meditating on where I would like to go next. Why do I like some demos more than others, and what is it that makes my heart beat faster when I think of coding? I knew that understanding how I felt about the old-school demos was crucial for my well-being as a demo coder (*1). After all, I am not the same person I was in 2019, let alone in 1998, when we silently put Almagest to sleep.
Here is the manifesto. It has only three goals:
a) the awe of watching a demo run at 50 Hz, with (almost) no frames skipped, on a stock Amiga 1200 or 500, I don't care which.
b) cleanness in the visual domain, not an imitation of a PC demo. No blurry, smudged, jagged, distorted texture mapping with polygon cracks and broken perspective correction.
c) that typical Amiga sound. No super long samples stitched together to mimic a modern streaming music compo entry. I love real modules, with instruments that have character. I love the ST-xx collections.
Here is the spoiler up front: we are not able to deliver such a production yet. We are not nearly there. Even now when I am about to sum up the work done so far on the latest demo, Shinya 1986, I still see countless nights ahead of me before it becomes really good. It's a journey as long as it has always been. An emotional journey with no destination in sight.
What is Shinya 1986?
The initial idea I had for the demo was that it would be an endless loop of beautiful scenes—an homage to various cyberpunk anime from the ’80s, flirting with synth and vaporwave. With a soundtrack nostalgic, hypnotic, slowly burning and yet upbeat.
The ideal format for the demo, in both my and Tsukamoto's opinion, is 1 to 2 minutes. Not more. Ideally it should be loopable.
The people behind the demo are:
- Akiko (myself) : code & music
- Tsukamoto : graphics
- Gigi Beef : additional graphics (but sadly they didn't make it into the production)
Left to right: Gigi Beef, Tsukamoto & Akiko
The demo itself is listed as an Amiga AGA demo. More precisely:
It's a *stock* Amiga 1200 demo that runs at 50 Hz the whole time and, as the word "stock" says, requires no additional memory (Fast RAM) and no faster CPU than the one you already own (the 68EC020, clocked at 14.18 MHz on PAL machines).
The very first inspo came from the legendary City at Night shots from Akira ...
... and cars of the same milieu ...
It's always after dark. Is it midnight? It's the time they call shinya. With that, the pace and the mood for the demo are set. I will just drop this amazing, moody piece by Tsukamoto here before we continue with the creative bits.
Tooling
I'll start with the tools that were used to develop Shinya. First, the demo was written in C++ (I am a loyal C++20 user: a lot of constexpr, tons of lambdas, designated initialisers, auto and much more). The IDE was VS Code with the C/C++ extension for VS Code developed by the amazing Bartman of Abyss. Bartman, thank you from the bottom of my heart; you have made it so easy for me to switch contexts and jump into Amiga coding whenever I had some free cycles. Thank you, thank you.
Another important tool was amigeconv. I used it to convert all graphical assets for the demo. For pixel art, both Tsukamoto and I used Aseprite (https://www.aseprite.org/), another gem on the market.
For the font conversion (TTF/OTF to bitmap) I used the Font2Bitmap tool found on itch.io. I had my own tool for that, written in Python for Driftwood, but I couldn't find the script.
Finally, the soundtrack was written in Bassoon Tracker. Again, I can't say how grateful I am to Steffest for this tool. It's my go-to tracker (at least whenever I don't have the Polyend Tracker+ Mini around). I have used the PT2x remake on macOS and also MilkyTracker in the past, but the visual appeal and fluidity of Bassoon got me hooked.
The only things in the demo not written by me were the P61 replay routine and, of course, the system takeover, which I took straight from Bartman's tutorial project.
I had one jump scare just a day before the deadline. In my workflow, I would initially export the .mod file from Bassoon Tracker, then switch to the Amiga emulator and run the P61 converter, which would bake the .p61 module. But that module sounded broken: there was some eerie buzzing and glitching, especially at the end of sample playback. It was unbearable. This was the last day, with tons of work still to be done before the deadline, and at first I was at the edge of despair. Luckily, I found the solution quickly: I took the module dumped by Bassoon, loaded it into Protracker 2.x, resaved it as .mod, and then ran it through the P61 converter. That fixed it.
I’m aware it’s a debt, and I’ll need to take a closer look at it in the future.
Scenes
Almost all scenes in Shinya move a lot of pixels around. But there is also quite a bit of polygon and line rendering and pixel plotting. I had never written a demo that exploited the hardware of the AGA machines. I have had an AGA machine (an A1200) since 1994, and I was fascinated by demos like Motion, Origin, Real and Nexus 7. I knew that AGA introduced several interesting new features: 16-color playfields, sprites up to 64 pixels wide, and a color resolution of 8 bits per channel (instead of 4 in OCS). The lack of Fast RAM was quite a bummer, but the 14 MHz 68020 with its small yet present instruction cache still yielded noticeable speed-ups. A special mention goes to the fetch mode register (FMODE), which lets bitplane DMA perform four times fewer data fetches per scanline than before. It does come with a couple of caveats, but you can live with them when you have to.
In 2019, while working on Driftwood, I did use the 24-bit color depth, but nothing more than that. Everything was brute-force pixel pushing and Kalms c2p. I didn't even bother to use the fetch mode register to give the CPU more time for the c2p or color blending.
As I started working on Shinya, I first set out to understand the fetch mode register. Unlocking the 64-bit bitplane and sprite fetch mode liberated dozens of cycles for the blitter and CPU to do their thing. True, some sprites were lost (at least in the automatic mode), but whenever I needed to blit a lot of data (or compute vertices, render lines, fill and copy polygons from scratch to the destination bitplanes), those vacant cycles were life-saving. And as I said before, I wanted a demo that runs at 50 Hz on a stock Amiga 1200 and still moves a lot of data, since the scenes I envisioned were, well, kind of cinematic, if I may say so.
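To make the "four times fewer fetches" figure concrete, here is a minimal host-side sketch of how the FMODE value and the per-scanline fetch count relate. It is an illustration, not the demo's code: the helper names (`FetchWidth`, `fmode_value`, `fetches_per_line`) are made up for this example, and the bit layout (bits 0-1 for bitplanes, bits 2-3 for sprites) is quoted from memory of the AGA register documentation, so check it against your own reference before relying on it.

```cpp
#include <cstdint>

// FMODE lives at custom chip offset $1FC on AGA.
// Bits 0-1 select the bitplane fetch width, bits 2-3 the sprite fetch width.
enum class FetchWidth : std::uint16_t {
    Bits16 = 0b00,  // 1x, the OCS/ECS-compatible default
    Bits32 = 0b01,  // 2x
    Bits64 = 0b11,  // 4x
};

// Hypothetical helper: combine both fields into the 16-bit register value.
constexpr std::uint16_t fmode_value(FetchWidth bitplanes, FetchWidth sprites) {
    return static_cast<std::uint16_t>(
        static_cast<std::uint16_t>(bitplanes) |
        (static_cast<std::uint16_t>(sprites) << 2));
}

constexpr int bits_per_fetch(FetchWidth w) {
    switch (w) {
        case FetchWidth::Bits16: return 16;
        case FetchWidth::Bits32: return 32;
        case FetchWidth::Bits64: return 64;
    }
    return 16;
}

// Number of DMA fetches a single bitplane needs for one scanline.
constexpr int fetches_per_line(int width_pixels, FetchWidth w) {
    return (width_pixels + bits_per_fetch(w) - 1) / bits_per_fetch(w);
}

// 64-bit fetch for both bitplanes and sprites.
static_assert(fmode_value(FetchWidth::Bits64, FetchWidth::Bits64) == 0x000F);
// A 320-pixel line: 20 fetches per plane at 1x, 5 at 4x.
static_assert(fetches_per_line(320, FetchWidth::Bits16) == 20);
static_assert(fetches_per_line(320, FetchWidth::Bits64) == 5);
```

One of the caveats of the wide fetch mode, as far as I know, is alignment: bitplane data has to start on a 64-bit boundary and row lengths have to be a multiple of 8 bytes.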
Then I moved on to writing tests for the dual playfields and 64 px wide sprites. It was a pleasant surprise once I realized that I could scroll two full screen bitmaps of 16 colors each and have 4 x 64 (or 2 x 128) pixel wide sprites in 16 colors each, and still have most of the cycles free for the blitter and cpu.
I will go over a couple of scenes from the demo, and lay out how they were structured. Here is the Night city scene, for example:
There is quite some movement going on, but the frame still leaves about 90% of CPU time completely unallocated. Here is the layout:
- The background city is 4 bitplanes, playfield 2
- The building to the left is 4 bitplanes, playfield 1 (on top)
- The building to the right is a 128-pixel-wide sprite in 16 colors (uses 4 hardware sprites)
- The two street lights passing by are 4-color sprites (2 hardware sprites in total)
- The tall, 64-pixel-wide black glitchy sprite at the left border of the image sits on top of everything (1 hardware sprite)
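The sprite budget of this layout can be tallied with a few lines of compile-time arithmetic. This is only a sketch: the struct and function names are invented for the example, the street light width of 64 pixels is a guess (the layout above only gives their color depth), and the rule of thumb used is that a 16-color sprite is an attached pair of hardware sprites while a 4-color sprite needs just one per 64-pixel column.

```cpp
// Hypothetical description of one on-screen sprite object.
struct SpriteUse {
    const char* what;
    int width_px;  // on-screen width in pixels
    int colors;    // 4 = single sprite, 16 = attached pair
    int count;     // how many of these objects are on screen
};

constexpr int kHardwareSprites = 8;  // hardware maximum on the Amiga

constexpr int hw_sprites_needed(const SpriteUse& s) {
    const int columns = (s.width_px + 63) / 64;     // 64 px per sprite in 4x fetch mode
    const int per_column = (s.colors > 4) ? 2 : 1;  // attached pair for 16 colors
    return columns * per_column * s.count;
}

constexpr SpriteUse kNightCity[] = {
    {"building on the right", 128, 16, 1},
    {"street light", 64, 4, 2},  // width is an assumption
    {"glitch bar on the left", 64, 4, 1},
};

constexpr int total_hw_sprites() {
    int total = 0;
    for (const SpriteUse& s : kNightCity) total += hw_sprites_needed(s);
    return total;
}

static_assert(total_hw_sprites() == 7);
static_assert(total_hw_sprites() <= kHardwareSprites);
```

Under these assumptions the scene uses 7 of the 8 hardware sprites, which matches the counts given in the list.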
True, both images and sprites don’t really use all 16 colors, but the simplified aesthetics of the demo don’t require it. We could’ve optimized some parts to use non-attached sprites, or reduced the overall bitplane count to 6—i.e., 3 per playfield—and still kept the same look and feel.
Example 2:
This next scene features a simple 3D wireframe renderer, with line clipping against the screen border, and lines rendered in 4 colors. It interleaves blitter and CPU-driven line rendering. Why not only blitter? Well, I think I found it mentally attractive to optimize the pixel plotter to use a pitch lookup table (i.e., for every row it doesn’t do the costly y * bytes_per_row, but simply looks it up in a precomputed array). Also, for setting the correct bit in the byte, it uses a small lookup table—I found it somewhat faster than shifting the respective amount of bits each time. Just to be clear, I am aware that the blitter would have done the job faster, I measured it on stock A1200, but this was more fun.
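A minimal sketch of such a LUT-driven plotter is shown below, written so that it also runs on a PC. It is not the demo's actual routine: the screen size (320x256, one bitplane), the names `kPitch`, `kBit`, `plot` and `draw_line`, and the plain integer Bresenham loop are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr int kWidth = 320, kHeight = 256, kBytesPerRow = kWidth / 8;

// Row start offsets: replaces y * kBytesPerRow with a single table lookup.
constexpr std::array<std::uint16_t, kHeight> make_pitch_lut() {
    std::array<std::uint16_t, kHeight> lut{};
    for (int y = 0; y < kHeight; ++y)
        lut[y] = static_cast<std::uint16_t>(y * kBytesPerRow);
    return lut;
}

// Bit masks: Amiga bitplanes keep the leftmost pixel in the most significant bit.
constexpr std::array<std::uint8_t, 8> make_bit_lut() {
    std::array<std::uint8_t, 8> lut{};
    for (int i = 0; i < 8; ++i) lut[i] = static_cast<std::uint8_t>(0x80 >> i);
    return lut;
}

constexpr auto kPitch = make_pitch_lut();
constexpr auto kBit = make_bit_lut();

using Bitplane = std::array<std::uint8_t, kBytesPerRow * kHeight>;

// Sets one pixel; the caller guarantees that (x, y) is on screen.
inline void plot(Bitplane& bp, int x, int y) {
    bp[static_cast<std::size_t>(kPitch[y] + (x >> 3))] |= kBit[x & 7];
}

// Integer Bresenham; both endpoints must already be clipped to the screen.
inline void draw_line(Bitplane& bp, int x0, int y0, int x1, int y1) {
    const int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    const int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    for (;;) {
        plot(bp, x0, y0);
        if (x0 == x1 && y0 == y1) break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}
```

Both tables are built at compile time, so they cost no startup time, only 520 bytes of data. A tuned 68020 version would more likely step a pointer and a mask along the line instead of calling `plot` per pixel; the sketch only shows the lookup idea.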
At any rate, here’s how it works:
- it starts by clearing the necessary portion of the screen (3 bitplanes) from the previous frame using the blitter
- while the blitter is busy clearing, the CPU computes the 3D-to-2D point projection for the quads
- then it clips the first line against the screen borders
- and invokes the blitter line drawing routine
- while the blitter is busy, the CPU clips the next line
- and draws that one itself (fast Bresenham, optimized with LUTs for the pitch computation)
- and then repeats the process for all the lines
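The clipping step in the list above can be sketched with the classic Cohen-Sutherland algorithm. Which clipping algorithm the demo really uses is not stated, so treat this as one possible approach; the screen bounds (320x256) and all names (`Point`, `outcode`, `clip_line`) are assumptions for the example.

```cpp
struct Point { int x, y; };

constexpr int kXMin = 0, kYMin = 0, kXMax = 319, kYMax = 255;

enum : unsigned { kLeft = 1, kRight = 2, kTop = 4, kBottom = 8 };

// Classifies a point against the four screen edges.
constexpr unsigned outcode(Point p) {
    unsigned c = 0;
    if (p.x < kXMin) c |= kLeft; else if (p.x > kXMax) c |= kRight;
    if (p.y < kYMin) c |= kTop;  else if (p.y > kYMax) c |= kBottom;
    return c;
}

// Returns false when the line is entirely off screen.
// Otherwise a and b are moved onto the screen edges where needed.
inline bool clip_line(Point& a, Point& b) {
    unsigned ca = outcode(a), cb = outcode(b);
    for (;;) {
        if ((ca | cb) == 0) return true;   // both endpoints inside
        if ((ca & cb) != 0) return false;  // both outside the same edge
        const unsigned c = ca ? ca : cb;   // pick an endpoint that is outside
        const int dx = b.x - a.x, dy = b.y - a.y;
        Point p{};
        // dy (or dx) cannot be zero in the branch that divides by it,
        // because the endpoints lie on opposite sides of that edge.
        if (c & kTop)         { p.x = a.x + dx * (kYMin - a.y) / dy; p.y = kYMin; }
        else if (c & kBottom) { p.x = a.x + dx * (kYMax - a.y) / dy; p.y = kYMax; }
        else if (c & kLeft)   { p.y = a.y + dy * (kXMin - a.x) / dx; p.x = kXMin; }
        else                  { p.y = a.y + dy * (kXMax - a.x) / dx; p.x = kXMax; }
        if (c == ca) { a = p; ca = outcode(a); }
        else         { b = p; cb = outcode(b); }
    }
}
```

The trivial accept and reject tests come first, so lines that are fully visible or fully hidden cost only two outcode computations, which leaves most of the blitter's busy time for the actual drawing.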
To design vectors like this I usually either run TIC-80 to quickly prototype the idea, or I use a sandbox project (raylib, C++) with a 320x180 bitmap and a couple of pixel plotting routines. The benefit of the latter is that one can use the same fixed point math lib that is used later on in the Amiga project as well.
What went wrong
Well, Revision 2025 was dope. And very intense.
We didn’t manage to wrap up the whole demo before heading to the party. Some 30% of the demo was still WIP before packing up. Sleep was scarce in the preceding days, and since this was my first AGA-coded demo that relied more on banging the hardware than brute-force CPU work, I had a lot of trial and error—a lot of things to understand and build intuition around. Especially around palette offsetting for sprites and dual playfields, and sprite/playfield priority configurations, which were changing from scene to scene.
It’s not a big demo, but it does have a lot of custom code, and each scene builds up differently. That, plus heavy lack of sleep, the travel, the excitement, the desire to finish it—all triggered some of the typical mistakes. The sync was broken. While everything worked fine on the accelerated Amiga (real hardware) and in the emulator, issues popped up on the stock A1200. Then fixing those issues caused mysterious things to happen on accelerated hardware, but not in emulation.
In the end, since the demo was AGA, it was to be run on the 060-accelerated compo machine. At the last minute, we decided to fix what we could and remove the troublesome parts. Tsukamoto (the artist) had his A1200 + TF060 with him, so we were able to copy the demo over a couple of times and iron out all the critical bugs.
And in the end, the demo did run. It was crippled, with broken pacing in some scenes and a couple of glitches when switching from scene to scene—but it ran.
After the party, we knew the first thing to do was: finish the demo the right way. So, here we are.
Misc notes regarding the code
In general, most of the movable parts on screen have some kind of easing applied. I used fixed-point arithmetic most of the time for this. The original library from Driftwood has been reworked a bit for convenience, and some functions—like division in debug mode—check against division by zero. If that happens, a __builtin_trap() is called, with a preceding KPrintF("error: division by zero") debug output.
Most of the math library is constexpr, so if you provide constant data, all the calculations happen at compile time and cost nothing at runtime.
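As a rough illustration of the idea, here is a small constexpr fixed-point type with an easing function evaluated at compile time. It is a sketch and not the library from the demo: the 16.16 format, the names `Fixed`, `ease_out_quad` and `lerp`, and the choice of easing curve are assumptions, and a plain `assert` stands in for the `KPrintF` plus `__builtin_trap()` combination, which only makes sense on the Amiga side.

```cpp
#include <cassert>
#include <cstdint>

// 16.16 fixed point (the format is an assumption for this example).
struct Fixed {
    std::int32_t raw;
    static constexpr int kFracBits = 16;
    static constexpr Fixed from_int(int v) { return Fixed{v * (1 << kFracBits)}; }
    constexpr int to_int() const { return raw >> kFracBits; }
};

constexpr Fixed operator+(Fixed a, Fixed b) { return Fixed{a.raw + b.raw}; }
constexpr Fixed operator-(Fixed a, Fixed b) { return Fixed{a.raw - b.raw}; }

constexpr Fixed operator*(Fixed a, Fixed b) {
    const std::int64_t wide = static_cast<std::int64_t>(a.raw) * b.raw;
    return Fixed{static_cast<std::int32_t>(wide >> Fixed::kFracBits)};
}

constexpr Fixed operator/(Fixed a, Fixed b) {
    // Debug-only check; it disappears when NDEBUG is defined.
    assert(b.raw != 0 && "error: division by zero");
    const std::int64_t wide = static_cast<std::int64_t>(a.raw) * (1 << Fixed::kFracBits);
    return Fixed{static_cast<std::int32_t>(wide / b.raw)};
}

// Quadratic ease-out, t in [0, 1]: fast start, soft landing.
constexpr Fixed ease_out_quad(Fixed t) { return t * (Fixed::from_int(2) - t); }

constexpr Fixed lerp(Fixed a, Fixed b, Fixed t) { return a + (b - a) * t; }

// With constant inputs everything below is computed by the compiler.
static_assert(ease_out_quad(Fixed::from_int(0)).raw == 0);
static_assert(ease_out_quad(Fixed{0x8000}).raw == 0xC000);  // 0.5 -> 0.75
static_assert(ease_out_quad(Fixed::from_int(1)).raw == Fixed::from_int(1).raw);
```

A division by zero inside a constant expression also turns into a compile error with this setup, which is a nice side effect of keeping the check in a constexpr function.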
As for the 3D part, I did write a full fixed-point (precision is configurable via template system) matrix library with all the common operations, but I ended up not using it. As correct and versatile as it is, it was overkill for the simplistic geometry the demo renders, and in most cases slower due to doing way more multiplications than the demo actually needed. (*2)
My thoughts on the A1200
When the A1200 came out in late 1992, its default (stock) hardware was indeed disappointing to most of us. The basic version even came without an internal hard disk. I owned an A2000 at that time and switched to the A1200 in the summer of the following year, thanks to my brother Gigi Beef, who smuggled it for me from Germany with a 120 MB hard disk on board. The next year came a Blizzard 1220/4. But that year, that one year I spent with the stock A1200, was already hugely inspiring in the creative sense. The fact alone that the stock A1200 could run demos like Origin, Motion, Nexus 7, Real, Crazy Sexy Cool and many, many more demoscene gems deserves, I believe, a standalone category, just like OCS, and it could be called something like unaccelerated AGA. Or, if anyone wants to coin a better name, you are welcome.
Footnotes
*0 Preferably a 68030 @ 50 MHz and at least 6 MB of RAM. I will write more about Driftwood in a separate article.
*1 Let's stop for a moment and reflect on the previous sentence. As a passionate demo coder I am investing precious, scarce hours in reading the hardware reference manual, chasing down timing bugs, struggling with the real hardware, coding complicated blitter actions and weird copper lists over and over, developing auxiliary tools and scripts to generate and convert data into the embedded demo assets, and much more. It's a lot of details. All of that runs in parallel with my everyday life and professional career as a software engineer, a husband and a father. So, let's make it worthwhile.
*2 The good thing with relying on the type system in C++ is that you can conveniently work with your fixed-point type if you overload the operators. It happens from time to time, as one writes code, that they forget some of the function parameters are fixed-point and not int32, int16. This is quite a common mix-up mistake as the codebase grows. You end up asking yourself the same question over and over again—oh wait, is this a fixed-point parameter or a real value?
Aside from imposing this strict type system approach, one can always resort to nesting the fixed-point functions in their own namespace, or naming them strictly following a certain convention, etc.
Sure, there are multiple solutions to the problem, but in my opinion, the fear of strict typing using classes and operator overloads is heavily rooted in the infamous past of C++ compilers. In the early 2000s, C++ and OOP of any kind were to be avoided at all cost, as compilers often generated lousy code, sometimes 10–15% slower than its pure C, flat counterpart. Today, when in doubt, just head over to Compiler Explorer and let the facts talk.
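To show what the strict typing buys, here is a minimal sketch of a fixed-point class that refuses to mix silently with plain integers. The class name `Fx16`, the 16.16 format and the helper `scale_position` are made up for this example and are not taken from the demo's library.

```cpp
#include <cstdint>
#include <type_traits>

class Fx16 {
public:
    constexpr Fx16() = default;

    // Conversions are always spelled out at the call site.
    static constexpr Fx16 from_int(std::int32_t v) { return Fx16(v * 65536); }
    static constexpr Fx16 from_raw(std::int32_t r) { return Fx16(r); }
    constexpr std::int32_t raw() const { return raw_; }
    constexpr std::int32_t to_int() const { return raw_ >> 16; }

    friend constexpr Fx16 operator+(Fx16 a, Fx16 b) { return Fx16(a.raw_ + b.raw_); }
    friend constexpr Fx16 operator*(Fx16 a, Fx16 b) {
        const std::int64_t wide = static_cast<std::int64_t>(a.raw_) * b.raw_;
        return Fx16(static_cast<std::int32_t>(wide >> 16));
    }

private:
    // Private and explicit: an int can never turn into an Fx16 by accident.
    constexpr explicit Fx16(std::int32_t r) : raw_(r) {}
    std::int32_t raw_ = 0;
};

// The signature itself documents that both parameters are fixed-point.
constexpr Fx16 scale_position(Fx16 pos, Fx16 factor) { return pos * factor; }

static_assert(!std::is_convertible_v<int, Fx16>);
static_assert(!std::is_convertible_v<Fx16, int>);
static_assert(scale_position(Fx16::from_int(3), Fx16::from_int(2)).to_int() == 6);
```

A call like `scale_position(3, 2)` does not compile with this class, so the "is this fixed-point or an integer?" question gets answered by the compiler instead of by a debugging session. Since the class holds a single 32-bit member and every function is a trivial inline, a modern optimizing compiler should generate the same code as for a bare `int32_t`; Compiler Explorer is the place to verify that for your own toolchain.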