Porting Chuckie Egg

Contents

Introduction

Having poured effort into DHTML Chuckie Egg, the result was still only an approximation of the original BBC Model B game. Of all the remakes/versions available today, only two are truly perfect replicas.

The inaccessibility of the RISC OS version to Windows users has meant that Mike Elson's remake has become the dominant choice, despite several inaccuracies from the original. Re-porting Michael Foot's RISC OS remake to Windows/Linux was the path of least resistance in establishing definitive Windows/Linux versions. The result is the first native version of Chuckie Egg and it weighs in at a tiny 19kb (on Windows).

The RISC OS Port

The BBC Model B's MOS and the Archimedes RISC OS share a lot in common. For example, most of the VDU operations and sound commands are identical on both platforms. This simplified the task of porting Chuckie Egg between the BBC B and Acorn Archimedes. Unfortunately, this is a luxury that Windows doesn't have, so the main benefit of taking the RISC OS port as a starting point was its conversion to ANSI C.

In essence, Michael has disassembled the original Chuckie Egg binary and written a mini-simulator in C to simulate the disassembled instructions. The disassembly has been analysed for subroutines e.g. JSR/RTS pairs and they have been converted into C functions. Loops were expressed as ANSI C goto statements with appropriate labels. All operating system calls have been replaced with the corresponding RISC OS equivalents (which are basically the same) and he intercepts all memory accesses to redirect them into arrays of data. All game data is embedded using ANSI C char arrays arranged into code pages.

When compiled, the resulting binary is completely self contained and faster than an emulated version because there are no jump tables being used to interpret instructions. It mirrors every nuance of the original game right down to the flickery update and above all it is tiny.

The Native Version

My aim was to bring a 100% faithful, self contained BBC Model B version of Chuckie Egg to Windows/Linux and get as close to the size of the original BBC binary as possible (9.7kb). This meant taking the work Michael Foot had done with the RISC OS port and implementing the following:

I placed all this functionality inside BBCB.lib which was linked separately with a view to using it for other ports in the future.

SDL (Simple DirectMedia Layer)

Initially, I used SDL to help port Chuckie Egg. This provided me with a cross-platform interface to video, audio and keyboard subsystems. It was a help getting started but it significantly bloated the size of the executable. SDL required a 252kb DLL to be loaded by the exe and introduced dependencies on the C runtime. In the interests of keeping a cross-platform capability, the SDL version is still available for download. It comes packaged with a copy of SDL 1.2.11, but you will need to get the equivalent packages in order to compile/run it under Linux. Newer versions of SDL will probably work too.

There were a few differences between the performance of SDL under Windows and Linux. Calling SDL_UpdateRect() on Linux is very slow. I had to aggregate all screen updates into a single full-screen SDL_UpdateRect() to maintain performance with the loss of the authentic flickery update of the original BBCB version, but this was a small price to pay.

SDL is available for Mac OS too and I hope to look a providing a pre-compiled binary for Macs in the near furture.

Having completed the SDL version, I concentrated my efforts on a native windows version because of the size/dependency issues of SDL. It turned out that the native windows version was very simple to implement. For example, writing the setup code was about as complex as the SDL setup code and there was a strong one-to-one correspondence between the two approaches.

Tiny Test Harness

As a stepping stone towards replacing the SDL, I implemented a tiny test harness. This still exists as a separate project and can be downloaded here: Tiny Test Harness. The basic idea was to use the APIs built into windows (so as to reduce dependencies) to implement an audio system, video system and user input mechanism.

Video Subsystem

The video subsystem for Chuckie Egg centres on simulating BBC MODE 2 support. This is a 320x256, 4bpp (bits per-pixel) display mode where the bits of each adjacent pair of pixels are interleaved. This display mode was memory mapped between address 0x3000 and 0x8000, so the first step was to intercept any values poked into this area by Chuckie Egg and redirect them to write into a bitmap suitable for displaying in a window. Firstly, I created a 'FrameBuffer' which was a simple 8bpp DIB section.

void CreateFrameBuffer(HWND hWnd)
{
    HDC hdc;
    HBITMAP hBitmap;
    BITMAPINFO bmi;
    void* bits;

    bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER); 
    bmi.bmiHeader.biWidth = dwWidth; 
    bmi.bmiHeader.biHeight = -(INT)dwHeight; 
    bmi.bmiHeader.biPlanes = 1; 
    bmi.bmiHeader.biBitCount = 8; 
    bmi.bmiHeader.biCompression = BI_RGB; 
    bmi.bmiHeader.biSizeImage = 0; 
    bmi.bmiHeader.biXPelsPerMeter = 0; 
    bmi.bmiHeader.biYPelsPerMeter = 0; 
    bmi.bmiHeader.biClrUsed = 0; 
    bmi.bmiHeader.biClrImportant = 0; 

    hdc = GetDC(hWnd);
    hBitmap = CreateDIBSection(hdc, &bmi, DIB_PAL_COLORS, &bits, 0, 0);

    hMemDC = CreateCompatibleDC(hdc);
    ReleaseDC(NULL, hdc);

    SelectObject(hMemDC, hBitmap);
}

Next, I handled the WM_PAINT message to blit the bitmap into the window - having using InvalidateRect() on the regions of the display updated by Chuckie Egg.

    case WM_PAINT:
        GetUpdateRect(hWnd, &rect, 0);
        hdc = BeginPaint(hWnd, &ps);
        BitBlt(hdc, 
               rect.left, rect.top, 
               rect.right-rect.left, rect.bottom-rect.top, 
               hMemDC, 
               rect.left, rect.top, 
               SRCCOPY); 
        EndPaint(hWnd, &ps);
        break;

To obtain the same set of colours as MODE 2 (black(0), red(1), green(2), yellow(3), blue(4), magenta(5), cyan(6) and white(7)), I initialised the palette of the DIB section using the following function.

void SetPalette(int startIdx, int numIdxs, RGBQUAD* cols)
{
    SetDIBColorTable(hMemDC, startIdx, numIdxs, cols);
}

Audio Subsystem

I decided to simulate the BBC Basic 'SOUND' and 'ENVELOPE' commands directly, rather than trying to implement the sound chip itself. The BBC Micro Advanced User Guide gives you all the information you need in order to achieve this and the ordinary BBC Model B User Guide has plenty of examples that can be used as audible unit tests. My implementation goes way beyond Chuckie Egg's audio requirements and can be used in other games.

The BBC Basic 'SOUND' command is packed with functionality. You have four channels to manipulate, with control over pitch, amplitude, duration and timing. An internal queue handles multiple sound requests and there are dedicated white noise generators. The envelope command gives you independent ADSR (Attack, Decay, Sustain, Release) setups for both pitch and amplitude.

All waveforms generated by the BBC Model B sound chip are square, giving the Beeb a rather harsh sound. It organises pitches into quarter semi-tone increments and has space for 256 possible pitches. This gives us a workable range of slightly over five octaves (12 semitones per octave). The BBC Micro Advanced User Guide does not allude to the exact frequencies used so I had to guess. The BBC Model B probably used integer frequencies to simplify coding, so I decided to generate a table of pitches in quarter semi-tone increments, rounding down each pitch to the nearest whole integer. I found a useful article describing how to calculate pitch for a given note in the scale called 'The Equal Tempered Scale and Some Peculiarities of Piano Tuning'. I coded up the calculation outlined at the bottom of the article to generate a pitch table.

#include <stdio.h>
#include <math.h>

int pitchLookup[256];

void main()
{
    int middleA = 89;
    float middleAFrequency = 440.0f;
    int i;

    // Doesn't really need to be floating point. 
    for(i = 0; i < sizeof(pitchLookup)/sizeof(pitchLookup[0]); ++i)
    {
        pitchLookup[i] = (int)(middleAFrequency * pow(2, (i - middleA)/(4.0f*12.0f)));
	printf("%d, ", pitchLookup[i]);
    }
    printf("\n");
}

For white noise, I hunted around for an integer pseudo random number generator and just poked its output into the sound buffer. Changing frequency of white noise was just a matter of poking white noise into the audio buffer more often for notes of higher pitch.

The underlying implementation uses Microsoft DirectSound which has been available as a built-in windows component since Windows 2000. The setup was very simple:

    WAVEFORMATEX wfmt;
    LPDIRECTSOUND pds;

    wfmt.nSamplesPerSec = 16000; /* sample rate */
    wfmt.wBitsPerSample = 8; /* sample size */
    wfmt.nChannels = 1; /* channels*/
    wfmt.cbSize = 0; /* size of _extra_ info */
    wfmt.wFormatTag = WAVE_FORMAT_PCM;
    wfmt.nBlockAlign = 1; /* (wfmt.wBitsPerSample >> 3) * wfmt.nChannels; */
    wfmt.nAvgBytesPerSec = 16000; //wfmt.nBlockAlign * wfmt.nSamplesPerSec;

    /* Create IDirectSound object */
    if(DirectSoundCreate(NULL, &pds, NULL) != DS_OK)
        return FALSE;

This code initialises the DirectSound device specifying a maximum frequency of 16khz and requests 8bit audio samples. The Nyquist criteria states that a sampled audio signal can only perfectly be reconstructed from the samples if the sampling rate exceeds two times the highest frequency in the original signal. Since 4836Hz is the highest pitch supported by the BBC Model B, it is necessary to pick a playback frequency greater than or equal to 4836 x 2 = 9672Hz. 8Khz (phone quality) audio would be insufficient, but 16khz would do nicely.

One of the BBC Model B's 'SOUND' command parameters doubles up as both a volume control and a way of selecting an envelope. The value is only 8bit so it uses values between 0 (off) and 15 (loud) to control volume. 5bits would have been required to represent the corresponding waveform i.e values in the range -15 to + 15 but the smallest sample width supported by DirectSound is 8bit. Although the BBC Model B's internal sound generator has only 16 amplitude levels the software was upward compatible with a generator having 128 levels. Again, 8bits would scope this perfectly.

I elected to mix the four BBC Model B sound channels together in software. In order to do this, I had to create a primary sound buffer to write into:

    DSBUFFERDESC dsbd; /* Create sound buffer */
    
    dsbd.dwSize = sizeof (DSBUFFERDESC);
    dsbd.dwFlags = 0;
    dsbd.dwBufferBytes = BUFFERSIZE; /* 1024 bytes */
    dsbd.dwReserved = 0;
    dsbd.lpwfxFormat = &wfmt;
    dsbd.guid3DAlgorithm = GUID_NULL;
    if(IDirectSound_CreateSoundBuffer(pds, &dsbd, &pdsb, NULL) != S_OK)
        return FALSE;

To keep the buffer refreshed with data, I used a multi-media timer call-back. The call-back needs to be serviced regularly enough to ensure that the buffer never gets empty. The buffer size is 1024 bytes, which equates to about 64 milliseconds of playback at a sample rate of 16khz. The timer interval for servicing the sound buffer should be no more than half of this (32ms) otherwise we would only be filling the buffer when it had already run out of samples and this could lead to audio artefacts. The problem is compounded by the fact that the timers in Windows are only accurate to within a given resolution and specifying a high resolution increases system load. Latency is another factor in how often to update the buffer. Chuckie Egg updates at 50Hz, which means that new sounds can be triggered within 20ms. Eventually, I picked a timer with 1ms resolution and used the following calculation to arrive at a timer interval.

int interval = (BUFFERSIZE * 1000) / (AUDIO_PLAYBACK_FREQUENCY*8);

Rather than just dividing by two to satisfy the requirements of the buffer, I ended up dividing by eight. This seemed sufficient to accommodate all the factors that could introduce buffer-fill-failure artefacts.

    /* Create multimedia timer */
    nIDTimer = timeSetEvent((BUFFERSIZE * 1000) / (AUDIO_PLAYBACK_FREQUENCY*8), 
                            1, TimeProc, (DWORD)userCallBack, TIME_PERIODIC);
    if(nIDTimer == 0)
        return FALSE;

MAME style CPU friendly cooperative multi-tasking.

You wouldn't expect Chuckie Egg to consume much CPU on a modern PC, but it is so easy to get this wrong. For example, the BBC Model B emulator and other Chuckie Egg remakes seem to consume a fair chunk of processing time. The secret is to only take the CPU when you absolutely need it, but always wake the thread early enough to maintain an accurate 50Hz. MAME successfully tackles this problems using the technique described on Larry Bank's programmer's corner website.

To summarise; if the next 50hz step is more than two milliseconds away, then the thread is put to sleep and only woken for the last millisecond. During this last millisecond the thread spins in a loop until the exact overall duration has elapsed. The game can then continue with its next update. This method ensures the process is asleep for the bulk of the time. The following code snippet is taken from the Chuckie Egg Port.

/* Implements a delay in miliseconds. This will put the thread to sleep if it can. */
void delay(unsigned int milliseconds)
{
    /* Code taken from Larry Bank's programmers corner website: http://www.bitbanksoftware.com/Programmers/code4.html */
    unsigned int counter = Clock() + milliseconds;
    if(milliseconds > 2)
    {
        sysSleep(milliseconds - 1); /* Sleep through most of it then spin for the last millisecond. */
    }

    /* Spin until 'milliseconds' have elapsed. */
    while(Clock() < counter)
    {
    }
}

One small gotcha is that clock() under Linux has a different value of CLOCKS_PER_SEC than Windows, so the calculations need to take this into account. Also, repeatedly calling clock() seems to cause problems for Linux resulting in the game hanging and stuttering. I switched to using SDL_GetTicks() and everything seemed fine.

Reducing .exe size

It was always my goal to see how close I could get the port to the size of the original BBC model B binary. There are several things contributing to the executable size that were not an issue on the original BBC B:

Despite the obvious overheads faced by the windows port I managed to generate a stand-alone executable of just 49kb. This was mostly achieved through careful choice of compiler settings to optimise for size, dead-strip code etc. An article called Techniques for reducing Executable size was an invaluable guide through this process. One of the main recommendations of this article is removing the dependency on the C run-time library.

The C Runtime Library initialisation code together with the baggage of the CRT itself adds significantly to the size of the exe. I didn't want to cheat either by using the DLL version of the C run-time library. By manually re-writing the CRT methods used by Chuckie Egg, I was able to remove the dependency on the CRT library entirely. I also ensured that the port avoided floating point operations so that floating point support module didn't need to be linked in.

In the end, my executable was approximately five times as big as the BBC Model B version. This is substantially smaller than any other Chuckie Egg remake for windows, but I had one final touch to add - I decided to use an executable compressor/packer and after digging around, I settled on kkrunchy. kkrunchy is invoked as follows:

kkrunchy.exe --best ch-egg.exe

Here is the output:

kkrunchy 0.23 alpha >> radical exe packer (c) f. giesen 2003-2005

 - using best packer frontend
 - program database successfully loaded
 - preprocessing, filtering & reslicing
 - packing [################################] => 18437 bytes (in 0.50s)
 - WARNING: Out ImageBase 0x3f0000 lower than 0x400000, won't run under Win9x
 - writing output file ch-egg.exe
 - packed executable 50176 -> 19456 bytes
 - writing pack ratio analysis ch-egg.kkm

The final size was 19kb, just twice as big as the original BBC Model B version. I was finally happy.