Javascript RLE

Contents

Introduction

The Chuckie Egg level editor stores new level designs within cookies. There are specific storage limits associated with cookies outlined as follows:

"If you use the document.cookie property to retrieve the cookie on the client side, the document.cookie property can retrieve only 4,096 bytes. This byte total can be one name-value pair of 4 KB, or it can be up to 20 name-value pairs that have a total size of 4 KB."

Unfortunately, each level of Chuckie Egg is described as a grid of 20 x 28 cells and there are 8 levels in total. If one ASCII character is used per cell, 4.375kb is required and that doesn't include the 'name' part of the cookies' name-value pair or data pertaining to cookie expiry.

I needed to find an efficient mechanism for compressing the data which could still be encoded as valid ASCII data (as required by cookies). In the end I settled upon RLE (Run Length Encoding) coupled with careful limits on the maximum run so as to keep within the ASCII character range.

RLE

Run length encoding is simple to implement, and efficient to decompress. It particularly suits encoding Chuckie Egg levels because they are comprised of large spans of identical cells such as walls and empty areas. The encoding algorithm outputs pairs of values - the cell value and the number of times it occurs in a contiguous sequence.

Download reference Javascript RLE source code (.zip).

ASCII encoding

The list of RLE sequence (value, count) pairs from the encoding process needs to be stored in the cookie as an ASCII text string because cookies are interpreted as text files. So, it is perfectly reasonable to write out a string such as:

"rle_level1=0,3,2,1,1,4,5,2,0,3"

but this could be made much tighter by combining the (value, count) pair into a single number represented with an ASCII code rather than a string e.g. ASCII code 72, rather than "72".

Unfortunately, we are limited to the visible ASCII character codes in the range 33 - 126 inclusive. All characters outside this range must be escaped using the Javascript escape() function. Since a value such as 30 would be escaped as the three ASCII codes "%1E", this triples the storage required - not good when we are trying to reduce the amount data used.

By making some careful observations about the likely values of the encoded (value, count) pairs, it is possible to ensure that the resulting ASCII character values do indeed lie within the required range. Firstly, there are five cell types to deal with (blank(0), wall(1), grain(2), egg(3) and ladder(4)). Of these five types, only blank, wall and grain are likely to occur in nice RLE sequences. The remaining cell types typically occur as "one-offs" rather than in sequences.

If we take the ASCII range 33 - 126 inclusive (which equates to 94 unique characters) and allocate the first two character codes (33 and 34) to represent these "one-offs", it leaves us with 92 values with which to encode the sequences. ';' is the delimiting character used to separate multiple name/value pairs within a cookie so we cannot use it. By mapping code '59' (the ASCII code for ';') to ASCII code 35, we can side step the problem without having to escape the string. This now leaves us with 91 values with which to represent our RLE sequences.

We know there are three types of sequence (blank, wall and grain), so we can have 91/3 = 30 sequences of different lengths. It turns out that the width of a level is 28 cells, so we can encode at least an entire row with a single ASCII character. Doing this requires a modification to the RLE algorithm to restart a sequence if it reaches 30 elements in length. It is also necessary to ensure that "one-off" elements never form sequences.

Below is the code used to encode a stream of RLE data pairs (rleValue, rleSequenceLength) as a stream of ASCII codes.

function asciiEncode(data)
{
	var str = "";
	for(var i = 0; i < data.length; i+=2)
	{
		var rleValue = data[i];
		var rleSequenceLength = data[i+1];

		if(rleValue < 3) // Sequences are captured for blanks, walls or grains
			var asciiCode = 36 + (rleValue * 30) + (rleSequenceLength - 1);
		else // remaining cell types (3 and 4) are treated as one-offs
			var asciiCode = 33 + (rleValue - 3);

		// Special case for eliminated semi-colons
		if(asciiCode == 59)
			asciiCode = 35;

		str += String.fromCharCode(asciiCode);
	}
	return str;
}

And to decode it again:

function asciiDecode(str)
{
	var result = new Array;
	for(var i = 0; i < str.length; ++i)
	{
		var asciiCode = str.charCodeAt(i);

		// Special case handling for semi-colons.
		if(asciiCode == 35)
			asciiCode = 59;

		if(asciiCode > 35)
		{
			// It is a blank, wall or grain cell.
			var rleValue = Math.floor((asciiCode - 36) / 30);
			var rleSequenceLength = ((asciiCode - 36) % 30) + 1;
		}
		else
		{
			// It is a 'one-off' cell such as egg, ladder, lift etc.
			var rleValue = (asciiCode - 33) + 3;
			var rleSequenceLength = 1;
		}
		result[i*2] = rleValue;	
		result[i*2 + 1] = rleSequenceLength;
	}
	return result;
}

Results

Prior to applying the RLE strategy, approximately 6kb was required in order to store all eight Chuckie Egg level layouts. After applying the algorithm and combining all layouts into a single name-value pair, the resulting size was approximately 1.6kb. This leaves plenty of room for storing a high score table.

Usage

The full source code is listed below and can be embedded into a web page but you will need to remove the unit test code at the bottom because the WScript object is not available from within a web browser. The program listed below runs from the windows command prompt too as follows:

cscript //nologo rlejs.js

It encodes the sequence of numbers and then decodes it again printing the results at each stage to stdout. Of course, the input data can be anything you like.

Listing

/*
  ch-egg-rlejs.js - Javascript implementation of RLE encode/decode 
                    combining ASCII encoding for Chuckie Egg levels.
  Copyright (C) 2009 Mark Lomas

  This program is free software; you can redistribute it and/or
  modify it under the terms of the GNU General Public License
  as published by the Free Software Foundation; either version 2
  of the License, or (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
*/

// RLE decompression reference implementation
function rleDecode(data)
{
	var result = new Array;
	if(data.length == 0)
		return result;

	if((data.length % 2) != 0)
	{
		alert("Invalid RLE data");
		return;
	}

	for(var i = 0; i < data.length; i+=2)
	{
		var val = data[i];
		var count = data[i+1];
		for(var c = 0; c < count; c++)
			result[result.length] = val;
	}
	return result;
}

// RLE compression reference implementation
function rleEncode(data)
{
	var result = new Array;
	if(data.length == 0)
		return result;

	var count = 1;
	var r = 0;
	for(var i = 0; i < (data.length - 1); i++)
	{
		// If contiguous sequence ends, or we encounter a ladder/egg
		// or the sequence reaches 30 elements in length, terminate sequence.
		if(data[i] != data[i+1] || data[i] >= 3 || count == 30)
		{
			result[r] = data[i];
			result[r+1] = count;
			count = 0;
			r +=2;
		}
		count++;
	}
	result[r] = data[i];
	result[r+1] = count;

	return result;
}

function asciiEncode(data)
{
	var str = "";
	for(var i = 0; i < data.length; i+=2)
	{
		var rleValue = data[i];
		var rleSequenceLength = data[i+1];

		if(rleValue < 3) // Sequences are captured for blanks, walls or grains
			var asciiCode = 36 + (rleValue * 30) + (rleSequenceLength - 1);
		else // remaining cell types (3 and 4) are treated as one-offs
			var asciiCode = 33 + (rleValue - 3);

		// Special case for eliminated semi-colons
		if(asciiCode == 59)
			asciiCode = 35;

		str += String.fromCharCode(asciiCode);
	}
	return str;
}

function asciiDecode(str)
{
	var result = new Array;
	for(var i = 0; i < str.length; ++i)
	{
		var asciiCode = str.charCodeAt(i);

		// Special case handling for semi-colons.
		if(asciiCode == 35)
			asciiCode = 59;

		if(asciiCode > 35)
		{
			// It is a blank, wall or grain cell.
			var rleValue = Math.floor((asciiCode - 36) / 30);
			var rleSequenceLength = ((asciiCode - 36) % 30) + 1;
		}
		else
		{
			// It is a 'one-off' cell such as egg, ladder, lift etc.
			var rleValue = (asciiCode - 33) + 3;
			var rleSequenceLength = 1;
		}
		result[i*2] = rleValue;	
		result[i*2 + 1] = rleSequenceLength;
	}
	return result;
}

// Perform unit test

var test = [0,0,0,1,1,1,2,2,2,1,3,3,3,3,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1];
WScript.StdOut.WriteLine("Original");
for(var i = 0; i < test.length; i++)
	WScript.StdOut.Write(test[i] + ",");

var result = rleEncode(test);

WScript.StdOut.WriteLine("\nEncoded");
for(var i = 0; i < result.length; i++)
	WScript.StdOut.Write(result[i] + ",");

WScript.StdOut.WriteLine("\nASCIIEncoded");
var asciEncodedStr = asciiEncode(result);
WScript.StdOut.WriteLine(asciEncodedStr);

WScript.StdOut.WriteLine("\nASCIIDecoded");
var asciiDecoded = asciiDecode(asciEncodedStr);
for(var i = 0; i < asciiDecoded.length; i++)
	WScript.StdOut.Write(asciiDecoded[i] + ",");

WScript.StdOut.WriteLine("\nDecoded");
var decoded = rleDecode(asciiDecoded);
for(var i = 0; i < decoded.length; i++)
	WScript.StdOut.Write(decoded[i] + ",");

if(decoded.length != test.length)
{
	WScript.StdOut.WriteLine("\nUnit test failed - decoded sequence is longer than original sequence");
}
else
{
	for(var i = 0; i < decoded.length; i++)
	{
		if(decoded[i] != test[i])
		{
			WScript.StdOut.WriteLine("\nUnit test failed - decoded sequence is different ot the original sequence");
			WScript.Quit(1);
		}
	}
	WScript.StdOut.WriteLine("\nUnit test succeded"); 
}
WScript.Quit(0);