Javascript LZW

Contents

Introduction

I was interested in compressing the DHTML version of Chuckie Egg to get it closer to the original size of the BBC Model B version. After applying a Javascript compressor which stripped white space and shortened variable names, I was left with a 54kb source file, but I wanted to go further. I hit upon the idea of using a self-decompressing version of the source file similar in nature to Windows executable compressors such as Kkrunchy. The idea was to embed compressed data and a decompression function into the Chuckie Egg HTML page that would decompress the game on load into a string. This string would then be passed to the Javascript 'eval()' function to be executed.

The ECMA/JavaScript "Compact Profile" allows dynamic features of the language to be optional (such as 'eval()') which means this method approach might not be suitable for some resource constrained devices.

LZW

The LZW compression/decompression algorithm patent expired in 2004 and it is relatively easy to implement in Javascript. This is partly due to the fact that you get free hash table implementations in Javascript and string handling is built-in. Wikipedia has a great description of the LZW algorithm which I followed to the letter in order to implement this version.

Download Javascript LZW source code (.zip).

Results

This crunched Chuckie Egg down to from 54kb to 22kb, but I hit a problem. In order to retrieve the compressed data from a server, it would either have to be transmitted as a binary data stream or uuencoded/yEnc/base64 encoded. The encoding methods introduce an overhead which in the case of uuencode bloats the compressed data by 33 percent back to approximately 30kb. Transmitting the data as a binary file was out of the question too because technologies like AJAX don't directly support binary - only text/XML are supported properly. Of course, if your web hosting package supports compressed HTML, then you don't really have anything to worry about, but improving download speed was not the intention here. I wanted to reduce the size of Chuckie Egg as a purely intellectual exercise.

Optimisation

I have not made any effort to optimise or profile the Javascript - it is purely a reference implementation, aimed at readability and learning. Consequently it takes around nine seconds to decompress Chuckie Egg. The bit packing has been written in a particularly dumb (slow) fashion, the dictionary/stream objects just add function call overhead reducing performance further and I haven't run a Javascript optimiser on it. However, with these improvements, I would hope to get the decompression down to around a couple of seconds.

Usage

The full source code is listed below and can be embedded into a web page but you will need to replace the WScript.StdOut.WriteLine() call (which is in there for debugging) with an eval(decomp) instruction to execute the decompressed script. Also, you probably only want the decompression code, i.e. just the LZWDecompressor() and InStream() object/function implementations.

The program listed below runs from the windows command prompt too as follows:

cscript //nologo lzwjs.js

The program compresses the string "TOBEORNOTTOBEORTOBEORNOT#" and decompresses it again printing the result to stdout. Of course, this string could be anything you like. For example, I used the following snippet of code to load in ch-egg.js and compress it:

function reportError(err) {
	return("Exception raised " + 
		"\n==================" +
		"\nReason: " + err.description + 
		"\nErrno: "	+ err.number + "\n");
}

function LoadFile(filename)
{
	var s = "";
	try	
	{
		var fso = new ActiveXObject("Scripting.FileSystemObject");
		ts = fso.OpenTextFile(filename, 1); // Open read only.
		s = ts.ReadAll();
	}
	catch(Error) 
        {
	   print(reportError(Error));
	}
	return s;
}

var output = new OutStream();
var compressor = new LZWCompressor(output);

var input = LoadFile("ch-egg.js");
compressor.compress(input);

Listing

/*
  lzwjs.js - Javascript implementation of LZW compress and decompress algorithm
  Copyright (C) 2009 Mark Lomas

  This program is free software; you can redistribute it and/or
  modify it under the terms of the GNU General Public License
  as published by the Free Software Foundation; either version 2
  of the License, or (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
*/

// Used to write values represented by a user specified number of bits into 
// a 'bytestream' array.
function OutStream()
{
    this.bytestream = new Array();
    this.offset = 0;

    this.WriteBit = function(val)
    {
       	this.bytestream[this.offset>>>3] |= val << (this.offset & 7);
        this.offset++;
    }

    this.Write = function(val, numBits)
    {
        // Write LSB -> MSB
        for(var i = 0; i < numBits; ++i)
            this.WriteBit((val >>> i) & 1);
    }
}

// Used to read values represented by a user specified number of bits from 
// a 'bytestream' array.
function InStream(bytestream, bitcount)
{
	this.bytestream = bytestream;
	this.bitcount = bitcount;
	this.offset = 0;

	this.ReadBit = function()
	{
	    var tmp = this.bytestream[this.offset>>>3] >> (this.offset & 7);
	    this.offset++;
	    return tmp&1;
	}

	this.Read = function(numBits)
	{
	    if((this.offset + numBits) > this.bitcount)
	        return null;

	    // Read LSB -> MSB
	    var val = 0;
	    for(var i = 0; i < numBits; ++i)
	        val |= this.ReadBit() << i;

	    return val;
	}
}


function LZWCompressor(outstream)
{
        this.output = outstream;

	// Hashtable dictionary used by compressor
	this.CompressDictionary = function() 
	{
	    this.hashtable = new Object();
	    this.nextcode = 0;

	    // Populate table with all possible character codes.
	    for(var i = 0; i < 256; ++i)
	    {
	        var str = String.fromCharCode(i);
	        this.hashtable[str] = this.nextcode++;
	    }    


	    this.Exists = function(str)
	    {
	        return (this.hashtable.hasOwnProperty(str));
	    }

	    this.Insert = function(str)
	    {
	        var numBits = this.ValSizeInBits();
	        this.hashtable[str] = this.nextcode++;
	        return numBits;
	    }

	    this.Lookup = function(str)
	    {
	        return (this.hashtable[str]);
	    }

	    this.ValSizeInBits = function()
	    {
	        // How many bits are we currently using to represent values?
	        var log2 = Math.log(this.nextcode + 1)/Math.LN2;
	        return Math.ceil(log2);
	    }
	};


	// LZW compression algorithm. See http://en.wikipedia.org/wiki/LZW
	this.compress = function(str)
	{
	   var length = str.length;
	   if(length == 0)
	       return output.bytestream;

	   var dict = new this.CompressDictionary();
	   var numBits = dict.ValSizeInBits();
	   var w = "";
	   for(var i = 0; i < length; ++i)
	   {
	       var c = str.charAt(i);
	       if(dict.Exists(w + c))
	       {
	           w = w + c;
	       }
	       else
	       {
	           numBits = dict.Insert(w + c);
	           this.output.Write(dict.Lookup(w), numBits); // Looks-up null on first interation.
	           w = c;
	       }
	   }
	   this.output.Write(dict.Lookup(w), numBits);
	};

} // end of LZWCompressor

function LZWDecompressor(instream)
{
	this.input = instream;

	this.DecompressDictionary = function()
	{
	    this.revhashtable = new Array();
	    this.nextcode = 0;

	    // Populate table with all possible character codes.
	    for(var i = 0; i < 256; ++i)
	    {
	        this.revhashtable[this.nextcode++] = String.fromCharCode(i);
  	    }

	    this.numBits = 9;

	    this.Size = function()
	    {
	        return (this.nextcode);
	    }

	    this.Insert = function(str)
	    {
	        this.revhashtable[this.nextcode++] = str;

	        // How many bits are we currently using to represent values?
		// Look ahead one value because the decompressor lags one iteration
		// behind the compressor.
	        var log2 = Math.log(this.nextcode + 2)/Math.LN2;
	        this.numBits = Math.ceil(log2);
	        return this.numBits;
	    }

	    this.LookupIndex = function(idx)
	    {
		return this.revhashtable[idx];
	    }

	    this.ValSizeInBits = function()
	    {
	        return this.numBits;
	    }
	}

	// LZW decompression algorithm. See http://en.wikipedia.org/wiki/LZW
	// Correctly handles the 'anomolous' case of 
	// character/string/character/string/character (with the same character 
	// for each character and string for each string).
	this.decompress = function(data, bitcount)
	{
	   if(bitcount == 0)
	       return "";

	   var dict = new this.DecompressDictionary();
	   var numBits = dict.ValSizeInBits();

	   var k = this.input.Read(numBits);
	   var output = String.fromCharCode(k);
	   var w = output;
	   var entry = "";

	   while ((k = this.input.Read(numBits)) != null)
	   {
	      if (k < dict.Size()) // is it in the dictionary?
	          entry = dict.LookupIndex(k); // Get corresponding string.
	      else 
	          entry = w + w.charAt(0);
	
	      output += entry;
	      numBits = dict.Insert(w + entry.charAt(0));
	      w = entry;
	   }
	
	   return output;
	};

} // end of LZWDecompressor


// Initialise the compressor
var output = new OutStream();
var compressor = new LZWCompressor(output);

var input = "TOBEORNOTTOBEORTOBEORNOT#";
compressor.compress(input);

// Initialise the decompressor
var instream = new InStream(output.bytestream, output.offset);
var decompressor = new LZWDecompressor(instream);

var decomp = decompressor.decompress();
WScript.StdOut.WriteLine(decomp);