Home
Admin | Edit

ARM2+ LZ4 decompression routine

Few days ago i ported to ARM2 a LZ4 decompression routine which was made for ARM Cortex-M0. It took me a few hours to get it working, it did not use much modern ARM instructions.

LZ4 is a lossless general data compression algorithm focused on compression / decompression speed. It belongs to the LZ77 compression schemes. This sort of compression was actually used on some old-school platforms like Atari ST or Apple II because it was fast and good.

The point was to compress my Acorn Archimedes binaries (because ARM code density isn't great) and associated data like bitmaps. The later idea is to have some kind of loader which take care of that dynamically, it must be fast since it will decode data at runtime.

At first i was going for the old LZW compression algorithm which is very simple to implement (compression isn't great though) but stumbled upon LZ4 later on which had great compression / speed ratio and is also relatively simple.

The decompression speed and simplicity was the most appealing to me, you don't want to make peoples wait on some logos when you do a demo that use a lot of optimizations like loops unrolling, code generation and various lookup data.

The code probably works on many old-school ARM CPU (ARM2, ARM250, ARM3 and maybe newer), it is completely independent as i made it rely on its own stack. It is also the most recent version. (the author could not modify its own post after some times and some peoples pushed fixes in the comments section of the ARM online board, those fixes were integrated below)

It does not parse any headers so you may have to remove the compression header if you compress your data with some LZ4 tools. You must also append a 16-bit value on top of the compressed data, this value is the length of the compressed data. Since it is 16-bit the compressed data must fit into 64KB.

The original author uploaded a simple Perl tool to compress (with the official compressor), remove the compression header and append the 16-bit length at the top. I modified it a bit for my needs since it produced too many files (like some C code) and i wanted the files to have Arculator hostfs style file-type/extension. The updated Perl script can be found below after the decompression routine. The original filename of that tool was lz4cut, just save it somewhere, do a chmod +x lz4cut and compress your files with lz4cut your_filename the program will produce two files, the compressed one with headers (end with lz4) and the one compatible with the decompression routine below (start with lz4). You must have the lz4 binary somewhere in your system. (you can compile it from sources)

The performances are good (my data is loaded in few seconds) although the code could probably be much more optimized. As for the size it compress my 629KB binary to 38KB so it is very efficient! For some bitmap data it compress 131KB into 25Kb which is also not bad. Here is some write-up about LZ4 on 8088/8086 CPU which can give some hints about what this algorithm is capable of.

Usage

ldr r0,lz4DataAddr
ldr r1,dstDataAddr
bl unlz4

ARM2+ LZ4 decompression routine

 dcd 0
 dcd 0
 dcd 0
 dcd 0
.unlz4_stack ; stack goes upward

; LZ4 decompression routine ported to ARM2 by grz-
; credits -> community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/lz4-decompression-routine-for-cortex-m0-and-later
; make sure to remove compressed data header if you use a compression tool
; make sure to append a 16-bit value at the top of your compressed file which represent the compressed data length
; compressed data should thus fit into 64kb
; commented lines is the original version
; r0 = address of the compressed data
; r1 = RAM address of the decompressed data
; r4,r5,r6 = preserved
.unlz4
 ;ldrh r2,[r0] ; get length of compressed data
 ldr r2,[r0]
 mov r2,r2, ROR #16
 mov r2,r2, LSR #16
 adds r0,r0,#2 ; advance source pointer

 ;push {r4-r6,lr} ;save r4, r5, r6 and return-address
.unlz4_len
 adr r13,unlz4_stack
 stmfd r13!,{r4-r6,r14}
 adds r5,r2,r0 ; point r5 to end of compressed data
.unlz4_getToken
 ldrb r6,[r0] ; get token
 adds r0,r0,#1 ; advance source pointer
 ;lsrs r4,r6,#4 ; get literal length, keep token in r6
 movs r4,r6, LSR #4
 beq unlz4_getOffset ; jump forward if there are no literals
 bl unlz4_getLength ; get length of literals
 movs r2,r0 ; point r2 to literals
 bl unlz4_copyData ; copy literals (r2=src, r1=dst, r4=len)
 movs r0,r2 ; update source pointer
.unlz4_getOffset
 cmp r0,r5
 bge unlz4_bye
 ldrb r3,[r0,#0] ; get match offset's low byte
 subs r2,r1,r3 ; subtract from destination; this will become the match position
 ldrb r3,[r0,#1] ; get match offset's high byte
 ;lsls r3,r3,#8 ; shift to high byte
 movs r3,r3, LSL #8
 subs r2,r2,r3 ; subtract from match position
 adds r0,r0,#2 ; advance source pointer
 ;lsls r4,r6,#28 ; get rid of token's high 28 bits
 movs r4,r6, LSL #28
 ;lsrs r4,r4,#28 ; move the 4 low bits back where they were
 movs r4,r4, LSR #28
 bl unlz4_getLength ; get length of match data
 adds r4,r4,#4 ; minimum match length is 4 bytes
 bl unlz4_copyData ; copy match data (r2=src, r1=dst, r4=len)
 cmp r0,r5 ; check if we've reached the end of the compressed data
 blt unlz4_getToken ; if not, go get the next token
 ;pop {r4-r6,pc} ; restore r4, r5 and r6, then return
 ldmfd r13!,{r4-r6,r15}

.unlz4_getLength
 cmp r4,#15 ; if length is 15, then more length info follows
 bne unlz4_gotLength ; jump forward if we have the complete length
.unlz4_getLengthLoop
 ldrb r3,[r0] ; read another byte
 adds r0,r0,#1 ; advance source pointer
 adds r4,r4,r3 ; add byte to length
 cmp r3,#255 ; check if end reached
 beq unlz4_getLengthLoop ; if not, go round loop
.unlz4_gotLength
 ;bx lr
 mov r15,r14 ; return
.unlz4_copyData
 rsbs r4,r4,#0 ; index = -length
 subs r2,r2,r4 ; point to end of source
 subs r1,r1,r4 ; point to end of destination
.unlz4_copyDataLoop
 ldrb r3,[r2,r4] ; read byte from source_end[-index]
 strb r3,[r1,r4] ; store byte in destination_end[-index]
 adds r4,r4,#1 ; increment index
 bne unlz4_copyDataLoop ; keep going until index wraps to 0
 ;bx lr ; return
 mov r15,r14
.unlz4_bye
 ldmfd r13!,{r4-r6,pc}

Perl script (lz4cut)

#!/usr/bin/perl
use strict;

my $infile=@ARGV[0] or die("no input file.");

my ($file, $name)=removeExtension("$infile");
my $lz4file="$file.lz4";
my $binfile="$file.bin";
my $cfile="$file.c";
($name) = "$name" =~ /^([^\.]+)/;
my $array = "s" . uc(substr("$name", 0, 1)) . substr("$name", 1);

my $e = system("lz4", "-9", "$infile", "$lz4file");
if(-1 == $e){ die("failed compressing $infile to $lz4file."); }

my $bin;
if(open(LZ4FILE, "<", "$lz4file"))
{
    my $b;
    if(read(LZ4FILE, $b, 11))
    {
        my ($i, $f, $j, $l) = unpack("H8CA2V", $b);
        if("$i" eq "04224d18")
        {
            seek(LZ4FILE, 1, $f & 8);
            my $cl = read(LZ4FILE, $bin, $l);
            if($cl == $l)
            {
                print("Extracted $l bytes from $lz4file\n");
                print("prepending 16-bit length.\n");
                $bin = pack("v", $cl) . "$bin";
            }
            else
            {
                die("$lz4file is garbled.");
            }
        }
    }
    close(LZ4FILE);
}

if(length($bin))
{
    if(open(BINFILE, ">", "$binfile"))
    {
        print(BINFILE "$bin");
        close(BINFILE);
    }
    if(open(CFILE, ">", "$cfile"))
    {
        print(CFILE "static const uint8_t ${array}[] = {\n");
        my $data;
        while("$bin")
        {
            $data = substr($bin, 0, 16);
            $bin = substr("$bin", length($data));
            $data =~ s/(.)/sprintf("0x%02x, ", ord($1))/seg;
            print(CFILE "\t$data\n");
        }
        print(CFILE "};\n");
        close(CFILE);
    }
}

sub removeExtension(pathname)
{
    my @path = split(/\//, "$infile");
    my $fname = pop(@path);
    my @fname = split(/\./, "$fname");
    scalar(@fname) > 1 && pop(@fname);
    push(@path, join(".", @fname));
    (join("/", @path), "$fname");
}

back to topLicence Creative Commons