Writing an ARMv2 assembler in Forth

Introduction


See sources.

This article is about a Gforth implementation of a simple ARMv2 macro assembler. It is loosely related to a series of articles about an ARM Forth dialect. (part 1 / part 2)

My idea for this assembler was to approach the conventional ARM syntax in a Forth way (everything is a word), with words that assemble directly into the current word. A side goal was to keep it as simple as possible so that it can be compiled by a very light Forth (e.g. ARM-ForthLite) with few primitives.

Generating opcodes into words is a useful way to extend low level Forth primitives: they can be comfortably defined with the assembler in a sort of inline mode. (Note : this feature applies less to "high level" multi-platform Forths such as Gforth.)

This assembler was first sketched out as a side exploration while I was making my ARM based Forth. I later took the opportunity to redo it from scratch, as I felt it would be a fun project that may be useful for Acorn computer stuff and for my next ARM based Forth.

Note : Gforth is a standard Forth implementation that works well on Linux and has many features. The assembler code can be pasted into the Gforth command line interpreter to test it.

Why ARMv2 ?

Mainly due to my fondness for this instruction set (RISC, elegance with pragmatic quirkiness etc.) and my associated interest in early Acorn computers. ARMv2 is a subset of the 32-bit ARM instruction sets from the late 80s up to now (ARMv2 up to ARMv7): it has worked mostly unchanged through 30+ years of history and is still widely used, although it is slowly being replaced by 64-bit ARM. The instruction set is also very small.

Resources

Some of the resources I used to build this :

Sample code / Syntax comparison

Here is a syntax comparison between the Forth ARMv2 assembler and a conventional assembler (BBC BASIC inline assembly) for the same program :

\ a bunch of predefined OS constants (BBC BASIC bundles these constants)
: OS_WriteI $100 ; immediate
: OS_ReadMonotonicTime $42 ; immediate
: OS_ReadEscapeState $2c ; immediate
: OS_Exit $11 ; immediate

variable archismall_loop

create ARCHISMALL_ARMv2_CODE
    OS_WriteI $16 + swi            \ swi OS_WriteI+22
    OS_WriteI $d + swi             \ swi OS_WriteI+13
    [] $2c imm r15 r9 ldr          \ ldr r9,[r15,#44]
    $140 imm r4 mov                \ mov r4,#320
    archismall_loop !LABEL         \ .archismall_loop
        OS_ReadMonotonicTime swi   \ swi OS_ReadMonotonicTime
        $1 asr r3 r2 r2 add        \ add r2,r2,r3,asr #1
        $1 asr r2 r3 r3 sub        \ sub r3,r3,r2,asr #1
        $13 lsl r0 r2 r2 sub       \ sub r2,r2,r0,lsl #19

        $18 lsr r3 r6 mov          \ mov r6,r3,lsr #24
        r9 r4 r6 r7 mla            \ mla r7,r6,r4,r9

        $4 lsr r0 r6 mov           \ mov r6,r0,lsr #4
        [] $18 lsr r2 r7 r6 strb   \ strb r6,[r7,r2,lsr #24]

        OS_ReadEscapeState swi     \ swi OS_ReadEscapeState
        archismall_loop @LABEL bcc \ bcc archismall_loop
    OS_Exit swi                    \ swi OS_Exit
\ .screenAddr
    $1fec020 l,                    \ dcd &1fec020
here constant ARCHISMALL_ARMv2_CODE_END

The assembly code is inserted directly (inlined) after the ARCHISMALL_ARMv2_CODE word. The last line marks the end of the assembly code, so the content can be retrieved later on through the ARCHISMALL_ARMv2_CODE_END constant.

The ARM code looks close to the conventional syntax, albeit with some differences :
Some instructions may look verbose as there are no optional arguments (e.g. the short form strb r0,[r1] can't be written). This could be fixed by detecting [] or by looking at the stack depth.

Throwaway unnamed labels could easily be made by storing here on the stack.
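
As a rough illustration of that idea, here is a minimal sketch of how a branch opcode could be computed from an address kept on the stack. The word branch-opcode is hypothetical and is not part of the assembler sources; it only relies on the standard ARM branch layout (condition in bits 31:28, a 24 bit word offset relative to the branch address + 8).

```forth
\ Hypothetical sketch : branch-opcode is not a word of the assembler.
: branch-opcode ( target addr cond -- opcode )
    28 lshift $0a000000 or >r   \ condition field + branch bits
    8 + -                       \ byte offset relative to pc (= addr + 8)
    2/ 2/ $00ffffff and         \ signed word offset, truncated to 24 bits
    r> or ;

\ bcc (condition 3) at address $1010, back to a label at $1000
$1000 $1010 $3 branch-opcode hex u. decimal   \ prints 3AFFFFFA
```

With such a word an unnamed backward label is just "here" left on the stack before the loop body, followed later by something like "here $3 branch-opcode l," to branch back to it.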

Raw Opcode output

Here is a way to output the raw opcodes to the console :

: ARM32_OPCODE ( addr len -- ) \ print len bytes starting at addr as 32 bits hex words
    hex 0 ?do dup i + l@ 8 u.r cr 4 +loop drop decimal ;
\ ARM32_OPCODE can be replaced by the "dump" word as well
ARCHISMALL_ARMv2_CODE ARCHISMALL_ARMv2_CODE_END ARCHISMALL_ARMv2_CODE - cr ARM32_OPCODE

Implementation


The implementation maps each ARMv2 mnemonic to a Forth word (one mnemonic = one word), so it looks quite close to the conventional syntax. Some mnemonic definitions differ by only one word, for mnemonics that just flip some bits (e.g. andeq vs andeqs).

90% of the assembler code is code duplication and merging bits together with OR !
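
To give an idea of what "merging bits together with OR" means, here is a minimal sketch that builds a data processing opcode with an immediate operand from its fields. The names (COND_AL, OP_MOV, dp-imm-opcode) are made up for this example and are not the words of the assembler; the bit positions follow the standard ARM data processing layout.

```forth
\ Hypothetical names, not the words from the assembler sources.
\ Layout : cond[31:28] 00 I[25] opcode[24:21] S[20] Rn[19:16] Rd[15:12] rot[11:8] imm8[7:0]
$e constant COND_AL      \ "always" condition
$d constant OP_MOV       \ data processing operation number of mov

: dp-imm-opcode ( imm12 rn rd op cond -- opcode )
    28 lshift            \ condition field
    swap 21 lshift or    \ operation field
    swap 12 lshift or    \ destination register
    swap 16 lshift or    \ first operand register
    or                   \ rotation + 8 bits immediate
    $02000000 or ;       \ I bit : operand 2 is an immediate

\ mov r4,#320 : 320 is 5 rotated right by 26 bits, so rot = 13 and imm8 = 5
$d05 0 4 OP_MOV COND_AL dp-imm-opcode hex u. decimal   \ prints E3A04D05
```

Each field is shifted to its position and merged with a single or, which is why most mnemonics can share the same few encoding words.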

Words

Here are the Forth words used by the implementation :
  • stack : swap over rot dup drop exit
  • flow controls : do loop unloop if else then i
  • arithmetic : lshift rshift * - + invert negate
  • memory access : allot here l@ l!
  • logic : or 0= 0> u<
  • misc : variable (useful for labels; implies a create word unless a simpler / builtin scheme is used)
Most of them are trivial to implement as they map nearly 1:1 to the corresponding CPU instructions, and most of them can be defined directly in Forth as well.

Loop support is optional (removing it may save 3 words) although useful for a macro assembler (loop unrolling etc.). There are only two small loops, one for immediate encoding and one for the block data transfer instructions, and they can be unrolled without compromise.

Words such as here or allot are also optional : they are useful to emit opcodes into definitions, but opcodes could be emitted elsewhere as well (a buffer etc.).

Some refactoring may help to further reduce the number of words used by the implementation.
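
For reference, an immediate encoding loop can be sketched as below. This is not the code from the sources, only one possible way to do it, and the words rol2 and encode-imm are hypothetical. An ARM immediate is an 8 bit value rotated right by an even amount, so the loop rotates the value left two bits at a time until it fits in 8 bits.

```forth
\ Hypothetical sketch, not the code of the assembler.
: rol2 ( x -- x' )              \ rotate a 32 bits value left by 2 bits
    dup 2 lshift swap 30 rshift or $ffffffff and ;

: encode-imm ( value -- imm12 true | false )
    16 0 do
        dup $100 u< if          \ fits in 8 bits after i rotations ?
            i 8 lshift or       \ rotation count goes to bits 11:8
            true unloop exit
        then
        rol2
    loop
    drop false ;                \ not representable as an ARM immediate

320 encode-imm drop hex u. decimal   \ prints D05
```

Values such as $101 cannot be encoded (the set bits span more than 8 bits whatever the rotation), in which case the word returns false.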

Mnemonics

Mnemonic definitions are very short and look like this :

: mov ARM2_DPI ARM2_AL ARM2_MOV l, ; immediate

They all end with l, which emits the computed opcode into the current definition and then advances by 4 bytes :

: l, here l! $4 allot ;

Most of the "heavy" computation of the assembler happens in the encoding of data transfer instructions, through the ARM2_SDT (especially) and ARM2_MDT words, and in immediate value encoding. Single data transfer instructions have many options and a variable number of arguments, to support either an immediate offset or a register offset which can also be shifted.

Here are four types of single data transfer instruction :

[!] $18 imm r7 r6 strb   \ strb r6,[r7,#24]!
[] $18 lsr r2 r7 r6 strb \ strb r6,[r7,r2,lsr #24]
$18 imm [] r7 r6 strb    \ strb r6,[r7],#24
$18 lsr [] r2 r7 r6 strb \ strb r6,[r7],r2,lsr #24
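
The difference between these forms mostly comes down to a few flag bits in the opcode. The sketch below shows it for the two immediate offset forms; the constant names and the sdt-imm-opcode word are hypothetical (the real ARM2_SDT word does more, such as handling shifted register offsets), and only the bit positions come from the ARM single data transfer layout.

```forth
\ Hypothetical sketch; flag names follow the ARM documentation.
$01000000 constant P-BIT   \ pre-indexed addressing
$00800000 constant U-BIT   \ add the offset to the base (up)
$00400000 constant B-BIT   \ byte transfer
$00200000 constant W-BIT   \ write the computed address back to the base
$00100000 constant L-BIT   \ load (clear = store)

: sdt-imm-opcode ( imm12 rn rd flags -- opcode )
    $e4000000 or            \ "always" condition + single data transfer bits
    swap 12 lshift or       \ transferred register
    swap 16 lshift or       \ base register
    or ;                    \ 12 bits unsigned offset

hex
\ strb r6,[r7,#24]! : pre-indexed with write back
18 7 6 P-BIT U-BIT or B-BIT or W-BIT or sdt-imm-opcode u.   \ prints E5E76018
\ strb r6,[r7],#24 : post-indexed
18 7 6 U-BIT B-BIT or sdt-imm-opcode u.                     \ prints E4C76018
decimal
```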

The block data transfer words consume a list of registers as operand :

{!} r4 r3 r2 r1 r0 stmia \ equivalent to : stmia r0!,{r1-r4}

Here the ARM2_MDT word (which is used by stmia) consumes stack values until 16 values are consumed or a value isn't in the [0,15] range (this is the case for what is left on the stack by {}, which acts as an exit condition).
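
The register list loop can be sketched as follows. This is a simplified stand-in and not the actual ARM2_MDT word : it assumes that register words leave their number (0 to 15) on the stack and that the list marker leaves some value outside that range (here the hypothetical list-end constant), and it only builds the 16 bit register mask.

```forth
\ Hypothetical sketch, not the actual ARM2_MDT word.
-1 constant list-end            \ stand-in for what {} or {!} leaves on the stack

: reglist-mask ( marker r.. -- marker mask )
    0                           \ start with an empty mask
    16 0 do
        over 16 u< 0= if        \ next value is not a register number : stop
            unloop exit
        then
        swap 1 swap lshift or   \ set the bit of this register
    loop ;

\ register list {r1-r4}, once the base register has been consumed
list-end 4 3 2 1 reglist-mask hex u. decimal drop   \ prints 1E
```

The marker is left on the stack so that the caller can still tell which kind of list (with or without write back) was used.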

Quirk

There are some implementation quirks, as my goal was to keep it pragmatic / straightforward :

Conclusion


This approach probably dates back to early Forth assemblers (the Gforth ARMv4 assembler looks similar). It demonstrates how useful Forth can be as a way to quickly get a nice and powerful macro assembler working with a minimal amount of effort, from a tiny Forth core and with nearly no parsing. The assembler can then be used to extend the lower level Forth primitives in a comfy way.

It is also quite cool that the assembler syntax doesn't depart much from the conventional syntax : it still looks readable (in an RPN way) although a bit verbose (more optional arguments may be a quick way to fix this).

It also shows the elegance of the ARMv2 subset : a simple subset that works for most tasks, with powerful bits !

The only gotcha is that I couldn't find a way to make labels look like conventional labels in Gforth (they shouldn't require a variable definition), as the label name would get compiled into a new word and thus break the inlining. It is certainly doable (one could use a buffer instead of putting opcodes into a definition) but it departed too much from the simplicity goal. It may be easier with a custom Forth, for example by hacking it to allow multiple dictionary spaces.

Licence Creative Commons