Writing a primitive Forth in JavaScript

Introduction
Proto Forth breakdown
Postfix notation
Conclusion

Introduction

What is Forth ?

Forth is a typeless highly reflective stack-oriented concatenative programming language created by Charles H. Moore often used in embedded systems due to heavy minimalism and flexibility.

This language features a minimalist syntax with identifiers (called "words") separated by spaces.

It use Postfix notation by default and features some base words (primitives) that the user compose to build programs.

Forth has been historically used by calculators, video games (also as a scripting language), some 8-bit computers and in many embedded systems. (space industry, firmware system etc.)

Motivation

This article is mostly an introduction to a set of articles about building a baremetal Forth based operating system. All of this is loosely related to my Moving to lower tech article and also influenced by code golf programs.

Initial motivation was to experiment with a straightforward Forth implementation as an introduction to programming languages.

This prototype was later ported to ARM (RPI 0) based on all the acquired knowledge with minimalism still a goal.

The ARM implementation was then expanded to a slightly more advanced ARM Forth dialect / compiler which evolved to a personal baremetal computing environment focused on a blend of minimalism and usability.

a full example program of my ARM Forth dialect (~70 primitives not counting syntactic sugar)

Resources

A helpful starting resource to understand Forth was the write-up of jsforth by jborza but most importantly Starting Forth and Thinking Forth by Leo Brodie followed by JONESFORTH low level x86 implementation along with Gforth to experiment with the language,

Moving Forth series was later useful on details for a low level implementation. (most useful to me was the threading technique comparison which made me chose a Subroutine Threaded Code technique)

Some other resources :

As for videos, i quite like the Silicon Valley Forth Interest Group YouTube channel.

Prototype

This prototype is a primitive Forth interpreter implemented in JavaScript, it is not fancy and try to use straightforward bits of JavaScript, it make use of the p5js framework and has a graphical CLI emulation instead of simple text output.

The prototype is not compliant to the standard. It can be tried here.

the Forth proto, ':' define a new word, ';' close the definition, a word is a composition of other words

It support these words :

: switch to compile mode
; switch to interpreter mode
+ add two numbers
- subtract two numbers
. show stack top content
words show defined words

It try to parse unknown words as numbers, they are pushed to the data stack if successful.

Practicality note :

The prototype is missing a few primitives that are required for a practical Forth implementation (stack manipulation words etc.) for the sake of conciseness but Forth concept is fully there, primitives can be trivially defined later on like the existing ones to complete the language by following the standard or making them up to build a personal dialect / match a specific problem domain.

My most advanced Forth dialect (see screenshot above) which is compile only has ~70 primitives of which ~15 are I/O related and ~10 are only there for extra speed, it does not contain '.' nor 'words' primitive compared to this prototype.

Also note that there is incredibly small implementation of Forth such as sectorforth or milliForth, they show the highly reflective features of Forth by bootstrapping a fully featured Forth out of a minimal set of trivial primitives (~10 implemented in assembly) in less than 512 bytes.

To go further down the rabbit hole there is also a 3 instruction Forth.

Proto Forth breakdown

Here is a breakdown of the Forth prototype without CLI emulation.

The code roughly match a low-level implementation which is why i use an array of characters for the words (i call them tokens here) instead of a string type, implementation may be simpler with higher level constructs / data types of modern programming languages.

Data structures

A Forth implementation in a high level language only needs two data structures :

a data stack to pass data between words, my prototype has a single data type : numbers
a words dictionary which is where predefined words (also called primitives) go into along with user defined words

Dictionary words contain a list of functions and a boolean to indicate how the word must be interpreted.

A low level implementation additionally add a return stack and the dictionary is usually implemented as a linked list with room for word flags / word length plus code body, an optimized version might prefer a hash map for the dictionary to speed up words lookup.

let fmode = 0  // forth current mode;
0 for interpreter; 1 for "compiler"

let stack = [] // forth stack

let curin = [] // currently processed input

let words = [  // words list with predefined primitives
words

  // t: token, f: token code as a list of functions, i:
immediate ?

  { t: [':'], // switch to definition mode (compile mode)

    f: [(s) => {

      fmode = 1



      // consume next token and add it to
words list

      let nextToken = ntoken(curin)

      if (nextToken.length) {

        words.push({

          t:
nextToken,

          f: [],

          i: 0

        })

      }

    }],

    i: 0

  },

  { t: [';'], // end definition mode (switch back to
interpret)

    f: [(s) => {

      fmode = 0

    }],

    i: 1

  },

  { t: ['+'], // plus

    f: [(s) => {

      if (s.length < 2) {

        return 1

      }

      

      let a = s.pop()

      let b = s.pop()

      s.push(a + b)

    }],

    i: 0

  },

  { t: ['-'], // minus

    f: [(s) => {

      if (s.length < 2) {

        return 1

      }



      let b = s.pop()

      let a = s.pop()

      s.push(b - a)

    }],

    i: 0

  },

  { t: ['.'], // print the stack top content

    f: [(s) => {

      if (!s.length) {

        return 1

      }



      addToLineBuffer(s[s.length - 1] +
'')

    }],

    i: 0

  },

  { // print all defined words

    t: ['w', 'o', 'r', 'd', 's'],

    f: [(s) => {

      let lwords = []

      for (let i = 0; i < words.length;
i += 1) {

       
lwords.push(words[i].t.join(''))

      }

      

      addToLineBuffer(lwords.join(' ') +
'')

    }],

    i: 0

  }

]

Note how the compile mode is defined as a primitive, ':' start a new word definition and directly parse the next word then add it to the words list, ';' switch back to interpreter mode.

Word search

A Forth simply parse input words and look for them into the dictionary so we need a way to search words :

// find token (word) in
dictionary

function ftoken(token) {

  for (let w = words.length - 1; w >= 0; w -= 1) {

    let word = words[w]

    if (token.length === word.t.length) {

      for (let t = 0; t < token.length;
t += 1) {

        if (token[t] !==
word.t[t]) {

          break

        } else if (t ===
token.length - 1) {

          return
word

        }

      }

    }

  }

 

  return null

}

This searches the word backward to ensure that the most recently defined word takes precedence, allowing newer definitions to override older ones.

It is not mandatory to search backward but it helps supporting dynamic and iterative development, it also has a lookup speed advantage for the recently defined words.

Word call

An utility function to sequentially call all the functions associated to a word, this is necessary to run user defined words (which are a composition of words), each of these functions may return an error code that is also handled to provide some feedback about the execution state :

// execute word code

function wcall(word) {

  for (let i = 0; i < word.f.length; i += 1) {

    let r = word.f[i](stack)

    if (r === 1) {

      addToLineBuffer('Stack
underflow')



      return 0

    }

  }

 

  return 1

}

Word parse

This consume an input string and collect characters until a white-space is detected (a "word" in Forth is any characters delimited by a white-space), collected characters are then returned :

// consume input string until a
white-space is reached then return consumed / collected
characters

function ntoken(t) {

  let token = []

  let c = t.shift()

  while (c && c !== ' ') {

    token.push(c)

    

    c = t.shift()

  }



  return token

}

Interpreter (entry point)

The interpret function is the entry point of the interpreter / compiler, it is called with the input string after the RETURN key is hit, it call the tevaluate() function for each words parsed by the ntoken() function :

function levaluate(s) {

  curin = s



  let token

  while ((token = ntoken(curin)).length) {

    if (!tevaluate(token)) {

      return

    }

  }



  // should output "ok" in a standard Forth at this point as a
simple feedback mechanism

}

Word evaluation (forth core logic)

Forth core logic in a nutshell shown as pseudo code (taken from a Reddit post)

The evaluation code is straightforward but the "compile word" step may be confusing at first although it is incredibly simple : in compile mode ':' all evaluated words are added to the functions list of the newly user created word (the latest word of the dictionary) unless the immediate flag is set, this generate (compile) the new word body code which is just a composite of other words functions. This basically allow to extend the dictionary with new words.

The same idea is applied when a number is found in compile mode except that it generate the call of a function that push the number since it is not really practical to have a definition for each numbers.

In compile mode the immediate flag of a word execute the word immediately when set, it act as a parameterized macro system and is useful to implement ';' for example to switch the mode back to interpreter mode, it act as a way to alter the behavior of the Forth during the compile stage and is useful to implement many language constructs such as conditionals etc. it can act similarly than the C preprocessor although it is way more powerful. (it is a part of the reflective nature of Forth)

// token ('word') evaluator

function tevaluate(token) {

  let word = ftoken(token)



  if (word) { // process word

    if (!fmode || (fmode && word.i)) { // call
word code when forth mode is either interpreter or compiler /
immediate mode

      return wcall(word)

    } else { // compile mode (add word code to the
most recently defined word)

      let tail = words.length - 1

      words[tail].f =
words[tail].f.concat(word.f)

      

      return 1

    }

  } else { // unknown word so try to parse it as a number

    let i = parseInt(token.join(''), 10)

    if (isNaN(i)) { // not a number so output a
feedback message, rollback if in compile mode

      if (fmode) {

        fmode = 0

        

        words.pop()

      }

      

      addToLineBuffer(' Undefined word ' +
token.join(''))

      

      return 0

    } else { // a valid number ?

      if (fmode) {

        // append function which
wrap stack push

        let tail = words.length
- 1

        words[tail].f =
words[tail].f.concat([() => { stack.push(i) }])

      } else {

        // push regular
number

        stack.push(i)

      }

      

      return 1

    }

  }

}

And voilà ! All of this implement the core concept of Forth, it lacks few primitives such as I/O and stack operations such as drop, dup etc. to be really useful as it looks like a fancy calculator right now but they are mostly trivial to add just like it is trivial to extend the language by adding words to the list, a complete Forth implementation has words for loop constructs, conditionals etc. all the elements of a programming languages can be defined in the same way.

Postfix notation

Postfix notation may looks rough but i tend to think that it is mostly a question of habituation, Infix is perhaps easier to read sometimes but involve slightly more complicated parsing.

A way to makes Postfix notation more readable is by allowing parentheses to group elements, they would be mere syntactic decoration compared to Infix but might increase the readability in some situations. The other way involve an Infix to Postfix conversion step using an algorithm such as Shunting Yard algorithm. (which was later generalized to operator-precedence parsing)

Conclusion

This prototype was quite eye opening to me as i realized the brilliance of Forth design / practicality while writing it, this interpreter is a kind of virtual machine embodying minimal concepts of a programming language with reflection being a core feature, the prototype was akin to an extensible "OS" shell / programming language combo in ~100 lines of code... and the implementation journey was incredibly easy and lovely with straightforward algorithms, it doesn't require an Abstract Syntax Tree due to its syntactic minimalism for example, it end up being a chosen series of cohesive and compact instructions that one can flex for any use cases.

Some improvements for the prototype others than adding primitives might be to bootstrap the REPL in Forth with the reflective features (this only require a few primitives) or generate low level code (or an intermediate representation) for the primitives so that it is more akin to a compiler.

Lisp might be worth checking out as it is a slightly higher level language than Forth but still stay compact and simple.