Tumble Forth: Baby's first steps

Enough with the fossil fuel combustion metaphors. That little Forth has already begun crawling and standing, it’s now time to add that crucial ingredient that will allow it to walk by itself, that is, creating new words from its base set, unlocking a whole world of possibilities.

Doing so requires us to perform several little tasks, which we’ll do in this article:

  1. Introduce the Forth concept called “here”.
  2. Make the dictionary label mutable at runtime.
  3. Implement the word “:”.

Enter Here

So far, we’ve been dealing with three memory areas:

  1. Core routines
  2. Variables
  3. Initial dictionary

If we want to add new words to the dictionary, they have to go somewhere. In Forth, there’s a variable called “here” which is a pointer to the system’s permanent1 heap. That heap is where all the permanent results generated at runtime goes, where it can be referenced by further processing, which can be written to “here”, and so forth, until the moon is reached.

The principle is simple: the heap starts at some address in memory and whenever we write something to the heap, we increase the pointer accordingly. The code writing this new data to the heap has the responsibility of referencing2 this data properly so that it can be reused where it must.

In Forth speak, adding something to that heap is called “write” and the associated symbol for its action is “,” (the comma).

If you look at the implementation I’ve written for you, you’ll see that I’ve added a new here variable which is initialized with the value of the herestart label, which is simply placed at the very end of the code3.

Then, I’ve added three write routines, writechar, writeword and writestr4 which operate like I’ve described above. Those routines are used by the word compilation routines we're covering below.

Code-wise, there’s not much new. There’s another of our fancy string instructions, movsb, which is used to copy code from one area to the other. This time, instead of repz, we use rep which repeats unconditionally CX times.

Dictionary indirection

The dictfind routine so far has been beginning its search from a hardcoded label called dictionary. It worked well, but it didn’t allow us to add new words at runtime. To this end, we need to add a new variable, which will be the new starting point for dictfind.

Adding a word then becomes as simple as writing a new entry in “here” and update the dictionary variable so it points to it.

Code-wise, this operation is trivial. I kept the dictionary name for the variable, so I renamed the label for the last entry to lastentry, which is the initial value for the new variable. Then, all we need to do is to, in dictfind, put dictionary between [] brackets to add an indirection.

Words, threading and execution models

In Forth speak, creating a new word is “: wordname ... ;”. For example, if we wanted to simply combine hello and goodbye in a single word, we would do:

: wellthatwasquick hello goodbye ;

After that definition is made, calling wellthatwasquick calls hello, then goodbye, then returns.

How that word is compiled varies from Forth to Forth. Brad Rodriguez makes an excellent presentation of the different execution models encountered in the wild. Before implementing word compilation in a Forth, one needs to choose a model. The model I chose for our own little Forth is the Subroutine Threaded Code (STC) model.

Brad mentions that speed might be an advantage to STC and it’s true, but he fails to mention the awesomest advantages of all: with STC, Forth code and native code can freely intertwine, which is not the case with ITC and DTC. In the C compiler implementation I’m going to show you in this story arc, this is a crucial capability.

It’s not just the C compiler. So many doors open with STC! Dusk OS, also a STC Forth, takes advantage of this intertwinability to crazy cool levels. I can’t show them to you now because we have a baby Forth to finish, but we’ll see some of this craziness soon enough. In my opinion, when compactness isn’t a primary concern5, STC is king.

Compiling a word

So, we want to compile a word at runtime using the STC model. How to proceed?

  1. Read a name from input.
  2. Write a new dictionary entry with that name.
  3. Read a name from input.
  4. Is it “;”? If yes, compile a ret i386 opcode and stop.
  5. Find corresponding word in dictionary.
  6. If not found, abort with “word not found”6.
  7. Compile a i386 call instruction targeting that word.
  8. Goto 3.

If you look at the “:” word I’ve implemented for you (at the bottom of kernel.asm), you’ll see that the code there follows this logic above using semantics you should already understand. The juicy parts are in the createentry, writecall and .stop routines, so let’s look at them.

createentry writes the name of the entry, followed by the link to the previous entry, followed by the length of the name, as per our dictionary structure. The code should look straightforward to you. The only funny part is the weird exchange we make between [here] and [dictionary]. At that very moment, [here] points to the very place where [dictionary] will have to point to when the entry is finished being written, but at the same time, [dictionary] holds the value we want to write at [here]. A little mind bending, but think about it for a while, it will end up making sense.

You remember when I told you in my previous article that we use the hardware stack to preserve registers? That’s what we do with the name length in CX. The funny part is when we pop it: we pop it in another register. We can do that? Of course we can! But as regular developers, our brain is not used to thinking of these kinds of little tricks. It’s a pleasant habit to acquire.

Then comes writecall, where the magic happens for real. But even though this operation opens the door to an infinite and magical world, the actual operation is rather mundane: we simply encode a 16-bit i386 call, that is, the 0xe8 opcode followed by a displacement leading to the target to call. Yes, displacement. Calls under i386 don’t target absolute addresses, but relative ones. The displacement is relative to the end of the call operation. For example, the call E8 00 00 is a noop. The call E8 FD FF (-3) is an infinite loop until it smashes the stack.

By the way, if you want to play with the concept, I encourage you to build yourself a test assembler file with fake call instructions and then look at the encoded result, or better yet, use ndisasm to examine the disassembled binary. Doing this helps understanding i386 instruction encoding logic a lot.

So, back to that magical part: all we need to do is to transform our absolute target address to a displacement by subtracting “here+3” from it, because “here” is where we’re about to write that call. If you look at the code, you’ll see that this is what we do.

The easiest part is .stop. All we need to do at that point is to encode a ret and that ret has a simple opcode, 0xc3. We’re done!

A small step for Forth…

And just like that, our little baby can walk by itself, creating new words from its base set. The problem now is to design a base set of words that is sufficient to take us to the moon, something we’ll tackle in the next article, but we can have a taster through the addition of two trivial words: “*”, a simple multiplication operation and “dup”, a word that duplicates the top element of the stack. I wrote them already. With this, we can easily complete the exercise that I asked in the previous article:

: square dup * ;
5 square .


  1. What happens when you try to add a number in a word definition?
  2. Do you have an idea how you could add support for number literals in definitions?
  3. Forth has a word called over which fetches the second item from PS and pushes it, resulting in a “a b -- a b a” signature. How would you go about implementing it?

Next: The Unbearable Immediateness of Compiling

  1. It’s in fact semi-permanent because there’s also the concept of “forget” which allows to reclaim some of that heap, but that’s an exception to its regular modus operandi

  2. This is often synonymous to “adding it to the dictionary”, but not always. 

  3. By the way, have you noticed that this whole code results in a 742 bytes binary? How big is a GCC hello world again? 

  4. “writecall” is special, I talk about it below. 

  5. If we have more than, say, one megabyte of memory. 

  6. It should be noted that aborting at that point leaves the word unfinished, and thus broken. With no ret to stop the call chain, calling the broken word will result in executing uninitialized memory, which means “fireworks!”. Our baby Forth isn’t mistake-friendly!