Tumble Forth

Enough with the fossil fuel combustion metaphors. That little Forth has already begun crawling and standing, it’s now time to add that crucial ingredient that will allow it to walk by itself, that is, creating new words from its base set, unlocking a whole world of possibilities.

Doing so requires us to perform several little tasks, which we’ll do in this article:

Enter Here

If we want to add new words to the dictionary, they have to go somewhere. In Forth, there’s a variable called “here” which is a pointer to the system’s permanent¹ heap. That heap is where all the permanent results generated at runtime goes, where it can be referenced by further processing, which can be written to “here”, and so forth, until the moon is reached.

The principle is simple: the heap starts at some address in memory and whenever we write something to the heap, we increase the pointer accordingly. The code writing this new data to the heap has the responsibility of referencing² this data properly so that it can be reused where it must.

In Forth speak, adding something to that heap is called “write” and the associated symbol for its action is “,” (the comma).

If you look at the implementation I’ve written for you, you’ll see that I’ve added a new here variable which is initialized with the value of the herestart label, which is simply placed at the very end of the code³.

Then, I’ve added three write routines, writechar, writeword and writestr⁴ which operate like I’ve described above. Those routines are used by the word compilation routines we're covering below.

Code-wise, there’s not much new. There’s another of our fancy string instructions, movsb, which is used to copy code from one area to the other. This time, instead of repz, we use rep which repeats unconditionally CX times.

Dictionary indirection

The dictfind routine so far has been beginning its search from a hardcoded label called dictionary. It worked well, but it didn’t allow us to add new words at runtime. To this end, we need to add a new variable, which will be the new starting point for dictfind.

Adding a word then becomes as simple as writing a new entry in “here” and update the dictionary variable so it points to it.

Code-wise, this operation is trivial. I kept the dictionary name for the variable, so I renamed the label for the last entry to lastentry, which is the initial value for the new variable. Then, all we need to do is to, in dictfind, put dictionary between [] brackets to add an indirection.

Words, threading and execution models

In Forth speak, creating a new word is “: wordname ... ;”. For example, if we wanted to simply combine hello and goodbye in a single word, we would do:

After that definition is made, calling wellthatwasquick calls hello, then goodbye, then returns.

How that word is compiled varies from Forth to Forth. Brad Rodriguez makes an excellent presentation of the different execution models encountered in the wild. Before implementing word compilation in a Forth, one needs to choose a model. The model I chose for our own little Forth is the Subroutine Threaded Code (STC) model.

Brad mentions that speed might be an advantage to STC and it’s true, but he fails to mention the awesomest advantages of all: with STC, Forth code and native code can freely intertwine, which is not the case with ITC and DTC. In the C compiler implementation I’m going to show you in this story arc, this is a crucial capability.

It’s not just the C compiler. So many doors open with STC! Dusk OS, also a STC Forth, takes advantage of this intertwinability to crazy cool levels. I can’t show them to you now because we have a baby Forth to finish, but we’ll see some of this craziness soon enough. In my opinion, when compactness isn’t a primary concern⁵, STC is king.

Compiling a word

If you look at the “:” word I’ve implemented for you (at the bottom of kernel.asm), you’ll see that the code there follows this logic above using semantics you should already understand. The juicy parts are in the createentry, writecall and .stop routines, so let’s look at them.

createentry writes the name of the entry, followed by the link to the previous entry, followed by the length of the name, as per our dictionary structure. The code should look straightforward to you. The only funny part is the weird exchange we make between [here] and [dictionary]. At that very moment, [here] points to the very place where [dictionary] will have to point to when the entry is finished being written, but at the same time, [dictionary] holds the value we want to write at [here]. A little mind bending, but think about it for a while, it will end up making sense.

You remember when I told you in my previous article that we use the hardware stack to preserve registers? That’s what we do with the name length in CX. The funny part is when we pop it: we pop it in another register. We can do that? Of course we can! But as regular developers, our brain is not used to thinking of these kinds of little tricks. It’s a pleasant habit to acquire.

Then comes writecall, where the magic happens for real. But even though this operation opens the door to an infinite and magical world, the actual operation is rather mundane: we simply encode a 16-bit i386 call, that is, the 0xe8 opcode followed by a displacement leading to the target to call. Yes, displacement. Calls under i386 don’t target absolute addresses, but relative ones. The displacement is relative to the end of the call operation. For example, the call E8 00 00 is a noop. The call E8 FD FF (-3) is an infinite loop until it smashes the stack.

By the way, if you want to play with the concept, I encourage you to build yourself a test assembler file with fake call instructions and then look at the encoded result, or better yet, use ndisasm to examine the disassembled binary. Doing this helps understanding i386 instruction encoding logic a lot.

So, back to that magical part: all we need to do is to transform our absolute target address to a displacement by subtracting “here+3” from it, because “here” is where we’re about to write that call. If you look at the code, you’ll see that this is what we do.

The easiest part is .stop. All we need to do at that point is to encode a ret and that ret has a simple opcode, 0xc3. We’re done!

A small step for Forth…

And just like that, our little baby can walk by itself, creating new words from its base set. The problem now is to design a base set of words that is sufficient to take us to the moon, something we’ll tackle in the next article, but we can have a taster through the addition of two trivial words: “*”, a simple multiplication operation and “dup”, a word that duplicates the top element of the stack. I wrote them already. With this, we can easily complete the exercise that I asked in the previous article:

Exercises