Tumble Forth

We have a bare bone I/O loop that sure is simple, but that really doesn’t do much. What about triggering a first explosion of possibilities by combining it with another concept? This time, it’s going to be the Dictionary.

The Dictionary

In Forth, the word “dictionary” refers to a linked list going backwards where each element is executable and has a name. By “going backwards”, I mean that each new element is added on top of the previously added element and that iteration is done starting from the last element. By “executable”, I mean that the payload is meant to be executed, the meaning of which depending on our execution model¹.

The structure of the dictionary differs from Forth to Forth, but here’s the one we’ll use:

“A” is the address of the dictionary entry. Such a structure has the following properties which we’ll rely on:

A regular Forth has only one dictionary, so when we say “the dictionary”, we mean the One.

In Forth nomenclature, the combination of an entry in the dictionary and its payload is called a word, and “word” points specifically to the “payload” part. For example, you’ll see later that there is a Forth word called ‘(single quote) which yields the address of a word. When we say that, we mean the address of the payload of the entry². In other words, a word is executable.

I know, I know, it’s not the best name ever because its meaning is ambiguous. There is no way around it though, all Forth literature use this nomenclature, so you might as well get used to it.

Writing a dictionary

All Forths start with a system dictionary which is often assembled in their predefined structure directly in the code. If you look at the implementation I’ve written for you, you’ll see that this is what I’ve done too, starting at the “; Dictionary” comment. Let’s look at the first entry (hellomsg isn’t part of the dictionary entry):

You can see that the first 3 lines correspond exactly to the structure described above. It’s the first time we see the dw directive. It does the same thing as db, but writes a little-endian word³ instead of a byte. hello is the name, which is of course 5 bytes in length. Because this is the first entry, the “link” field is zero. This way, we know that this is the end of the line when we iterate the dictionary.

Then, the payload follows immediately. It’s a bit weird to see code organized like this because we’re used to seeing code directly follow a label. If we were to directly call this piece of code from assembly, we would indeed add a label right after db 5, but this word is only ever called through the interpret loop, so we don’t bother.

The way NASM resolves dw hello is exactly the same as if it resolved

call
hello

, but instead of encoding a call, it simply writes down the resolved address as a literal. This is what we need in our dictionary structure.

Then, we continue to add a bunch of entries in the same fashion until we reach the last one, which we bind to the dictionary label, which we’ll need for the lookup routine.

I recommend looking at kernel.img in a hexadecimal editor, using literals like hello and goodbye as orientation marks. By subtracting 0x500 to each link field (because that executable lives at 0x500, remember?), you should be able to follow the chain by yourself. It’s a good exercise to visualize what is going on.

Looking up a word

The main addition to the code in this article is the dictfind routine which can be used to look up an entry by name in the dictionary. It takes a name/length input and returns either an entry address if a corresponding one was found, or 0 if nothing was found. The logic of the routine is as follow:

Assembler-wise, a few things to mention. We use the [register+offset] form for the first time. Not only can we indirectly address a register, but we can also do so with an offset applied to it. For example, if BP is 0x1234,

“mov
al, [bp+2]”

copies the byte at address 0x1236 in register AL. It’s very useful to access structured data in memory, which is what we do here.

Then, there’s the cmpsb instruction which is a nice little creature. If our task is to compare every character between two addresses in memory across X bytes, we could write a simple loop to do so. But this is a task that is done so often that Intel thought it wise to add a “shortcut” instruction for it, which executes faster. The code “repz cmpsb” is the rough equivalent of:

And you know what? This little friend there can go both ways (increase or decrease SI/DI at each step)! Thank you Intel for being so kind, I’m sure it will come handy some day, but most of the time, this feature is there to hit me in the face: the status of the flag⁴ that controls it doesn't have a specific state at initialization. To make sure that we go forward, we have to clear the D flag with cld, which we do at the beginning of the code, never to touch it again.

Other changes to the code

Other notable changes to the code that were made is to switch, in spitchar, from INT10H AH=0E to INT10H AH=09 so that we can have colors. This comes at the cost of having to maintain a cursor row/col position.

Along with that comes a newline routine. This routine assumes a 80x24 grid. It resets col to 0 and increase row by 1, wrapping around 24 using the div instruction in its 8-bit form.

div is worth a special mention. Along with mul, it has a peculiar behavior. First of all, it always has AX (or AL in 8-bit) as a destination. NASM requires us to not specify it at all, but it's there all right. It also doesn't have an immediate form, which means we need to load our divisor in a register. We picked CL.

However, we're not interested in the result of the division but its modulo! div supplies both. In its 8-bit form, the one we use in div cl, AL gets the division result and AH gets the modulo.

I also added a new strlen routine to compute the length of a null-terminated string. It also uses a fancy instruction, scasb, which operates a bit like cmpsb, but instead of comparing two memory areas, it compares one memory area to a constant, 0 in this instance (we’re looking for the zero). We take advantage of the CX decreasing logic so that we can reuse that value to extract a final length from it.

Finally, the main loop stitches it all together, transforming the resulting entry into a word (if found) so that it can call it, for a final result that is functionally much richer than our previous iteration: we have a pool of routines that we can arbitrarily call at runtime. Those routines operate around a richer shared state, as we can see with a fancier “Variables” section. With the color variable, we can change the behavior of our printing words. An explosion of possibilities!

Exercises

Up next

Interesting little explosion right? That’s only the beginning! In the next article, we’ll introduce the Parameter Stack along with number parsing routines, making our interpret loop complete. This will make our system operator⁵ much more powerful.