Tumble Forth: A tale of two stacks

Nice little pétarade we had last time right?A simple structure, a simple lookup routine, and suddenly, a new dimension opens to the system operator. As I’ve already said, this is just the beginning. This time, we add a Parameter Stack into the mix which will allow the parametrization of our words.

What’s a stack?

A stack is a data structure with a bottom at a fixed address upon which data is pushed. A pointer to the top of the stack is kept, allowing us to pop from the stack in the opposite order. This data structure has some interesting properties which we’ll rely on:

  1. Push and pop operations are very fast. This is important for stacks that are used in low level routines, because pushes and pops happen all the time.
  2. Only one pointer needs to be maintained. This is important on CPUs with a limited number of registers: a stack that wants to be fast needs to have its pointer kept in a register.
  3. The “last in first out” data order is a very good fit to the “call” execution model. A piece of code that only takes responsibility for its top part of the stack, ignoring the rest and keeping the stack balanced, can be called from any other piece of code behaving this way, and in turn can also call any piece of code behaving this way, all of this while maintaining stack integrity.

The Hardware Stack

We’ve already been using a stack from the very moment we’ve encoded a call instruction. Most CPUs have a register dedicated to a hardware stack, upon which the address of the caller is pushed, allowing the callee to return back to that address when finished. In the x86 family, that register is SP1.

That stack is often used by programmers for more than return address keeping. The most frequent use of it is for what we call “preserving registers”, that is, to keep a value stored in a register for later before we overwrite it with something else. We push, do our thing, then pop back when we’re finished. This is often needed when calling pieces of code which we don’t control, such as the BIOS. Or, you know, sometimes there’s just not enough registers to juggle our data without dropping some of it.

“Can we really do that, mix return addresses and data? Don’t we risk ending up jumping in the middle of nowhere?” might you ask. The answers are yes and yes! which is why stack balance is paramount. This means that whatever you push on the stack absolutely must be popped back in equal size before you return from the routine. There is absolutely no room for error in this area.

Stacks generally grow down

That stacks grow down often confuses the newcomer. You might have noticed already that I initialize SP to zero in the code so far. “Is it because he puts his stack at the beginning of the address space? Isn’t that space supposed to be taken?” you might have asked yourself. Good question, but no: I’m in fact placing the stack at the very end of the 64KB address space.

The push operation begins by decreasing SP by 2, and only then write its data there. This means that if we want our bottom element to live at address 0xfffe, SP needs to be initialized to zero.

This order of operation has an effect worth remembering: this means that except for when the stack is empty, SP always points to the element at the top of the stack. If the order of operation was inverted, SP would point to the space where the next item will go. This of course wouldn’t make much sense2, which is why hardware stacks operate this way.

Why grow downwards? Difficult to say. Stacks can grow in any direction, theoretically, but in practice, they very often grow down, probably by tradition34.

The Parameter stack

Forths have two stacks, the Return Stack5 (RS) and the Parameter Stack (PS). The Return Stack has the responsibility of keeping track of return addresses when we reach the end of a word and the Parameter Stack is for passing arguments from word to word.

Having this second stack frees us from the burden of having to keep RS balance intact when passing arguments to another word. This simplifies calling semantics a lot. To measure the gain, go look again at x86 C calling conventions, which you should understand a little better now, and see what kind of hoops it has to jump through to manage argument passing with only one stack. Pretty hairy stuff right?

In Forth, it’s much simpler. Instead of copying, pushing, freeing arguments around like a bulldozer, PS is simply shared among all words and those words simply “flow” through it, applying what we call their “stack effect” to it.

For example, let’s imagine a word “+” which would take the top two elements of PS, add them, then push the result. We would then say that “+” has a stack effect of “2 elements in, 1 element out”. In Forth, there is a special way of expressing this stack effect that looks a little bit like a function signature: “a b -- n”, which means “arguments a and b come in, result n come out”. The documentation for a word will often reference those names. For example, for “+”, it could be “Add a and b together and return as n”.

Registers are getting tight

If you look at the Parameter stack I’ve implemented for you (see 01-duskcc/06-taletwostacks), you’ll see that I assigned PS to the BP register, initializing it to 0xf8006. This means that across the whole code, this register’s value must be preserved at all times. So far, we had enough registers to go around merrily, without having to worry about register preservation. All we had to do was to be careful not to step on another routine’s toes. These times are over.

As you can see, I’ve replaced BP usage across the whole kernel, and in some places, I’ve had to add push/pop pairs. You will also notice a new kind of comments associated with some routines: preserves register XX.

It is common, when writing assembler code, to document which registers are preserved7 so that one knows, before calling a routine, which of the registers they need to preserve themselves. This is also very useful when we go back to work in that routine. The "preserve" comment indicates "hey! callers are depending on this, don't touch this register!". Generally, when there are no comment, you assume the worst, that all registers are destroyed8.

Parsing numbers

So, what are we going to push to our shiny new stack? Arbitrary numbers coming from our lovely operator! For this, we need to be able to parse numbers, which is done by the new parsedecimal routine. It takes a null-terminated string as an input9 and returns a number on success, that success being indicated by the Z flag.

This is the first time that we use a flag as a return value for our routine. What this means is that after having called the routine, one can directly use jz or jnz to jump according to parsing success. No cmp necessary.

The routine goes as follow:

  1. Result starts at 0.
  2. Multiply result by 10.
  3. Get char and increment index to next char.
  4. Subtract ASCII ‘0’ from the character.
  5. Is it outside the 0 and 9 range? Not a number, bail out.
  6. Is next char zero? We’re done, number is good.
  7. Add the number to the accumulating result.
  8. Goto 2.

You’ll notice that the routine is broken on empty strings, but since our input come exclusively from readword which can’t yield an empty string, everything is fine, just fine.

This is the first time that we use mul, which is special because of its operand rigidity. It can only be called with the AX destination operand, which is implied (we don’t specify it in the code). Also, it can’t be called with an immediate operand, which is why we have to use CX.

It is also the first time that we use cmp with operands that don’t involve a register. These instructions are particular because they must specify their width. When we write something like cmp al, [si], the assembler knows that [si] refers to an 8-bit memory area because the target register is 8-bit. How could it be otherwise? However, when we write cmp [si], 0, the assembler doesn’t know if we want to compare the 8-bit memory area or the 16-bit memory area to the immediate 0. Hence, we must specify it with cmp byte [si], 0.

This is also the first time I begin using the popular xor method of setting a register to zero. So far, I’ve used mov XX, 0 for clarity, but xor XX, XX results in a more compact binary code and all experienced assembly programmer use this form, so that’s what you’ll see in the wild10.

Completing the Interpret Loop

This new routine allows us to complete the Interpret Loop that I’ve described earlier. As you can see in the code added in the loop routine at the top, we call parsedecimal and then conditionally skip the code that pushes the resulting number to PS. You’ll see that we do that push through a macro because pushing to and popping from PS is something we’ll do a lot.

I won’t cover NASM macro syntax because that is not a tool that is part of our overwhelmingly powerful future. It’s a crutch that we use to push ourselves into our new world, but it’s not that interesting of a tool. Forth is much better suited to the task of assembling binaries than programs bound to the limitations of C and UNIX.

However, you’ll notice that I’ve added 3 macros, firstword, defword and lastword to help lighten the boilerplate associated with the knitting of the initial dictionary. This is NASM voodoo. If you want, you can try to decipher it, but I don’t think it’s worth it. The important thing to know is that the end result is exactly the same as before, just a little less verbose.

Number formatting

The last piece of the puzzle we’ll need is a way to print number to screen. This time, it won’t be in an assembler routine, but in a Forth word. This way, the operator can decide when to print a number. This word, in Forth land, is called “.” (a dot). It pops the top element of PS and prints it as a decimal. It’s located at the bottom of our code. Maybe you’ll notice that I’ve added a stack signature comment to it, indicating that after calling it, PS will be one element smaller. Its logic goes as follow.

  1. Pop PS in a register.
  2. Divide by 10, keeping both quotient and remainder.
  3. Is quotient nonzero? Push remainder to hardware stack, call step 2, then pop remainder afterwards.
  4. Add ASCII ‘0’ to remainder and print it through spitchar.
  5. Return.

Now this is some pretty mind bending recursion. You remember when you first learned about recursion? You thought it was cool, tried using it in all kinds of situations, and then it turned out not to be optimal code in those situations so you’ve stopped using them, disappointed not to encounter problems to flex your “recursion muscles” often enough. Well, it turns out that low level code is where’s the real deal at! Recursion is often optimal in here.

So, the recursion. Did you understand it? The challenge here is to be able to print digits starting from the most significant one. Because of this, a simple loop would be complicated because you’d have to keep track of the number of digits you’ve processed, backtrack, loop again. Ugly. Instead, we let the call stack be our “digit bean counter”. By carefully arranging the order of our instructions, we only have to loop once.

The loop is also arranged so that a zero input will result in “0” being printed rather than nothing.

Code-wise, there’s not much to say, we’ve seen it all already. div has similar operand constraints as mul, with the additional one that 16-bit division actually takes a 32-bit dividend, with the most significant part being in DX. Therefore, DX must be set to zero if all we want is to divide AX, otherwise the result will be wrong.

Discovering our newfound powers

Maybe you haven’t realized it, but our little system grew much more powerful through these simple additions. Not only can we parse and print decimal numbers, but arithmetics are now trivial to implement! I’ve added the words “+” and “-”, others will also be trivial to add. The system operator can now write things like “42 12 + .”.

Another addition is the color! word11, which is the parametrized version of red, green and blue. This word allows you to set the active color to any number you type. For example, “15 color!” will set it to white. I’ve thrown in pos! for good measure.

Can’t you feel it? Can’t you feel that this engine is going to roar soon, sending the operator straight to the moon?


  1. Add other arithmetic words.
  2. What happens if you add a word with a name that corresponds to a decimal number? Can you call it?
  3. If we want to compute the square of a number, what are our options? Besides the obvious options, are there some more elegant and generalized ways to proceed?

Up next

In the next article, we’re going to allow the operator to create, through combination, new words at runtime, unleashing their creative powers!

Next: Baby’s first steps

  1. Stack Pointer, what else? 

  2. Someone doing software archeology will notice that my first stack implementations in Collapse OS were precisely of the type that didn’t make much sense, leading to awkwardness in code. 

  3. When you think about it, it might have to do with 8-bit operations. i386 doesn’t allow 8-bit push/pop, but if it did, the growing downwards would make a lot more sense. When the stack grows downwards, the sentence “SP points to the top element” stays true in both 8-bit and 16-bit mode. When it grows upwards, it can only be true for one of the modes. 

  4. Or maybe it’s just more beautiful, number-wise? For example, if you want to have a stack that cover the upper 64KB memory area, you set SP to 0. If the stack grew upwards and, inversely, you wanted to cover the beginning of the 64KB area, you’d have to initialize SP to 0xfffe, which is uglier. Maybe our “ancestors” (where the tradition comes from) had some sense of number aesthetics? 

  5. We will talk more about the Return Stack later, when we look at our different options in terms of execution models, but for now, it’s the same thing as the Hardware Stack. 

  6. 2KB for the Return Stack, should be enough. 

  7. Or the opposite, which ones are destroyed. 

  8. Or you analyse the code and add a new comment. 

  9. the “z” in “zname” indicates null-termination. That’s just my little personal convention. 

  10. except when flags need to be preserved, in which case “mov” will be used. 

  11. In Forth speak, the “!” symbol means “store”.