Nice little pétarade we had last time right?A simple structure, a simple lookup routine, and suddenly, a new dimension opens to the system operator. As I’ve already said, this is just the beginning. This time, we add a Parameter Stack into the mix which will allow the parametrization of our words.
A stack is a data structure with a bottom at a fixed address upon which data is pushed. A pointer to the top of the stack is kept, allowing us to pop from the stack in the opposite order. This data structure has some interesting properties which we’ll rely on:
We’ve already been using a stack from the very moment we’ve encoded a
instruction. Most CPUs have a register dedicated to a hardware stack, upon
which the address of the caller is pushed, allowing the callee to return back
to that address when finished. In the x86 family, that register is
That stack is often used by programmers for more than return address keeping. The most frequent use of it is for what we call “preserving registers”, that is, to keep a value stored in a register for later before we overwrite it with something else. We push, do our thing, then pop back when we’re finished. This is often needed when calling pieces of code which we don’t control, such as the BIOS. Or, you know, sometimes there’s just not enough registers to juggle our data without dropping some of it.
“Can we really do that, mix return addresses and data? Don’t we risk ending up jumping in the middle of nowhere?” might you ask. The answers are yes and yes! which is why stack balance is paramount. This means that whatever you push on the stack absolutely must be popped back in equal size before you return from the routine. There is absolutely no room for error in this area.
That stacks grow down often confuses the newcomer. You might have noticed
already that I initialize
SP to zero in the code so far. “Is it because he
puts his stack at the beginning of the address space? Isn’t that space supposed
to be taken?” you might have asked yourself. Good question, but no: I’m in fact
placing the stack at the very end of the 64KB address space.
push operation begins by decreasing
SP by 2, and only then write its
data there. This means that if we want our bottom element to live at address
SP needs to be initialized to zero.
This order of operation has an effect worth remembering: this means that except
for when the stack is empty,
SP always points to the element at the top of
the stack. If the order of operation was inverted,
SP would point to the
space where the next item will go. This of course wouldn’t make much
sense2, which is why hardware stacks operate this way.
Why grow downwards? Difficult to say. Stacks can grow in any direction, theoretically, but in practice, they very often grow down, probably by tradition34.
Forths have two stacks, the Return Stack5 (
RS) and the Parameter Stack
PS). The Return Stack has the responsibility of keeping track of return
addresses when we reach the end of a word and the Parameter Stack is for
passing arguments from word to word.
Having this second stack frees us from the burden of having to keep
balance intact when passing arguments to another word. This simplifies calling
semantics a lot. To measure the gain, go look again at x86 C calling
conventions, which you should understand a little better now, and see
what kind of hoops it has to jump through to manage argument passing with only
one stack. Pretty hairy stuff right?
In Forth, it’s much simpler. Instead of copying, pushing, freeing arguments
around like a bulldozer,
PS is simply shared among all words and those words
simply “flow” through it, applying what we call their “stack effect” to it.
For example, let’s imagine a word “+” which would take the top two elements of
PS, add them, then push the result. We would then say that “+” has a stack
effect of “2 elements in, 1 element out”. In Forth, there is a special way of
expressing this stack effect that looks a little bit like a function signature:
“a b -- n”, which means “arguments a and b come in, result n come out”. The
documentation for a word will often reference those names. For example, for
“+”, it could be “Add a and b together and return as n”.
If you look at the Parameter stack I’ve implemented for you, you’ll
see that I assigned
PS to the
BP register, initializing it to 0xf8006.
This means that across the whole code, this register’s value must be preserved
at all times. So far, we had enough registers to go around merrily, without
having to worry about register preservation. All we had to do was to be careful
not to step on another routine’s toes. These times are over.
As you can see, I’ve replaced
BP usage across the whole kernel, and in some
places, I’ve had to add
push/pop pairs. You will also notice a new kind of
comments associated with some routines: preserves register XX.
It is common, when writing assembler code, to document which registers are preserved7 so that one knows, before calling a routine, which of the registers they need to preserve themselves. This is also very useful when we go back to work in that routine. The "preserve" comment indicates "hey! callers are depending on this, don't touch this register!". Generally, when there are no comment, you assume the worst, that all registers are destroyed8.
So, what are we going to push to our shiny new stack? Arbitrary numbers coming
from our lovely operator! For this, we need to be able to parse numbers, which
is done by the new
parsedecimal routine. It takes a null-terminated string as
an input9 and returns a number on success, that success being indicated by
the Z flag.
This is the first time that we use a flag as a return value for our routine.
What this means is that after having called the routine, one can directly use
jnz to jump according to parsing success. No
The routine goes as follow:
You’ll notice that the routine is broken on empty strings, but since our input
come exclusively from
readword which can’t yield an empty string, everything
is fine, just fine.
This is the first time that we use
mul, which is special because of its
operand rigidity. It can only be called with the
AX destination operand,
which is implied (we don’t specify it in the code). Also, it can’t be called
with an immediate operand, which is why we have to use
It is also the first time that we use
cmp with operands that don’t involve a
register. These instructions are particular because they must specify their
width. When we write something like
cmp al, [si], the assembler knows that
[si] refers to an 8-bit memory area because the target register is 8-bit. How
could it be otherwise? However, when we write
cmp [si], 0, the assembler
doesn’t know if we want to compare the 8-bit memory area or the 16-bit memory
area to the immediate 0. Hence, we must specify it with
cmp byte [si], 0.
This is also the first time I begin
using the popular
xor method of setting a register to zero. So
far, I’ve used
mov XX, 0 for clarity, but
xor XX, XX
results in a more compact binary code and all experienced assembly programmer
use this form, so that’s what you’ll see in the wild10.
This new routine allows us to complete the Interpret Loop that I’ve described
earlier. As you can see in the code added in the
loop routine at the
top, we call
parsedecimal and then conditionally skip the code that pushes
the resulting number to
PS. You’ll see that we do that push through a macro
because pushing to and popping from
PS is something we’ll do a lot.
I won’t cover NASM macro syntax because that is not a tool that is part of our overwhelmingly powerful future. It’s a crutch that we use to push ourselves into our new world, but it’s not that interesting of a tool. Forth is much better suited to the task of assembling binaries than programs bound to the limitations of C and UNIX.
However, you’ll notice that I’ve added 3 macros,
lastword to help lighten the boilerplate associated with the knitting of the
initial dictionary. This is NASM voodoo. If you want, you can try to decipher
it, but I don’t think it’s worth it. The important thing to know is that the
end result is exactly the same as before, just a little less verbose.
The last piece of the puzzle we’ll need is a way to print
number to screen. This time, it won’t be in an assembler routine, but in a
Forth word. This way, the operator can decide when to print a number. This
word, in Forth land, is called “.” (a dot). It pops the top element of
PS and prints it as a decimal. It’s located at the bottom of our
code. Maybe you’ll notice that I’ve added a stack signature comment to it,
indicating that after calling it,
PS will be one element smaller.
Its logic goes as follow.
PSin a register.
Now this is some pretty mind bending recursion. You remember when you first learned about recursion? You thought it was cool, tried using it in all kinds of situations, and then it turned out not to be optimal code in those situations so you’ve stopped using them, disappointed not to encounter problems to flex your “recursion muscles” often enough. Well, it turns out that low level code is where’s the real deal at! Recursion is often optimal in here.
So, the recursion. Did you understand it? The challenge here is to be able to print digits starting from the most significant one. Because of this, a simple loop would be complicated because you’d have to keep track of the number of digits you’ve processed, backtrack, loop again. Ugly. Instead, we let the call stack be our “digit bean counter”. By carefully arranging the order of our instructions, we only have to loop once.
The loop is also arranged so that a zero input will result in “0” being printed rather than nothing.
Code-wise, there’s not much to say, we’ve seen it all already.
similar operand constraints as
mul, with the additional one that 16-bit
division actually takes a 32-bit dividend, with the most significant part being
DX must be set to zero if all we want is to divide
otherwise the result will be wrong.
Maybe you haven’t realized it, but our little system grew much more powerful
through these simple additions. Not only can we parse and print decimal
numbers, but arithmetics are now trivial to implement! I’ve added the words
“-”, others will also be trivial to add. The system operator can
now write things like
“42 12 + .”.
Another addition is the
color! word11, which is the parametrized version
blue. This word allows you to set the active color to
any number you type. For example,
“15 color!” will set it to white. I’ve
pos! for good measure.
Can’t you feel it? Can’t you feel that this engine is going to roar soon, sending the operator straight to the moon?
In the next article, we’re going to allow the operator to create, through combination, new words at runtime, unleashing their creative powers!
Next: Baby’s first steps
Stack Pointer, what else? ↩
Someone doing software archeology will notice that my first stack implementations in Collapse OS were precisely of the type that didn’t make much sense, leading to awkwardness in code. ↩
When you think about it, it might have to do with 8-bit operations. i386 doesn’t allow 8-bit push/pop, but if it did, the growing downwards would make a lot more sense. When the stack grows downwards, the sentence “SP points to the top element” stays true in both 8-bit and 16-bit mode. When it grows upwards, it can only be true for one of the modes. ↩
Or maybe it’s just more beautiful, number-wise? For example, if you want to have a stack that cover the upper 64KB memory area, you set SP to 0. If the stack grew upwards and, inversely, you wanted to cover the beginning of the 64KB area, you’d have to initialize SP to 0xfffe, which is uglier. Maybe our “ancestors” (where the tradition comes from) had some sense of number aesthetics? ↩
We will talk more about the Return Stack later, when we look at our different options in terms of execution models, but for now, it’s the same thing as the Hardware Stack. ↩
2KB for the Return Stack, should be enough. ↩
Or the opposite, which ones are destroyed. ↩
Or you analyse the code and add a new comment. ↩
the “z” in “zname” indicates null-termination. That’s just my little personal convention. ↩
except when flags need to be preserved, in which case “mov” will be used. ↩
In Forth speak, the “!” symbol means “store”. ↩