Now that we have a bootloader, we have enough breathing space for a powerful system. What shall it be? A popular first system is a BASIC interpreter. An advantage of BASIC is that you already know how you’d go about implementing it, it’s intuitive. It’s also genuinely simple to implement.
The problem with BASIC is that it can’t climb very high in the abstraction ladder. Code quickly develop into a spaghetti mess as it grows more complex. It’s not realistic to imagine a C compiler developed in BASIC1.
But the naive simplicity of the concept is enticing, I understand if you’re aching to implement one. I’ll wait…
Back yet? Alright, now let’s write a Forth.
I imagine that right now, you're feeling a bit like Alice. Tumbling down the rabbit hole?
Unfortunately, no one can be told what Forth is, you have to see it for yourself.
But what the heck, I’m going to try anyways: Forth is the combination of very simple concepts leading to an explosion of possibilities, which if skillfully2 combined lead to other explosions of possibilities, and so forth. The first of such concepts we’ll explore is the most important one: the interpret loop.
The Forth interpret loop is very simple:
And that’s all there is to it. In this article, we’ll only cover step 1, with other steps being covered in following articles.
When we say “Read a word from input”, what’s “input”? It can be multiple things, but there’s only one at once. In Dusk OS, the input stream starts as a pointer in memory6 where initialization code is located and that initialization code eventually switches the input stream to the keyboard (with a readline buffer). When loading the contents of a Forth source file, the input stream temporarily switches to the contents of the file, to then come back to interactive.
Our input stream will, for now, feed itself directly from the horse’s mouth: the keyboard. As you can see from the implementation I’ve written for you, the BIOS makes it quite straightforward.
If we begin reading the code at the loop
label in kernel.asm
, we see that
readword
is repeatedly called, and if we look at that routine, we see that
it’s feeding itself from readkey
, which is a simple proxy for INT16H AH=00
.
This BIOS call is blocking, which means it only returns when a key has been
pressed. There’s nothing else to it, that’s our input stream.
The input stream is feeding us character by character and our job is to accumulate those characters into a word. When that word is finished being accumulated, we can go to step 2 in the interpret loop. For now, we don’t have a step 2, so we’ll just print the word.
The rule for parsing words in Forth is simple: whitespaces are words boundaries. Therefore, the parsing process is:
You have a working loop! … for a certain definition of “working”. Being directly wired to the keyboard, without any line editing mechanism, makes the system very difficult to use. But for now, this will have to do because it will be a while before we add a line buffer.
Things are getting real on the assembler side. I wouldn’t want my articles to be i386 tutorials or references, there’s tons of them. If you are so inclined, I recommend building up some i386 skills now, but I don’t think that understanding every single line of code is necessary to understand the general idea behind my articles.
In my example code, this logic is in the readword
routine. In this routine,
the bp
register serves as the destination pointer for the typed characters.
This pointer start at the curword
variable.
By the way, what the heck is curword
? For someone who’s new to low level
programming, it’s challenging to figure how “work memory”, that is memory that
isn’t code, is organized. How do you allocate variables? How do you free them?
At this level, the answer is: you don’t. The whole memory is yours. A
“variable” is only an address in memory. In our code, we only have one
variable, so we don’t ever bother sizing it. We just say “here’s then end of
the code, we can read and write stuff there”.
Another new concept are the []
brackets. In NASM syntax, they mean
indirection. Therefore, when we write [bp]
, we mean “memory location whose
address is contained in register BP”. Therefore, if BP
is 0x1234, then mov
al, [bp]
copies the byte at address 0x1234 into the AL
register.
In the code, point number 1 of the logic described above is implemented in the
loop called .loop1
and points 2 and 3 are implemented in the .loop2
loop.
These loops are conditional and work like a do..while
in a regular language.
However, the idea of “if x < y” doesn’t exist in i386 assembler. All
conditionals are done in two parts: first, a cmp
instruction that sets flags,
and then, a jump instruction conditional to those flags. Therefore the “cmp
al, 0x21 \ jb .loop1”
7 we have there is equivalent to “while al < 0x21”.
readword
was the hard part. Now that it’s done, it’s only a matter of calling
it, then sending it to the screen. That’s what the main loop does. If you look
at the printmsg
routine, you’ll see that it doesn’t use the convenient
INT10H AH=13
which spits a string, but rather INT10H AH=0E
which only spits
one character.
The reason for this is that INT10H AH=13
spits a string at a particular
row/column destination. If we wanted to spit our words one after the other,
we’d have to maintain some kind of cursor and it would be complicated. The
INT10H AH=0E
function already does that for us, at the cost of wrapping a
loop around it. So that’s what we do.
And that’s all there is to it! We have a simple bare metal program that splits input from keyboard into words and spits it to screen.
If you’re having difficulties keeping up the pace, let me know.
The interpret loop is one concept. In the next article, we combine it with the “dictionary” concept for an explosion in possibilities!
We can imagine a fancy BASIC with constructs allowing abstractions complex enough to elegantly develop a C compiler, but that BASIC would be more complex than a Forth. If you end up on that path, come back here! We have a Forth to write. ↩
That’s the key word here. A majority of the possible combinations lead to abominations. ↩
In traditional Forths, steps 2 and 4 are inverted, but I personally prefer that order. More on this later. ↩
We’ll see what it is later. ↩
Also for later. ↩
With the contents having been loaded from the bootloader. ↩
"jb” means “jump if dest operand is below source operand”. ↩