Tumble Forth: Words in the shell

Now that we have a bootloader, we have enough breathing space for a powerful system. What shall it be? A popular first system is a BASIC interpreter. An advantage of BASIC is that you already know how you’d go about implementing it, it’s intuitive. It’s also genuinely simple to implement.

The problem with BASIC is that it can’t climb very high in the abstraction ladder. Code quickly develop into a spaghetti mess as it grows more complex. It’s not realistic to imagine a C compiler developed in BASIC1.

But the naive simplicity of the concept is enticing, I understand if you’re aching to implement one. I’ll wait…

Back yet? Alright, now let’s write a Forth.

What is Forth?

I imagine that right now, you're feeling a bit like Alice. Tumbling down the rabbit hole?

Unfortunately, no one can be told what Forth is, you have to see it for yourself.

But what the heck, I’m going to try anyways: Forth is the combination of very simple concepts leading to an explosion of possibilities, which if skillfully2 combined lead to other explosions of possibilities, and so forth. The first of such concepts we’ll explore is the most important one: the interpret loop.

The Interpret Loop

The Forth interpret loop is very simple:

  1. Read a word from input.
  2. Is it a number?3
  3. Yes? Push that number to Parameter Stack4 and go to 1.
  4. No? Then is that word in the dictionary5?
  5. Yes? Execute it and go to 1.
  6. No? “Word not found” error. Go to 1.

And that’s all there is to it. In this article, we’ll only cover step 1, with other steps being covered in following articles.

The input stream

When we say “Read a word from input”, what’s “input”? It can be multiple things, but there’s only one at once. In Dusk OS, the input stream starts as a pointer in memory6 where initialization code is located and that initialization code eventually switches the input stream to the keyboard (with a readline buffer). When loading the contents of a Forth source file, the input stream temporarily switches to the contents of the file, to then come back to interactive.

Our input stream will, for now, feed itself directly from the horse’s mouth: the keyboard. As you can see from the implementation I’ve written for you, the BIOS makes it quite straightforward.

If we begin reading the code at the loop label in kernel.asm, we see that readword is repeatedly called, and if we look at that routine, we see that it’s feeding itself from readkey, which is a simple proxy for INT16H AH=00. This BIOS call is blocking, which means it only returns when a key has been pressed. There’s nothing else to it, that’s our input stream.

Splitting into words

The input stream is feeding us character by character and our job is to accumulate those characters into a word. When that word is finished being accumulated, we can go to step 2 in the interpret loop. For now, we don’t have a step 2, so we’ll just print the word.

The rule for parsing words in Forth is simple: whitespaces are words boundaries. Therefore, the parsing process is:

  1. Discard all whitespace characters (<= 0x20).
  2. Accumulate characters as long as they’re not whitespaces.
  3. Stop the loop when a whitespace is encountered.

You have a working loop! … for a certain definition of “working”. Being directly wired to the keyboard, without any line editing mechanism, makes the system very difficult to use. But for now, this will have to do because it will be a while before we add a line buffer.

Things are getting real on the assembler side. I wouldn’t want my articles to be i386 tutorials or references, there’s tons of them. If you are so inclined, I recommend building up some i386 skills now, but I don’t think that understanding every single line of code is necessary to understand the general idea behind my articles.

In my example code, this logic is in the readword routine. In this routine, the bp register serves as the destination pointer for the typed characters. This pointer start at the curword variable.

By the way, what the heck is curword? For someone who’s new to low level programming, it’s challenging to figure how “work memory”, that is memory that isn’t code, is organized. How do you allocate variables? How do you free them? At this level, the answer is: you don’t. The whole memory is yours. A “variable” is only an address in memory. In our code, we only have one variable, so we don’t ever bother sizing it. We just say “here’s then end of the code, we can read and write stuff there”.

Another new concept are the [] brackets. In NASM syntax, they mean indirection. Therefore, when we write [bp], we mean “memory location whose address is contained in register BP”. Therefore, if BP is 0x1234, then mov al, [bp] copies the byte at address 0x1234 into the AL register.

In the code, point number 1 of the logic described above is implemented in the loop called .loop1 and points 2 and 3 are implemented in the .loop2 loop. These loops are conditional and work like a do..while in a regular language. However, the idea of “if x < y” doesn’t exist in i386 assembler. All conditionals are done in two parts: first, a cmp instruction that sets flags, and then, a jump instruction conditional to those flags. Therefore the “cmp al, 0x21 \ jb .loop1”7 we have there is equivalent to “while al < 0x21”.

Putting it together

readword was the hard part. Now that it’s done, it’s only a matter of calling it, then sending it to the screen. That’s what the main loop does. If you look at the printmsg routine, you’ll see that it doesn’t use the convenient INT10H AH=13 which spits a string, but rather INT10H AH=0E which only spits one character.

The reason for this is that INT10H AH=13 spits a string at a particular row/column destination. If we wanted to spit our words one after the other, we’d have to maintain some kind of cursor and it would be complicated. The INT10H AH=0E function already does that for us, at the cost of wrapping a loop around it. So that’s what we do.

And that’s all there is to it! We have a simple bare metal program that splits input from keyboard into words and spits it to screen.

Exercises

  1. Make the input routine echo typed characters directly (words are printed in double).
  2. Map some words to routines. If names match, the corresponding routine is executed instead of the name being merely printed.

If you’re having difficulties keeping up the pace, let me know.

Up next

The interpret loop is one concept. In the next article, we combine it with the “dictionary” concept for an explosion in possibilities!

Next: Do Look Up


  1. We can imagine a fancy BASIC with constructs allowing abstractions complex enough to elegantly develop a C compiler, but that BASIC would be more complex than a Forth. If you end up on that path, come back here! We have a Forth to write. 

  2. That’s the key word here. A majority of the possible combinations lead to abominations. 

  3. In traditional Forths, steps 2 and 4 are inverted, but I personally prefer that order. More on this later. 

  4. We’ll see what it is later. 

  5. Also for later. 

  6. With the contents having been loaded from the bootloader. 

  7. "jb” means “jump if dest operand is below source operand”.