Tumble Forth: The Unbearable Immediateness of Compiling

It’s done! We have that crucial ingredient giving life to that little Forth, allowing it to walk by itself! Unfortunately, you’ll soon see that after a few steps, this little creature stumbles and falls. What is it lacking, a little balance maybe?

As I’ve hinted in the previous article, it foremost lacks a solid set of base words to build from. It also lacks this little something that would allow it to compile things like numeric literals in a word definition. Lastly, but not the least, one last speckle of pixie dust: immediateness.

In the hands of Leo Brodie

This is the moment in the series when I throw you in the gentle hands of Leo Brodie with his excellent introduction to Forth, Starting Forth. This introduces you to Forth from a “user” perspective and covers all the words that a regular well-formed Forth contains, which should help designing our own little Forth.

Why now rather than sooner? Because of the “user” perspective. I think that being introduced to Forth from this perspective makes it underwhelming to the newcomer. Forth’s strongest selling point is precisely that it allows you to approach it from an “operator” perspective1, something other systems/languages rarely allow because of their overwhelming complexity.

Had I thrown you directly to Leo, you’d have whipped up GForth, a Forth implementation that is too complex for its own good, and played with Forth as a mere user. Now, as you read Leo, your first reflex won’t be “let’s find an implementation I can try this concept in”, but rather “let’s whip up an implementation of this concept in my little Forth to try it on”.

I’d thus suggest that you interrupt the reading of this article to read Starting Forth. It’s not a requirement for this story arc to implement words you read about in our little Forth, but doing so will bring you joy, so why not? When some words give you troubles2, postpone. When you reach chapter 4 about conditionals, you can read it, but you are likely to need help for its implementation, at which point you can resume reading this article.

Parametrized compilation

Maybe you’ve given it thoughts already. What’s that little something that would allow us to compile number literals in a word definition? We somehow need to pass a parameter to a word at compile time, so we can’t just do like in our main loop and push that number to PS: that’s runtime! We need parametrized compilation.

There isn’t a million ways to parametrize something at compile time. We have to store that parameter somewhere and that parameter will be called back every time the word is called. The orthodox way to store such parameter is to write it to here right after a call to the associated “handler” word. In the case that occupies our mind, that “handler” could have the responsibility of reading its compile time parameter and push it to PS at runtime.

I don’t know of any name for that kind of routine. Me, I catch myself calling them “parens words”, because the code associated to them is often linked to a dictionary entry with a name inside parentheses, such as (br), (?br), (does), etc. Those words can’t be called directly because they require one or more compiled parameter next to them. But providing that we follow their argument structure, they can be written in a word definition.

If you look at the implementation I’ve written for you, you’ll see that I’ve added a new litn routine. Although it’s not wrapped in a word3, it’s what we could call a “parens word". Its goal is to read the parameter that has been written next to its call and push it to PS. How do we access this parameter? Through the Return Stack!

As with any call, the address where we should return has been pushed to RS before we jumped to litn. If the writer of the call has followed litn convention4, it has written the number parameter right after that call. Therefore, all we need to do is to:

  1. Pop the return address from the Return Stack.
  2. Dereference a 16-bit number at that address.
  3. Push that number to PS.
  4. Jump (not return) 2 bytes further to skip that number.

Then, all we need to do is to amend our : word so that it tries parsing input as a number and, when it’s a number, write a call to litn by following its argument convention. This happens in .writenumber.

You can now define words like:

: answertouniverse 42 . ;

and have this new word behave as expected. This also means that you could scrap the red, green and blue native words because they can be trivially recreated through color!.

Parametrize now!

So far so good. However, it does raise a question: is it possible to reference a “parens word” within the realm of Forth semantics? That would unlock quite an interesting set of possibilities, wouldn’t it? The answer is: for now, no. Our loop in the : word is too dumb to compile anything but straight calls to words and number literals. Does it mean that we’ll need to augment that routine whenever we want to add a new kind of “parens word”? Fortunately no, and this brings us to the last magical concept we’ll need to add to be able to reach the moon: immediate words.

Forth dictionary entries contain a flag, typically the most significant bit of the length field, to indicate whether the word is an immediate. In the interpret loop, this changes nothing. When the compiler loop encounters such a word, however, it executes it instead of compiling a call to it. And that’s all there is to it.

Code-wise, it’s simple. First, we have to modify dictfind to tell it to ignore the 8th bit of the length field. Then, we need to add a check on this bit in the : loop, which we do with the test instruction in front of the conditional jump to .immediate. What do we do when that condition arises? Call and carry on, what else!

This very simple mechanism opens a very important door giving us access to the exponential function, that is: tools that build tools that build tools that build tools…

This is the most mind bending, yet the most powerful concept there is in Forth. All the really cool things you’re going to do with it are likely to involve immediate words, maybe several levels of them. Going too deep in this logic leads to insanity, but it’s such a small price to pay for such power!

We try this mechanism with two new words, if and then, which I’ve already added. As you can see, their length field have the 8th bit set, indicating that they’re immediate. This means that their behavior is a compile time behavior.

The if and then words work around the condbr5 “parens word” which behaves a bit like litn, having the same number of parameters (one), but does this instead:

  1. Pop a value from PS.
  2. Is it zero?
  3. Yes? Read the parametrized address and then jump to it.
  4. No? Do nothing, but advance return address by 2 to skip the parametrized address.

The job of the if word is to compile a parametrized condbr, but unfortunately, we don’t know yet where we want to jump to! Therefore, what we do is we allocate 2 bytes for this parameter, but don’t set it. Instead, we push6 the address of that parameter to PS and let the compile loop do its thing.

Then comes the then immediate word, whose job is to take that address from PS7, see where we’re at, here-wise, and go place that address at our condbr parameter. Voilà! This allows things like:

: foo if hello then ;
1 foo --> prints "hello"
0 foo --> does nothing

I know it’s a big one to take. Take a while to think about it, experiment a little bit. By the way, have you noticed how this system works with nested branching?

The good news is that we’re done for real-real-real this time! These are all the important concepts that make a Forth. All the rest is building on these foundations, and the best part of it is that it’s built to your liking because you’re the one who’s likely to have built it.

Exercises

  1. How would you go about implementing else?
  2. You’ve read about create in Starting Forth. How would you go about it?
  3. Try yourself at does> if you want, but it’s a tough one to tackle for a newcomer, altough there’s nothing fundamentally new about it except that you have to hijack the compile loop to prematurely stop it. Hint: this “parens word” needs two parameters.

I hate to give good people bad news

We’re at the end of the “build a Forth”8 part of this story arc. It’s possible that you’ve grown attached to this little Forth we've been building, that you intend to make it into a real Forth. Unfortunately, this baby Forth is not a solid foundation for a complete Forth. I’ve optimized for the quickest path to “wow moments”, not for viability. Too much of its code is into assembler routines, where more of it should be in Forth words referencing each other. Too much of the logic has grown around null-terminated strings, but in the end, counted strings are easier to work with. The “;” detection logic is hackish.

If you want to build a Forth9, you're better off starting from scratch. And while you're at it, maybe target an architecture that is more elegant than i386?

By the way, I can’t let you go without mentioning jonesforth. It’s a i386 Forth with loads and loads of comments and people generally rave about it. Me, I don’t think it’s such a good way to be introduced to the subject, which is why I wrote this little series of articles, but it contains a lot of details I may have glossed over but that could be of use to you.

Towards a C compiler

The goal of this story arc is to fully understand how to compile a piece of C code, from scratch. My intent is to show that to you through the compiler I’ve developed in Dusk OS, which is a Forth. Now that we know how to build a Forth, we’ll enter Dusk’s world with a pretty good understanding of how it works underneath. No more secrets.

So, let’s leave that little Forth there, don’t listen to it whimpering, calling for your mercy, and let’s discover the brave new world of Dusk OS!

Next: From Dusk Till C


  1. Remember, you’re Tank! You ain’t some cog in the machine, machines bend to your will! 

  2. For example, the string literal. 

  3. Why? I talk about it at the end of the article. 

  4. If it hasn’t, fireworks! 

  5. "Branch conditionally” 

  6. At compile time, remember! Yes, I know, mind bending. 

  7. Still at compile time! 

  8. 8 articles, 8 bits. Coincidence? I don’t think so! 

  9. and I’d be very happy if you were! Let me know if you’re stuck at some point.