We now know the secret ingredients that make Forths alive, but we won’t go all the way to a full Forth because it would be much less interesting than the first magical moments of life. Moreover, I already have one for you: Dusk OS.
Dusk OS is a 32-bit STC Forth that broadly follows Starting Forth’s conventions, with a few caveats:
HEX/DEC
mode. Hexadecimal literals are prefixed with $
, like
in $1234abcd
. There is also support for character literals like ‘X’
.
Printing a literal in hexadecimal is done with the .x
word.DO..LOOP
changes to for2..next
because Dusk has fancy support for
iterators.VARIABLE
changes to value
through Big Moustache.You can of course read Dusk’s documentation which will tell you all you need to know, but you don’t have to. Your knowledge from having read Starting Forth will be enough, I’ll fill you in as needed.
With all this ruckus about building a Forth and then discovering a whole new one, where were we again? Ah yes, we want to compile this:
int foo(int a, int b) {
return a + b;
}
int main() {
return foo(42, 12);
}
Compiling this, alright, but what to? Under UNIX, this compiled into an ELF executable that would be called and yield the result through its program return code. Under a Forth, we would rather want to compile this to a simple word.
The main
function is a crutch that only UNIX needs. What we’d actually want
to do with our compiled word is this:
42 12 foo . \ prints 54
This means we can ignore the main
function. So, just like that, half our work
is done!
Of course, we could compile this with Dusk’s existing C compiler, but we
wouldn’t be learning much. The rest of this story arc will be devoted to write
the code necessary to minimally compile the foo
function’s source code into
an executable word under i386 Dusk.
We will first build an assembler for the few instructions we need, then we’ll build ourselves a C tokenizer, and finally write the code that parses these incoming tokens and generate the corresponding i386 code through the assembler.
Let’s prepare our stuff. First of all, this part of the story arc was written with Dusk v8 as a reference. If you use another version, snippets might not work.
Then, make sure that you can run the i386 version of Dusk. If you're on an Intel CPU on a host OS that allows usermode Dusk to run, it's the best option. Run the dusk-curses flavor.
Otherwise, your other option would be emulating a PC with QEMU. This can be done easily through the Dusk deployments package.
For this to work, you'll need both QEMU and mtools installed. Then, go in the pc-piix subfolder and run make emul.
For this story arc, we’ll make the filesystem root our workspace. We’ll write
our work in files at the root filesystem and then load them in Dusk. Let’s test
that this works by creating a file called myasm.fs
in the fs/
subfolder in
Dusk’s source code, with this content:
."Hello Dusk!\n"
Then, run Dusk and at prompt, type ”f<< myasm.fs”
. You get the message? Good!
We’re ready to build our barebone i386 assembler.
As mentioned above, we’ll go for a minimal solution for the presented problem,
which means a minimal assembler. To determine the scope of that assembler,
let’s try to see what foo
would look like in NASM. You should already have an
idea since it’s functionally equivalent to the +
word we’ve already
implemented in our baby Forth, with these differences:
E
(for “extended”) prefix.PS
is ESI
rather than BP
.PS
Top Of Stack element in the EAX
register.The last point needs a special mention. Brad Rodriguez, in the same excellent “Moving Forth” article I’ve already linked previously, speaks of the subject in the section named “Top-Of-Stack in Register”. Dusk OS does this. Not only for performance reasons, but also because of the HAL, which is a vast subject we’ll not talk about now.
This peculiarity changes the logic for our word a lot. What was:
PS
in a register.PS
top.PS
top.Becomes:
PS
top is already in EAX
.[ESI]
to EAX.
ESI
stack by 4 (that is, add 4). Result in EAX
, which is still
our PS
top.In NASM, this would look like:
BITS 32
add eax, [esi]
add esi, 4
ret
This would mean that to compile foo
, our assembler would only need two
instructions, add
and ret
. Should be easy!
Alright, let’s get started. First, we need a reference. “Intel 80386 Programmer’s Reference Manual” will do. It’s a big document, but I’ll walk you to the important parts, so there’s no need to read it beyond the pages I specifically mention, but you have to read those pages to follow this article. You might not understand the totality of what you read there, but that’s fine, my goal is to explain these pages to you.
Should we begin with the easy instruction? Let’s encode ret
! The reference
PDF has a Table of Contents with each instruction listed there. Through this
TOC, you’ll easily find ret
at p. 378. We see on that page that there are
multiple forms of this instruction, but we will not bother with the fancy
versions and use only the regular one, the “near” version. The listing on that
page tells us that this particular instruction has a simple opcode, C3
, with
no parameter. Too easy! We can thus add our first word in myasm.fs
:
: ret, $c3 c, ;
Why ”ret,”
instead of ”ret”
? To indicate that the effect of this word is to
“write” to “here”. It’s just better form, Forth wise.
That assembler can now create its first word:
f<< myasm.fs
code nothing ret,
You can now call nothing
and see that it does nothing. To be sure, you could
try inspecting your stacks before and after with the .S
utility word.
Starting Forth doesn’t talk about code
, but it’s a frequent word in Forths
that support assemblers. “code xxx”
is the equivalent to “: xxx”
except
that it doesn’t go in compilation mode. This means that words we call
afterwards are interpreted. The result of the snippet above is thus to create
an empty “nothing”
dictionary entry, followed by C3
. You can verify this
with “‘ nothing dump”
.
So, that’s if for ret
. Does this mean that half the job is done? Obi-Wan
would answer “yes, from a certain point of view”. But you know better than to
take the words of ghosts at face value right?
add
is a much more complex instruction (p. 261) and we need to cover two of
its forms:
dword
in x86-speak) where ESI
points to and combine it to register EAX.
ESI
.How should we encode them? When you look at the i386 reference, it’s hard to understand what you read and find the encoding you’re supposed to use. You easily see the “EAX, imm32” form, which should fit one of the forms if it targeted the right register, but the rest of the forms are gibberish. You can guess that “r32” means a reference to a 32-bit register, but what’s “r/m32”? How do we handle indirect references? The answer is that these addressing forms are encoded in x86’s famous “modr/m” encoding (p. 241).
As we’ve already seen before, the x86 family has a wide range of addressing
modes for its instructions. Those addressing modes are encoded in a second byte
following the main instruction opcode. Except for the “shortcut forms” of some
instructions (such at the “EAX, imm32” one we see for add
), most instruction
forms involving a register or a memory location will have the modr/m byte.
There’s at most one per instruction, and it has this bit
structure:
b7:6
modb5:3
regb2:0
r/m (register or memory)The naming is confusing, but to Intel’s defense, I don’t see what other names I’d use. Those fields are too flexible for a more specific naming scheme.
The mod
field selects one of the 4 general addressing modes1 documented at
p. 244:
00 [reg]
01 [reg + 8b offset]
02 [reg + 32b offset]
03 reg
In other words, whenever the modr/m byte fits the C0
mask, it means that both
operands are “direct”, otherwise, one of the operands is “indirect”.
The next two fields select the operands for the instructions using register IDs
also documented in p. 2442. The reg
field is generally used for the
"direct register" operand and is not affected by mod
. In the “add eax,
[esi]”
instance, this field would contain a reference to EAX
, thus 0. This
field is not always the destination. For example, in “add [esi], eax”
,
the reg
field would also contain EAX
. It’s the instruction opcode that
determines operands order (01
vs 03
)34.
The last field, r/m
, is the ID of the register that will be affected by
mod
. For example, under a mod 00
, we would set r/m
to ESI
(6) to
express [ESI]
.
With this knowledge, we can encode “add eax, [esi]”
in our head. First, we
need to select the right add
form, which is the “r32, r/m32” one. This means
a 03
instruction opcode. Then, we need a modr/m byte with mod=00 reg=EAX
r/m=ESI
, which means 06
. This gives us a final encoded instruction: 03 06
.
Is this what NASM gives us? Great, we agree!
First of all, let’s have useful constants:
0 const eax
1 const ecx
2 const edx
3 const ebx
4 const esp
5 const ebp
6 const esi
7 const edi
We’ll use these constants as arguments to our API. Now, the tricky part is to
elegantly express the “use EAX
as a direct destination and use ESI
as an
indirect source” command to the add
instruction. Dusk’s i386 assembler has
such an API, but the scope of this subject is too wide for this story arc.
We’ll defer this to another story arc and go with a brutally simple, albeit
ugly, API: specifying the form in the word name.
For example, for the form we need, we will add a word named “addr[], ( dst src
-- )”
which means “write the add
instruction in its “r32, r/m32” form with
mod=00 dst=r32 and src=r/m32
”. With what we already know of i386 encoding,
the implementation of this word is trivial:
: addr[], ( dst src -- ) $03 c, swap 3 lshift or c, ;
This allows us to create a new word:
code foo eax esi addr[], ret,
42 12 foo . \ prints 54
So, that’s it? Success? From a certain point of view, yes, but this word has a
problem: it leaks an element to PS
. Instead of the signature “a b -- n”
that we want, it has the signature “a b -- a n”
. To drop the a
, we need to
“nip” it from the stack with “add esi, 4”
.
The instruction form that allows this is the “r/m32, imm32”5 form. This one
also has a modr/m byte, but also a 32-bit immediate that will follow it. The
modr/m byte in this case is special: it only has one operand. In these cases,
only the mod
and r/m
fields are used, reg
stays empty. However, to
pack as many instructions as possible in as few bytes as possible, i386
encoding scheme use these 3 precious bits to expand the tight 8-bit base opcode
space. Therefore, for the “r/m32, imm32” form, add
shares the 81
opcode
with other instructions such as sub, adc,
etc. So, you can’t put anything
in the reg
field, you have to put what Intel tells you to put. In the
documentation, it’s the number after the slash. In our case /0
.
Wrapping all this up, this allows us to add our new word:
: addri, ( reg imm -- ) $81 c, swap $c0 or c, , ;
You guessed what the ”$c0 or”
was for? To select mod=03
! With this new
word, we can complete our foo
word:
code foo
eax esi addr[],
esi 4 addri,
ret,
42 12 foo . \ prints 54
You can now see, with .S
, that we don’t leak to PS
anymore! This completes
our extremely minimal i386 assembler. We’ll be able to use it in our C
compiler, when comes the time of generating the code.
The source code for our toy assembler can be downloaded here
We won't go further, assembler-wise, in this story arc. If you wish to explore this subject further, I recommend you a good tool to help you visualize x86 encoding. Then, you can of course take a look at Dusk's i386 assembler.
In the next article, we’ll set aside our shiny new assembler for a while as we tackle the first part of a C compiler by building a tokenizer.
I ignore special cases for simplicity reasons and because we won’t use them in this story arc. ↩
EAX=0 ECX=1, etc. ↩
It’s also this opcode that determines that the operation is a 32-bit one.
If we wanted to do the same operation with the same operands in 8-bit, we’d
change the opcode to 00
or 02
. What about 16-bit? It’s complicated and out
of scope of this article. ↩
It’s a bit confusing, but one realization about i386 will help you grok
the logic. In i386, it’s impossible to have an instruction with two “mod”
operands. For example, “add [esi], [eax]”
is impossible. Therefore, one of
the operands is always “direct”. reg
is the one. ↩
Let’s ignore the imm8 optimization for now. ↩