Tumble Forth: Buckle up, Dorothy

Hello, my name is Virgil Dupras, author of Collapse OS and Dusk OS, and I'm starting a series of articles that aims to hand-hold my former self, a regular web developer, into the rabbit hole leading to the wonderful world of low level programming. Hopefully, I can hand-hold you too.

If you're like my former self, you treat what's underneath your API as a black box and it makes you uncomfortable. You feel that this lack of knowledge makes you a lesser developer. Whenever you try to dig a little bit into Linux, or POSIX, or other subjects you know you're supposed to know about, there's always a point where the complexity of the beast hits you in the face, making you shy away before you have the opportunity to gain a broader understanding of your system.

An alternate way to go about this is to sidestep that complexity entirely and build your own OS. It's easier than you might think, and it's a whole lot of fun. Once you've done that, it's a lot easier to grok other systems. If it's a Forth you're building, you might not ever want to go back to other systems. Think I'm lying? Let's find out.

Let's start with a simple question: Through what mechanisms is the C code below compiled and then run?

int foo(int a, int b) {
    return a + b;
}

int main() {
    return foo(42, 12);
}

Code in this article is also available in this tarball.

So, you know that you can save this to buckleup.c, then run cc -o buckleup buckleup.c and then run the resulting ./buckleup executable, which will exit with the code 54, which you can verify with echo $?. But that doesn't tell you how that happens.
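
To make the "exit code" part a bit less magical, here is a minimal sketch of what your shell does when it fills in $?. The helper name observed_exit_status is made up for illustration. It runs foo(42, 12) in a child process and reads back the status the way a parent process would, assuming a POSIX system with fork and waitpid.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int foo(int a, int b) {
    return a + b;
}

/* Hypothetical helper (not part of the article's code): runs the body of
 * the article's main() in a child process and returns the exit status
 * that the parent, normally your shell, observes. Returns -1 on error. */
int observed_exit_status(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                 /* fork failed */
    }
    if (pid == 0) {
        /* Child: roughly what returning from main() ends up doing. */
        _exit(foo(42, 12));
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);    /* low 8 bits of the child's status */
}
```

Calling observed_exit_status() from a main of your own should give you the same 54 that echo $? shows. Only the low 8 bits of the status survive, so this trick stops working for sums above 255.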

Buckle up, Dorothy, and let's tumble down the rabbit hole.

Examining the executable

First of all, what's in that file? You dig a little bit, find out that it's a file of the ELF format, around which there's a whole lot of tooling to learn about. Ok, you find objdump which looks interesting. You read the man page. Ah! disassemble. That's what you want. This is what you get if you're on an amd64 machine:

$ objdump -d buckleup
0000000000001129 <foo>:
    1129:       55                      push   %rbp
    112a:       48 89 e5                mov    %rsp,%rbp
    112d:       89 7d fc                mov    %edi,-0x4(%rbp)
    1130:       89 75 f8                mov    %esi,-0x8(%rbp)
    1133:       8b 55 fc                mov    -0x4(%rbp),%edx
    1136:       8b 45 f8                mov    -0x8(%rbp),%eax
    1139:       01 d0                   add    %edx,%eax
    113b:       5d                      pop    %rbp
    113c:       c3                      ret
000000000000113d <main>:
    113d:       55                      push   %rbp
    113e:       48 89 e5                mov    %rsp,%rbp
    1141:       be 0c 00 00 00          mov    $0xc,%esi
    1146:       bf 2a 00 00 00          mov    $0x2a,%edi
    114b:       e8 d9 ff ff ff          call   1129 <foo>
    1150:       5d                      pop    %rbp
    1151:       c3                      ret

As you can guess, this is assembler code. This one is for the amd64 architecture, using the GNU assembler syntax. Intimidating isn't it? Relax, the vast majority of it is just for argument mangling and stack frame management.

Each line of the listing is an instruction. The left column is a byte offset, the middle one contains the bytes composing the instruction. The right column is the decoded instruction.
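
To see how the middle column maps to the right one, here is a small sketch that decodes the 5-byte "mov constant into register" instructions from the listing, such as bf 2a 00 00 00. The function name decode_mov_imm32 is hypothetical. It only handles opcodes 0xB8 to 0xBF without any prefix, which is an assumption that holds for the lines at offsets 0x1141 and 0x1146 but not for x86 instructions in general.

```c
#include <stddef.h>
#include <stdint.h>

/* Register numbers as encoded in the low 3 bits of opcodes 0xB8..0xBF. */
static const char *const reg32_names[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Hypothetical helper: decodes a 5-byte "mov r32, imm32" instruction.
 * Returns the destination register name and stores the constant in *imm,
 * or returns NULL if the first byte is not in the 0xB8..0xBF range. */
const char *decode_mov_imm32(const uint8_t bytes[5], uint32_t *imm) {
    if ((bytes[0] & 0xF8) != 0xB8) {
        return NULL;
    }
    /* The constant is stored least significant byte first (little-endian). */
    *imm = (uint32_t)bytes[1]
         | ((uint32_t)bytes[2] << 8)
         | ((uint32_t)bytes[3] << 16)
         | ((uint32_t)bytes[4] << 24);
    return reg32_names[bytes[0] & 7];
}
```

Feeding it the bytes bf 2a 00 00 00 yields the register edi and the constant 42, which is what objdump prints as mov $0x2a,%edi.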

The juicy parts are actually self-explanatory. The first is the line at offset 0x1139, which adds a and b, which were copied to registers edx and eax respectively, with the result of the addition being stored into eax, which is the "result" register under the System V AMD64 ABI calling convention that Linux follows.

The other juicy part is the call to foo at offset 0x114b, just after having placed our 2 constants in the esi and edi registers as arguments (again, by calling convention).
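
If you wonder how the bytes e8 d9 ff ff ff end up meaning "call 1129", here is a sketch of the arithmetic. The helper name call_target is made up. It assumes the near, relative form of call (opcode 0xE8), where the 4 bytes after the opcode are a signed little-endian offset counted from the address of the next instruction.

```c
#include <stdint.h>

/* Hypothetical helper: computes where a 5-byte "call rel32" (opcode 0xE8)
 * located at insn_addr jumps to. Returns 0 if the opcode is not 0xE8. */
uint64_t call_target(uint64_t insn_addr, const uint8_t bytes[5]) {
    if (bytes[0] != 0xE8) {
        return 0;
    }
    /* Little-endian 32-bit value following the opcode. */
    uint32_t raw = (uint32_t)bytes[1]
                 | ((uint32_t)bytes[2] << 8)
                 | ((uint32_t)bytes[3] << 16)
                 | ((uint32_t)bytes[4] << 24);
    /* Interpret it as a signed offset (two's complement). */
    int64_t rel = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                      : (int64_t)raw;
    /* The offset is relative to the instruction that follows the call. */
    return insn_addr + 5 + (uint64_t)rel;
}
```

For the call at 0x114b, the next instruction sits at 0x1150 and the offset 0xffffffd9 is -39, so the target is 0x1150 - 39 = 0x1129, the address of foo.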

Confusing isn't it? Yeah, it is, but the good news is that this confusion is all made up. Forget about this and let's tumble down the rabbit hole further. What about assembling our own buckleup executable in assembler? This would help us understand this listing better.

Assembler crash course

Let's begin our dive into i386[1] assembly by installing an assembler that has a more pleasant syntax than the GNU Assembler: NASM[2]. A noop program would look like this:

section .text
    global _start
_start:
    mov eax, 1  ; exit
    int 0x80

You can assemble this with “nasm -f elf64 noop.asm && ld -o noop noop.o”. The resulting noop executable will efficiently do nothing and exit. How to read this? The first 3 lines are boilerplate. The linker[3] (ld) looks for a global symbol named _start to set as the ELF entry point. The _start: line is the definition of a label, which we can think of as a function name.

The last 2 lines go together. “mov eax, 1”[4] copies the constant ("immediate" in assembler talk) value 1 (the exit syscall ID) into register EAX, the 32-bit part of the register RAX[5], and “int 0x80” triggers interrupt[6] 0x80, which will trigger the kernel to kill the process.

Again, these are all made-up conventions. There's nothing that fundamentally forces this boilerplate on us to do computing on an x86-64 CPU, only the Linux and ELF specifications. But for the sake of having something working right now, let's have our assembler buckleup executable:

section .text
    global _start
foo:
    add ebx, eax
    ret
_start:
    mov eax, 42
    mov ebx, 12
    call foo
    mov eax, 1  ; exit
    int 0x80

You can see that the boilerplate part stays the same. See how it's not plagued by crappy mangling fluff around it and that registers can be nicely used like variables? Yeah, it can be as simple as this. Try it, you’ll see that it correctly returns 54 as the error code.

Let's look at the new faces. In NASM syntax, destination comes first, so “add ebx, eax” means "add ebx and eax and store result in ebx". ret and call go together. call jumps to a label while storing the return address in the Hardware Stack and ret pops an address from the Hardware Stack and jumps to it. This works pretty much like functions in your run-of-the-mill language.
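
To make the call/ret dance concrete, here is a toy model of it in C. This is emphatically not a real x86 emulator: the opcode names, the one-slot-per-instruction layout and the run_toy_cpu function are all invented for this sketch, and the toy stack grows upward while the real one grows downward. It only shows how a return address gets pushed by call and popped by ret.

```c
#include <stdint.h>

/* Toy model, not real x86: each "instruction" occupies one array slot. */
enum op { OP_ADD_EBX_EAX, OP_RET, OP_MOV_EAX, OP_MOV_EBX, OP_CALL, OP_EXIT };

struct insn {
    enum op op;
    uint32_t arg;   /* constant for mov, target index for call */
};

/* Same shape as the assembler listing: foo at index 0, _start at index 2. */
static const struct insn program[] = {
    { OP_ADD_EBX_EAX, 0 },  /* 0: foo:    add ebx, eax         */
    { OP_RET,         0 },  /* 1:         ret                  */
    { OP_MOV_EAX,    42 },  /* 2: _start: mov eax, 42          */
    { OP_MOV_EBX,    12 },  /* 3:         mov ebx, 12          */
    { OP_CALL,        0 },  /* 4:         call foo             */
    { OP_EXIT,        0 },  /* 5:         mov eax, 1; int 0x80 */
};

/* Runs the toy program from _start and returns ebx at exit time. */
uint32_t run_toy_cpu(void) {
    uint32_t eax = 0, ebx = 0;
    uint32_t stack[16];     /* plenty for this program's single call */
    int sp = 0;             /* index of the next free stack slot */
    uint32_t ip = 2;        /* entry point: _start */

    for (;;) {
        struct insn i = program[ip++];  /* ip now points past this insn */
        switch (i.op) {
        case OP_ADD_EBX_EAX: ebx += eax;                     break;
        case OP_MOV_EAX:     eax = i.arg;                    break;
        case OP_MOV_EBX:     ebx = i.arg;                    break;
        case OP_CALL:        stack[sp++] = ip; ip = i.arg;   break;
        case OP_RET:         ip = stack[--sp];               break;
        case OP_EXIT:        return ebx;
        }
    }
}
```

Stepping through it by hand: call pushes 5 (the slot after the call) and jumps to slot 0, the add makes ebx equal to 54, and ret pops 5 back into ip, landing on the exit.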

One more mystery remains: what happens to the result of calling "foo"? Why is the result of that add auto-magically ending up as the program's status code? Because ebx is the register assigned to the status code argument of the exit() syscall. Why? Again, calling convention. When you look at man 2 exit, you see that the call has one argument, the status code. The calling convention for int 0x80 system calls states that the syscall ID goes in eax and the first argument goes in ebx.
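
For comparison, here is roughly the same exit system call made from C, as a sketch assuming Linux with glibc, where the syscall() wrapper and the SYS_exit constant are available. The helper name raw_exit_status is hypothetical. The wrapper hides which registers are used (on x86-64 it doesn't go through int 0x80 at all), but the idea is the same: one syscall ID, one status code argument.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that terminates itself through the
 * raw exit system call, then returns the status the parent observes.
 * Returns -1 on error. */
int raw_exit_status(int code) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                  /* fork failed */
    }
    if (pid == 0) {
        syscall(SYS_exit, code);    /* syscall ID first, then its argument */
        _exit(127);                 /* not reached if the syscall worked */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling raw_exit_status(54) should hand back 54, just like our assembler version does through ebx.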

With everything cleared up, let's look at our new ELF dump:

$ objdump -d buckleup

buckleup:     file format elf64-x86-64
Disassembly of section .text:
0000000000401000 <foo>:
  401000:       01 c3                   add    %eax,%ebx
  401002:       c3                      ret
0000000000401003 <_start>:
  401003:       b8 2a 00 00 00          mov    $0x2a,%eax
  401008:       bb 0c 00 00 00          mov    $0xc,%ebx
  40100d:       e8 ee ff ff ff          call   401000 <foo>
  401012:       b8 01 00 00 00          mov    $0x1,%eax
  401017:       cd 80                   int    $0x80

Much terser than the C version right?

This little assembler crash course gives us a better understanding of what is compiled by the C compiler, but not how it compiles it. We don't know how it's run either. To know that, we'd have to dig through software that weighs millions of lines of code. Maybe you'd have the patience to do it, I don't, so let's continue tumbling down the rabbit hole. We'll go bare metal and then build an operating system of our own, with a C compiler of our own. It's simpler this way.

Next up

This is the first article of a story arc leading to the creation of a simple OS that has its own C compiler (spoiler alert: it's Dusk OS). I'll try to gloss over the gory details to keep things manageable and fun for the uninitiated, but you'll still have to pick up the pace: it's going to be wild.

Next article: Liberation through bare metal

  1. Yeah, I know, the listing from the C compiler above was in x86-64, but Dusk OS, which we’re going to build together, is 32-bit, so we might as well switch to i386 now. 

  2. Broadly available on most Linux distros under the package name “nasm”. 

  3. What’s a linker? Aw, forget about it, it’s another piece of overcomplicated software that has convinced the world that it’s essential. We won’t need one in what’s coming. 

  4. The “1” is a hardcoded number for the “exit” system call ID. If you search for “SYS_exit” in /usr/include, you’ll get to it. 

  5. The main registers in x86-64 are RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP. They're 64-bit registers that all have pretty much the same capabilities, with RSP playing a special role because it's the hardware Stack Pointer, about which we'll talk later. 

  6. We’ll talk about interrupts later. For now, it can be seen as a simple “call system function” instruction.