It’s turtles all the way down:

The following anecdote is told of William James. […] After a lecture on cosmology and the structure of the solar system, James was accosted by a little old lady.

“Your theory that the sun is the centre of the solar system, and the earth is a ball which rotates around it has a very convincing ring to it, Mr. James, but it’s wrong. I’ve got a better theory,” said the little old lady.

“And what is that, madam?” inquired James politely.

“That we live on a crust of earth which is on the back of a giant turtle.”

Not wishing to demolish this absurd little theory by bringing to bear the masses of scientific evidence he had at his command, James decided to gently dissuade his opponent by making her see some of the inadequacies of her position.

“If your theory is correct, madam,” he asked, “what does this turtle stand on?”

“You’re a very clever man, Mr. James, and that’s a very good question,” replied the little old lady, “but I have an answer to it. And it’s this: The first turtle stands on the back of a second, far larger, turtle, who stands directly under him.”

“But what does this second turtle stand on?” persisted James patiently.

To this, the little old lady crowed triumphantly,

“It’s no use, Mr. James—it’s turtles all the way down.”

—  J. R. Ross, Constraints on Variables in Syntax, 1967

As a teenager, what caught my interest in programming was the question: wait, but how does it work? It started – as is probably very common – with a game: a browser-based online game called Galactica. One day I discovered the View Source option of the browser, and the rest, as they say, is history.

That was how I started out with web development, using PHP, HTML, and just enough JavaScript and CSS to think that I Know Stuff. Then I switched to harder stuff: C# was my next main language. Shortly after (you know, when the rush just isn’t the same anymore, so you need heavier stuff) I tried picking up C++. Learning C++ only by doing is very hard though: it is a supremely beginner-unfriendly language, so I didn’t get very far with it. It was only during my Computer Science Bachelor’s studies that I received more formal education in C++, and it became my favorite language: it allows you to go as low-level or high-level as you want. You want to take a random integer, convert it to a memory address, and write to it? Sure thing, buddy! 1 You’d like to have a hash map mapping strings to callback functions defined for example as lambda expressions? You’ve got it! 2
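
To give a flavour of both ends of that range, here is a minimal sketch (the names are mine, and the raw pointer write is exactly the kind of thing footnote 1 warns about):

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    // Low-level end: treat an arbitrary integer as a memory address.
    // (Don't actually write through it: unless the address happens to be valid,
    // writable memory, doing so is undefined behaviour and will likely crash.)
    std::uintptr_t raw = 0x7ffd'1234'5678;        // some "random" integer
    int* p = reinterpret_cast<int*>(raw);
    (void)p;                                      // the write itself is left out on purpose

    // High-level end: a hash map from strings to callbacks defined as lambdas.
    std::unordered_map<std::string, std::function<void()>> callbacks;
    callbacks["greet"]  = [] { std::cout << "hello\n"; };
    callbacks["answer"] = [] { std::cout << 42 << '\n'; };
    callbacks.at("greet")();
}
```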

I should mention that I also have a fondness for C, but it’s too cumbersome to work with for any nontrivial project. Who wants to re-implement even trivial mechanisms like lists (or vectors, as C++ calls them) or strings? In my opinion, there’s nothing really appealing about C compared to C++ if you have the choice; of course, many microcontrollers and more exotic architectures only have working C compilers and toolchains, so you don’t always have that choice.

For most practical purposes, there’s just no reason to go lower level than C or C++, but of course, you can if you’re determined: you can write Assembly. (C and C++ both support inline assembly, but that doesn’t work for everything.) Some things you can only do in Assembly, usually things that are the responsibility of the operating system kernel: configuring the CPU is one of them, and context-switching between threads is another (since you need to switch stacks by overwriting %rsp on x86-64). But yeah, if C is often an inconvenient language to work in, Assembly is just torture, even though modern assemblers have added many convenience features. Generally speaking, you’d write just a file or two in Assembly, containing the functionality you need, and call it from C (or from C++ via extern "C").
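
A minimal sketch of both approaches, assuming GCC/Clang on x86-64 (the switch_stacks routine and its signature are hypothetical, standing in for a separate Assembly file):

```cpp
#include <cstdint>

// Inline assembly: read the CPU's timestamp counter without leaving C++.
inline std::uint64_t read_tsc() {
    std::uint32_t lo, hi;
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi));  // rdtsc places the counter in edx:eax
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Calling a routine written in a separate .s file: extern "C" turns off C++ name
// mangling so the linker can match this declaration to the Assembly symbol.
// This is how something like a stack switch (overwriting %rsp) would be exposed to C++.
extern "C" void switch_stacks(void** save_current_rsp_here, void* new_rsp);
```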

There are yet more turtles, though. Assembly compiles down into machine code, which is just highly unpleasant to even read, especially after the compiler has applied optimizations to it (something the C and C++ compilers are notoriously good at). It’s a common idea that the CPU then executes the machine code, but this hasn’t been true for a very long time: even machine code is just a “high-level” language from the CPU’s point of view, except one that the CPU knows how to compile into microcode itself. The primary reason for this is the usual one: seeking extra performance.

Indeed, the most common motivation for choosing to work in a language lower-level than something like C# or Java is the need to more closely control what the hardware does, often in order to ensure that it is fast. Game development and high-frequency trading are two good examples. But even working in C++ and doing clever things with memory only gets you so far: compiler optimizations will invoke Cthulhu and sacrifice virgin goats to turn your code into an unspeakable horror-show in the name of performance. This involves doing things you’d never want to do by hand, such as unrolling a loop or creating multiple versions of the same code specialized for different run-time conditions.3
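
For instance, here is roughly what unrolling a simple loop by hand would look like; this is a sketch of the shape of the transformation, not the exact code a compiler emits (it would typically use vector registers on top of this):

```cpp
// The original: one addition and one branch per element.
int sum(const int* data, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i) s += data[i];
    return s;
}

// Unrolled by a factor of four: fewer branches, independent accumulators,
// plus a small tail loop for the leftover elements.
int sum_unrolled(const int* data, int n) {
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    int s = s0 + s1 + s2 + s3;
    for (; i < n; ++i) s += data[i];  // leftover elements
    return s;
}
```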

Applying optimizations to code written in unrestrictive languages such as C or C++ is no easy task though. These are the languages where the programmer is allowed to manipulate pointers by hand or define (raw, untagged) unions. For optimizations to take place, the compiler must be able to make assumptions about what your code does: in general, the better the compiler understands what you are trying to do, the better and faster code it can generate to achieve it. For those assumptions to be guaranteed to hold, you need rules, and therein lies the fundamental tension between the “you are the boss” philosophy of the language and the need for rules so that the compiler can optimize your code. (This is also, in a sense, why code written in e.g. C# or Java can sometimes outperform typical C++ code: those languages are a lot more restrictive, allowing for less fishy business, which gives the compiler and the runtime environment a better understanding of the code and therefore lets them generate faster code.)
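
One concrete example of such a rule is strict aliasing: the compiler may assume that pointers to unrelated types never refer to the same object, and optimize accordingly. A small sketch (the function is mine):

```cpp
int f(int* p, long* q) {
    *p = 1;
    *q = 2;       // under strict aliasing, this store is assumed not to touch *p
    return *p;    // so the compiler is allowed to fold this into "return 1"
}
```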

This same idea applies a lot more broadly than just programming languages and compilers: if you’ve ever written highly performance-sensitive code as a library, you’ve felt the struggle. On the one hand, you want your library to be easy to use, without the user having to understand your implementation details; on the other hand, you want performance to stay optimal, which almost inevitably leads to implementation details leaking out of your elegant API: user code ends up having to spell out details that it fundamentally shouldn’t have to care about, but is forced to for performance’s sake. (A simple example is the size of an internal buffer to use.)
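
As a hypothetical illustration of that buffer-size example (the class and parameter here are made up):

```cpp
#include <cstddef>
#include <vector>

class Parser {
public:
    // Ideally the caller shouldn't care how the parser buffers its input,
    // but for peak throughput the API ends up asking for a hint anyway.
    explicit Parser(std::size_t buffer_size_hint = 64 * 1024)
        : buffer_(buffer_size_hint) {}

private:
    std::vector<char> buffer_;
};
```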

But, to get back to our stack of turtles, this is also why even machine code needs to be compiled: machine code keeps gaining new, more complex, or more specialized instructions, such as instructions prefixed with rep that allow an operation to be repeated more efficiently than writing a loop. These complex instructions need to be broken down into more fundamental, low-level instructions that it is practical to build hardware circuitry for, but which may be specific even to particular CPU models. The opposite also happens: sometimes multiple machine code instructions are compiled into a single microcode instruction; this is called macro-fusion.
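
For example, with GCC/Clang inline assembly on x86-64, a rep-prefixed copy looks something like this sketch; the CPU expands the single rep movsb instruction into optimized microcode internally:

```cpp
#include <cstddef>

// Copies n bytes from src to dst using a single rep-prefixed instruction
// instead of an explicit loop.
void copy_bytes(void* dst, const void* src, std::size_t n) {
    asm volatile("rep movsb"
                 : "+D"(dst), "+S"(src), "+c"(n)  // rep movsb uses rdi, rsi, rcx
                 :
                 : "memory");
}
```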

Microcode, by the way, can receive updates when CPU vendors need to fix or work around bugs in the hardware (I know, right?), or to fix or mitigate security vulnerabilities. There have been many examples of the latter in the last five years, Zenbleed (Hacker News thread) being the most recent one. An update lives in volatile memory inside the CPU, so it must be loaded on every boot, either by the BIOS/UEFI or, in some cases, by the OS kernel directly.

Do we have a bigger turtle that underlies this microcode? We definitely have at least one more 4: they are called logic gates, the fundamental primitives that ultimately underlie all instruction execution. Logic gates usually operate on just individual bits: a 0 (logical false) or a 1 (logical true). They are very simple beasts, and there are not that many of them. The simplest ones are: not (flips a bit from 0 to 1 or from 1 to 0), and (yields 1 if both input bits are 1, else 0), and or (yields 1 if either input bit is 1, else 0).
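
Expressed in code rather than circuitry, the three gates (and a couple of things you can build from them) look something like this sketch:

```cpp
constexpr bool not_gate(bool a)         { return !a; }
constexpr bool and_gate(bool a, bool b) { return a && b; }
constexpr bool or_gate(bool a, bool b)  { return a || b; }

// Everything else can be composed out of these, for example XOR and a one-bit
// half adder (which is already the heart of integer addition):
constexpr bool xor_gate(bool a, bool b) {
    return or_gate(and_gate(a, not_gate(b)), and_gate(not_gate(a), b));
}
constexpr bool half_adder_sum(bool a, bool b)   { return xor_gate(a, b); }
constexpr bool half_adder_carry(bool a, bool b) { return and_gate(a, b); }
```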


  1. Of course, caveats apply, mostly revolving around undefined behaviour. The compiler needs to have a good understanding of what your code is trying to do in order to properly optimize it. If you violate its assumptions, bad things happen: if you’re lucky, your application will just crash. If you’re not lucky, you get impossible-to-track-down, strange behaviour that only shows up once in a while, in a completely unrelated part of the program. ↩︎

  2. The C++11, 17, and 20 standards have added lots of convenience features to the language and standard library alike. C++03 and earlier are just sadness: if you are forced to work with them, my condolences! ↩︎

  3. One example of this is loop unswitching, where a branch whose condition is invariant within a loop is pulled out of the loop, turning while (true) { if (flag) do_a(); else do_b(); } into if (flag) { while (true) do_a(); } else { while (true) do_b(); }, provided it can be proven that the value of flag cannot be affected by either do_a() or do_b(). ↩︎

  4. In between microcode and logic gates there are definitely some more layers of turtles, which I don’t have a good idea about: it’s the realm of hardware implementation. ↩︎