Maximilian Schwarzmüller

What Are "Machine Code" & "Byte Code" Anyways?

Sparked by the news of TypeScript being ported to Go, I wondered how many people actually know what “machine code” (the “thing” Go code gets compiled to) really is.

Because I’m happy to admit that for a long time, well into my 20s, I simply accepted that “compiling to machine code” is great without knowing what that “machine code” actually is.

Is it something like this?

10010001011
00101010011
00100011111
01010010101

Or like this?

MOV EAX, 42
PUSH EAX
CALL _function
ADD ESP, 4
CMP EAX, 0
JNE _label
XOR EDX, EDX
RET

Or something else entirely?

And what is “byte code”? The same? Something else?

Which Code Do Computers Actually Execute?

Computers only “understand” machine code. It’s their native language, and it’s made up of binary instructions, just sequences of zeros and ones, that the CPU can directly execute. So the first example above, the block of zeros and ones, is what machine code looks like. The second example is assembly, a human-readable notation for such instructions.

Though, the example above is really just me hacking some random 0s and 1s into this article. So that exact code wouldn’t get the computer to do anything useful.

Of course, we humans don’t really write machine code directly because it’s incredibly difficult and error-prone. Instead, we use languages closer to human-readable text, like assembly or higher-level languages like C, Go, or Rust, which then get translated down to machine code.

Machine Code

Machine code consists of direct binary instructions executed by the processor. Each processor family (Intel x86, ARM, etc.) has its own specific machine language. A snippet of human-readable assembly code like this:

MOV EAX, 42
PUSH EAX
CALL _function
ADD ESP, 4
CMP EAX, 0
JNE _label
XOR EDX, EDX
RET

will be converted (“assembled”) into binary instructions by a tool called an assembler. For 32-bit x86, the resulting machine code looks something like this (the exact bytes after the CALL and JNE opcodes depend on where _function and _label end up in memory):

10111000 00101010 00000000 00000000 00000000
01010000
11101000 11001101 11111111 11111111 11111111
10000011 11000100 00000100
10000011 11111000 00000000
01110101 11110100
00110001 11010010
11000011

This is what your CPU truly runs. Each set of zeros and ones corresponds directly to a physical operation inside your processor, like “load this number into a register,” “perform addition,” or “jump to another location in memory.”
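To make the link between an instruction and its bits a little more tangible, here is a minimal Go sketch that prints instruction bytes as zeros and ones. It assumes the common 32-bit x86 encoding of MOV EAX, 42 (opcode 0xB8 followed by the value 42 as four little-endian bytes) and of RET (0xC3). The helper name bits is made up for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// bits renders each byte as eight binary digits, separated by spaces.
func bits(code []byte) string {
	parts := make([]string, len(code))
	for i, b := range code {
		parts[i] = fmt.Sprintf("%08b", b)
	}
	return strings.Join(parts, " ")
}

func main() {
	// Assumed 32-bit x86 encoding of "MOV EAX, 42":
	// opcode 0xB8, then 42 (0x2A) as a 32-bit little-endian value.
	movEax42 := []byte{0xB8, 0x2A, 0x00, 0x00, 0x00}
	fmt.Println(bits(movEax42))

	// Assumed encoding of "RET": the single byte 0xC3.
	fmt.Println(bits([]byte{0xC3}))
}
```

Programmers usually read machine code in hexadecimal (B8 2A 00 00 00) rather than binary, but both are just different ways of writing down the same bytes.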

Since every platform has its own set of instructions the CPU understands, different platforms need different machine code. It’s always 0s and 1s but it’s not always the same sequences of 0s and 1s, you could say.

That’s why, for example when using Go, you might want to specify the platform you’re compiling for - to produce the appropriate machine code for that platform:

GOOS=linux GOARCH=amd64 go build .

Byte Code

Byte code, on the other hand, isn’t directly executed by your CPU. It’s a middle-ground representation that sits somewhere between human-readable code and machine code. Languages like Java or Python are compiled into byte code, not directly into machine code.

When you write Java, for example:

int answer = 42;
System.out.println(answer);

This doesn’t turn directly into machine code. Instead, it’s compiled into Java byte code which, shown in human-readable form (for example via javap -c), looks something like this:

0: bipush 42
2: istore_1
3: getstatic #2
6: iload_1
7: invokevirtual #3 
10: return

This byte code isn’t directly understood by your CPU. Instead, a Java Virtual Machine (JVM) interprets or just-in-time compiles it into real machine code at runtime, allowing Java to run on any system with a JVM installed—Windows, Linux, macOS—without recompilation.

The advantage of this approach is that your compiler doesn’t need to produce platform-specific machine code. Instead, it produces byte code, and the JVM on each platform then interprets or just-in-time compiles that byte code into machine code for that platform.