Pwn/Molten walk
Introduction
Modern CPUs are fast because they’re impatient.
Instead of waiting to see if an instruction is allowed, they go ahead and execute it speculatively and deal with the consequences later. Most of the time this is great for performance. In this challenge, it’s great for us.
We exploit that impatience to read kernel memory from a user process. The CPU eventually realizes something went wrong and throws a segmentation fault… but by then, the secret has already left traces in the cache.
High-Level Strategy
At a high level, the attack looks like this:
- Ask the kernel (via ioctl) to point to a sensitive address
- Trigger speculative execution that reads from that address
- Encode the leaked byte into cache state
- Recover the byte via timing measurements
- Repeat… a lot
- Reconstruct page tables and walk them manually
- Land on the flag and dump it byte-by-byte
It’s less “read memory” and more “interrogate the cache until it confesses.”
The Speculative Gadget
The heart of the exploit is this function:
```c
void speculative_exploit(size_t target_addr, char *com_buffer)
```
Inside it, we deliberately cause a fault:
```asm
mov rax, [rax] // invalid access → SIGSEGV
```
Normally, execution would stop here. But the CPU has already speculatively executed the next few instructions:
```asm
mov cl, BYTE PTR [target_addr]
shl rcx, 12
add rbx, rcx
mov rbx, [rbx]
```
What this does:
- Reads a secret byte from target_addr
- Uses it as an index into a large buffer (buffer + value * 4096)
- Touches that memory location
This access pulls exactly one cache line into L1. That’s our signal.
“It Crashed” — Good
Every attempt ends in a segmentation fault. That’s expected.
We handle it like this:
```c
if (!setjmp(buf)) {
    speculative_exploit(addr, buffer);
}
```
And the handler:
```c
longjmp(buf, 1);
```
So the flow becomes:
- Try the exploit
- Crash
- Jump back
- Repeat
It’s basically controlled chaos.
Turning Cache Timing into Data
Now comes the sneaky part.
Before each attempt, we flush the entire buffer from the cache:
```c
_mm_clflush(addr);
```
After the speculative run, at most one probe line should have been loaded: the one corresponding to the secret byte.
We probe each possible line:
```c
time_access_no_flush(addr);
```
If access is fast → cache hit → that index is likely the secret.
We repeat this ~100 times and keep a score:
```c
stats[mix_i]++;
```
Finally, we pick the most frequent candidate. Not perfect, but statistically reliable.
Getting the Kernel to Cooperate
The challenge provides a helpful interface:
```c
ioctl(fd, 1337, target_addr);
```
This sets up the kernel module so that our speculative gadget reads from target_addr in kernel space.
We don’t directly dereference it—because we can’t—but speculation does it for us.
Leaking Important Anchors
We first leak:
- mm_struct
- pgd (Page Global Directory)
These give us the starting point for walking the process’s page tables.
We read them byte-by-byte using the same side-channel primitive. It’s slow, but once you trust your primitive, everything becomes “just repetition.”
Manual Page Table Walk
Now we do what the MMU normally does for us.
We compute indices:
```c
(addr >> (39 - 9*i)) & 0x1ff
```
Then traverse:
- PGD
- PUD
- PMD
- PTE
At each step:
- Leak the entry using Meltdown
- Extract the physical page
- Move to the next level
This part feels like solving a puzzle where every piece is hidden behind a side-channel.
If you want to know exactly how the page entries are stored by the CPU, I highly recommend Zolutal’s blog.
Translating Entries
Each entry contains flags and the physical address. We extract the page like this:
```c
entry & ~0xfff
```
Then rebuild a usable address and continue the walk.
Eventually, we land on the page that contains our target buffer (and the flag nearby).
Extracting the Flag
With the final address in hand:
```c
final_addr = page_translation(entry[3]) + 0x60;
```
We leak it byte-by-byte:
```c
for (int i = 0; i < 61; i++)
```
Each byte is sampled multiple times and decided via majority vote:
```c
counter[temp]++;
```
This helps smooth out noise and misreads.
Dealing with Noise
Reality is messy:
- Some values (especially 0x00) are unreliable
- Cache timings fluctuate
- Occasionally you just get garbage
To deal with this:
- Retry failed reads
- Use multiple samples per byte
- Pick the most frequent result
It’s less like reading memory and more like polling an unreliable witness.
Why This Works
The key insight is:
The CPU enforces permissions architecturally, but not always microarchitecturally.
Even though the illegal access is rolled back, the cache state isn’t. That side-effect becomes a covert channel.
We never bypassed permissions directly—we just observed the aftermath of a mistake.
Final Thoughts
This challenge is a perfect example of how:
- Hardware optimizations can undermine security
- Side channels turn “impossible” reads into possible ones
- You don’t need direct access if you can measure indirect effects
Also, it’s a good reminder that sometimes the best way to get a secret is not to ask for it… but to watch what happens when someone else accidentally looks at it.