[[/html/2025/2025-09-11--touching_small_elfs-p1-broken_tools.html|Part 1: Understanding Small ELFs and Fixing Broken Tools]] [[/html/2025/2025-10-06--touching_small_elfs-p2-segfaults_everywhere.html|Part 2: ELF Magic Gone Wrong: Debugging SEGFAULTs (Examples of ELF Failures)]] Part 3: Debugging Userspace ELF in the Kernel with QEMU Snapshots. Learning objectives: - Understanding a user space bug by debugging kernel. - Semi time-travel debugging in QEMU. - Watching accesses/writes into a physical memory address in QEMU. :@@@@@@@@@@@@@@@: @@@@@@@@@@@@@@@@@@@@@@@@# @@@@@@@@@@@@@@@%@@@@@@@@@@@@@@# @@@@@@@@@@@@@@@@@.@@@@@@@@@@@@@@@@@ @@@@@@%@@@@@@@@@@@.@=@@@@@@@@@@@@@@@@@@ @@@@@@%@@@@@@=@@@@@@.+.@@@@@@.@@@@@@@%@@@@. @@@@@@@@@@.@@=@@@%@@=...:@@@@@@.@%*@@@@@@@@@@ @@=@@@@@@@@@@=@@@%@@@..@..@@@%@@@=@@@@@@@@@@+@@ :@+@#+@@=.@@@..@@@@@@...=.%..=@@@@@@..@@@==@@+.@=@ @@@+=@%=@@#@=@@@.=.:@.#.:=:.:.%....@@@=@+@@=@@==@@@ %%==@@@@+%.=@@.....@@.@:*.+=.:%.@@....=@%=+@=*@@@==@= #=@=@===%@@@@.@@.@@@#.++.=*#.@:=+.@@@@+@@=@@@@#==+@=@#= .@=@@=@@@#%.%*..@@@=@++#:@=*=%:+++@.@@@==*#+%%@@@=@@+%* =@@@*=@@=#@=@@@@@@.@@....*:@.@:+....@@*@@@@@@=@==@@==@@. @@%@@@=.=.@@=@@@@@@*......=::=........@@@@@*=@#=..=@@@#@: @@@@=@@@@@@@@@@@@@..........+..........@@@@@@@@@@%@@=@@@@ @%@@@%@@@@@@@@@@@@@@.......**+......%@@@@@@@@@@@@@@@@@@%@ @@@@@%@@@@%@@@@@@@@@@@@.....*++.....@@@@@@@@@@@@@@@@@@@@@@: @@%@@@@@@@@@@@...............+...............@@@@%@@@@@@@@# %@@@@@@@@@@@...................................@@@@%@@@@@@ ==. +@@%@@@@@@.......................................@@@@@@@@@ .== =====..@@@@@@@.....@@@@@@@.................@@@@@@@.....@@@@@@@..===== =====...=.@@@@.#.@@@ @@@@@...............@@@@@ @@@.#=@@@@.......==+ ====++.....@@..@@: @@@@@@@.............@@@@@@@ @@@.=@@.....++===+ ..=+==+...@@...@ +@@@@@@@@...........@@@@@@@@+ @..=@@...+====== =..===+..@@...@ +@@@@@@@@............@@@@@@@+ :..*@@.++==+..+ +...=+=+@@.... @@@@@@@ ........... @@@@@@@ +...@@@+=+....+ =...+==@@@.... @@@@@ ............ 
@@@@@ .....@@@==+.... .@...+=@@@..=====: .......=+........ :=====..@@@==...@ #@@=...@@@.========............+........=======.#@@@...=@@ @@@@@.==@@@.=======........=..*.........=======.@@:...@@@@ @@@@@@@=@@@..=====.......................=====.%@@@=@@@@@@ @#@@@@@=@@@@............=.........=............@@@@.@@@@@@# @.@@@@.*@@%@@.............=.===.=.............@@@@.=@@@@@@@ =@:@@@@=*=@%@@@..............................@@@@@@.+@@@@@%@ @%@@%@@@=@@%@@@@@...........................@@@@@@@*=@@@@@.@ @.@@@@@@@@@%@@@@@@@.......................@@@@@@@@@@@@@%@@.@@ #@.@@@@@@@@@%@@@@@@@@@@.................@@@@@@@@@@@@@@@@@@@+=@ @..@@@@@@@@@%@@@@@@@@@@@@@++......=++@@@@@@@@@@@@@@@@@@@@@@@.@: *@.@@%@@@@@@@%@@@@@@%@@@@@@=++++++++==@@@@@@@@@@@@@@@@@@@@@@@..@ @..@@@@@@@@@@@@@%@@@@@@%%%%======+===+%%%%@@@@@@@@@%@@@@@@%@@@.@. @..@@%@@@@@@@@@@@@@@@@%%%%%%==========.%@@%%%@@@@@@@%@@%@@@@@@@.:@ @.:@@@@@@@@@@%@@@@@@@%%%%%@.............@%#%%%@@@@@@@@@@@@@@%@@@.@@ @..@@%@@@@@@@@@@@@###%%%%#%...............%%%%%#%%@@@@%@@@@@@@@@@..@ %*.%@@@@%@@@@@@@@@%%%#%#%%%#%.............#%#%%#%%#%@@@@@@@%@@@%@@@..@ @..@@%@%@@@@@@@%%%#%%%%#%#%%#%...........###%%#%%#%%##%@@@@@@@@@%@@%.#@ @..@@%@@@@@@@@%%%#%##%##%%#%%%%%%.......%%%%#%%#%%%#%%%#%#@@@@@@@@@@@..@= :%.@@@@@%@@@@@@%%%#%#%%%%%#%%#%%%%%.:%:.%%%##%%%%%%%#%###%%%@@@%@@@%@@@..@ @..@@@@%@@@@@@#%%%%%%%%%%%#%%#%%%%.%.:.%.%%%%%%%%#%%%%%%%%%#%@@@@@@@@@@%.:@ .#.@@%@@@@@@@@######%%%%##%%%%%%%%%:%:%.%:%%%%%%%%%#%%%##%%###@@@%@@@@@@@..@. @.@@@@@%@@@@@%%%%#%%%%%%%#%%%##%%%%.%::#%:%%%%%%%%%%%#%%%#%##%#@@@@@@@%@@@..@ @.@@@@@@@@@@@%#%#####%%%%###%##%%%%%+.:.#%%#%%%%##%%#%%#%#+#%#%%@@@@@@@@@@%.@ %@@@@@%@@@@@%#%%##%###%%#%%%%%%##%%%+++=+%%%#%%%#%%%#%%#%####%%%@@@@@@@%@@@.:@ .@@@@@@@%@@@###%%##%+#%%%#####%#%%%.=:=: .%%%#%%%%#%%%%%#%#%%#%%%@@@@%@@@@@@.@ @@@@@%@@@@@%%%%#%%#%###%###%%%#%%%.=:=:+: *%%%%%%###%%##%%#%##%%#@@@@%@@%@@@.@ @@@@@@@@@@@#%%#%#%##%###%%%%%##%%=:+: =+:+:+#%%%%###%+%#%#####%%%#@@@@@@%@@@@@ @@@@@@@@%@@#######%%##%#%#%%%%%%%. 
::+:* +:::.#%%%#%%%#%#%%%%#%###%@@@@@@@@@@@@ @@.@@%@@@@@%#%##%%%%###%%%#%%%#%. :::. *:.:: :.###%#%%%%#%###%%#%%##@@@@@@@@@@* .@@#@@%@@@@%#%%%%%###%###%#%%%#*. :::: . :::: :.%%#%##%###%%####%%%#@@@@@@@@.@@ @@@@@@%@@@@###%%%%%#####%%####+.: :::: :.::: : ::.%%%%#%###%#%#%%####%@@@@@@@ @@ @@%@@@@@@@@%###%%#%#%#%%#%=##+.: :: : . : :: : .#%%+%%#%%#%#%##%#%#@@@@@%@ @@ @@+@@@@@@@#%###%%##%%%###*###.: :: ::::.: :: ::::.%##=#%%#%###%#%%#%@@@@@@@ @@ @@=#@@%@@@##%%#%%##%%#%=#%#**: ::: :: ::.::: ::::::.*%%##%#%%#%#%%#####@@@@@@ @@ #@ :@@@@@%%%%#+%####%=*%###*+= :::::: ::.::::: :::+**###%=##%%##%##%%#@@@@@@ @@ @@ @@@@@%%%%%####%=##%%#%*+++::::: ::: . : :: ::++**%%%%#=%#%##%%##%@@@@@@ @@ @@ %@@@@#%#%##%..%#####%%*+++*: : : :. : ::.:::++++#@##%%##==#%%##%##@@@@ #@ @# @@@#%#%##=#%=%#####%%#++++:: .: :: .::::. : :++++%%%%%##%#%%+##%%##@@@@ @@ @ @@%%#%%%%*+%%%%##%%@%%%*++:::.::: .:: .::::+++#%%@@@#%%###=%#%##%#@@@ @@ ** @%%%=##%%#%%#%%@@@@@%%%%%%%%%%%%%%%%%%%%%%%%%%%%@@@*%#%#%#%##=%%%%@ @ ===%%####%####*@@@@@%%%#***#*****#*********#%%%%%@@**#%#%%%%#%%=== %%%%%%#%#####**@%%@@%%%%%#%%#%%%%%%%%##%%#%%%%%@@@%***%%%###%%#%%% .%%%%%%#%%%#%***%@%@@%%%%%##%%##%**%%##%%##%%%%%@@@@****%%#%#%#%%%%# %%%%%%%%####*++++@@%##%%%#%%%#%%%%%%%%%#%%##%%%#@%@++++*#%%#%%#%%%%% %%%%#%%#%%%*++++++#@%%%%******###%%%%####**#*%%%%%++++++*%#%%##%%%%% =%%%%%##%%%%*++++++++%%%%%%%%%%%%%%%%%%%%%%%%%%%%%++++++++*#%%%##%%%%. 
%%%%%######*++++++++++%%%%@%%@@%%%%%%%%%@%%%@%%%++++++++++*###%%%%%%%% %%%%###%#%**++++++++++++%%%%@%%@%+*+*%%@%%%%%@%+++++++++++**#%#%##%%%% .%%%%%#%#%#%++++++++++++++%%@%%@%%*+++=%%@%%@%=++++++++++++*@%%%#%%%%%% %%%%%%#%##%@@@+++++++++++++%%%@%%=+++++*%%%%%+++++++++++++@@%%%##%#%%%%% %%%%%###%#@%%%%++++++++++..+.%%%*.+++++.*%%+++=++++++++++%%%@%##%###%%%% :%%%%#####%%%%@@%@+++++++.++=++.++.++=++.*+=+=.+.+++++++#@%%%@@@##%%%%%%% %%%%%%%#%%%@%@%@@@#+++++=++.+......=+++=.....++.+=+++++#@%@%@%%%%#%#%#%%%* %%%%%#%##%%%@@%%@####+++.+=+=.................+.+.+++####@%%@%%@#%%%#%%%%% %%%%%#%%%#@@%%@%%######+.+=+.......=..........=+.++*#####%%%@%@@%##%%%%%%% %%%%%%%%%##@@@@@@########*+==.....+...+........++.*########%@@@@%%#%#%%%%%%. %%%%%#%#%#%%%@%%%##########++....+..+..........==####**####@@@@%%@%#%%#%%%%% %%%%%%%%%#%%%%%@#*###****####...=......+......:#*******+##*#%@@%%%%%##%%%%%% %%%%%%%%%#%@%%@@%***#*****#**#..+.+...+...+..+:::***#*+**#**#%%%%%@%###%%%%%%. %%%%%##%%##@%@@%#***#*********:.:+..+...+..++=.::***#****#*#**@@%%@%##%#%%%%%% %%%%%#%%#%#%%%%@*#*#*****#****::.:::+++=+=++..:: *********##**@@@@%%%#%%#%%%%% +%%%%%##%##%%@@@@#**#*********= :::::..:... :::****#****#***%@@@%%###%%%%%%% %%%%%%%%%##@%@%@#**##********* :::.::: .: :. ::****#**+*##**#%@@%@#%%%%%%%%%% %%%%%#%#%%%%@%%%***#**********:::::.: :: : ::: *********#*##%@%@@%%%%#%%%%%% %%%%%%%%##%%%%@%#*##**********: :::. : ::::: : :******+**###*@@%%@%##%#%%%%%% ===[ INTRO ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The delving intensifies! In this article, we'll look at kinda time-travel debugging and how to watch a physical memory address in QEMU. ===[ Time-travel Debugging ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Time-travel debugging is one of those techniques that can exponentially speed up debugging or reversing (especially when working with memory bugs). 
If you haven't heard about it, time-travel debugging is the process of stepping back in time through code to understand what's happening during a program's execution. [ref1]

Picture a binary that crashes with a segmentation fault. When we inspect the core dump, we might see something like SIGSEGV at address 0 and a broken stack. How did it get there when the backtrace is useless? What if we could step a few instructions backward? And not just that. What if we could set a watchpoint on an address, hit ''reverse-continue'', and instantly find the last instruction that touched that memory location? That's the power of time-travel debugging. Pretty nifty, right? But we won't use it here...

===[ Broken Time Machine ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

QEMU has built-in record/replay functionality [ref2] [ref3] [ref4], which can be used for deterministic replay of binary execution. Unfortunately, as of the time I'm writing this, there's a bug that makes it pretty much unusable: replay operations (like ''reverse-step'') load the nearest snapshot, but QEMU hangs if there's any snapshot other than the initial one [ref5]. That actually makes it slower than re-running the whole emulation (because of icount replay).

But there's still one way to time-travel with QEMU. Snapshots, as they are, are actually a very useful feature, since they save the full state of the running emulation (= vCPU, RAM, and devices) and, most importantly, preserve breakpoints when a snapshot is loaded. They're far from being a full replacement for record/replay functionality, and we can definitely say goodbye to determinism. But they're still an excellent tool for debugging, like we're doing here.

Since snapshotting is available in QEMU, we can run a kernel emulation until we hit ''load_elf_binary'', take a snapshot using ''savevm'', and then continue debugging.
If we hit a dead end or find an interesting address (or error), we can set a breakpoint or watchpoint, load the snapshot using ''loadvm'', and keep going until we hit that breakpoint or watchpoint. On top of that, we can call QEMU ''monitor'' commands, such as ''savevm'', directly from GDB [ref6] [ref7], as we'll see later in the article. (Also, see the notes on time-travel debuggers in [[#OUTRO]].) ===[ BUG #4: Chef's Kiss ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Let's continue the BUGs series from the previous article [ref8] and take a masterclass in debugging. And yes, it's RAW! [ref9] The source code is available either in [ref8] or here: [[/data/2025/elf64-fixme.nasm|elf64-fixme.nasm]]. $ ./elf64-fixme Segmentation fault (core dumped) This segfault is a real specialty. When we look at ''strace'', it looks similar to ''BUG #1'' [ref8], where the code jumped out of the mapped memory region: $ strace ./elf64-fixme execve("./elf64-fixme", ["./elf64-fixme"], 0x7ffe953f0e70 /* 42 vars */) = 0 read(0, NULL, 0) = 0 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x1} --- +++ killed by SIGSEGV (core dumped) +++ Segmentation fault (core dumped) But this one is different. This time, there's no bug in our user space code causing the memory violation. 
Let's inspect it closely in GDB: $ gdb -ex 'file ./execve_wrapper' -ex 'catch exec' -ex 'run ./elf64-fixme' (gdb) x/20i $rip => 0x12eb00000001: rex.RB 0x12eb00000002: rex.WR 0x12eb00000003: rex.RX syscall 0x12eb00000006: mov cl,0x48 0x12eb00000008: push rcx 0x12eb00000009: nop 0x12eb0000000a: nop 0x12eb0000000b: nop 0x12eb0000000c: nop 0x12eb0000000d: nop 0x12eb0000000e: nop 0x12eb0000000f: add eax,0x3e0002 0x12eb00000014: pop rsi 0x12eb00000015: mov dl,0xe 0x12eb00000017: mov eax,0x1 0x12eb0000001c: add BYTE PTR [rax],al ; <-- 00 00 0x12eb0000001e: add BYTE PTR [rax],al ; <-- 00 00 0x12eb00000020: add BYTE PTR [rax],al ; <-- 00 00 0x12eb00000022: add BYTE PTR [rax],al ; <-- 00 00 0x12eb00000024: add BYTE PTR [rax],al ; <-- 00 00 Did you notice something? Our code is missing starting at ''0x12eb0000001c''. That line should contain ''jmp short 0x30'', but instead there are zeros (''add BYTE PTR [rax],al'' is ''00 00'' in hex). What's happening? It shouldn't be a broken mapping, since mappings work at page granularity. Did the kernel stop copying the full code for some reason? Or is something zeroing it out? Let's check the hexdump, because it looks cool: (gdb) x/80xb 0x12eb00000000 0x12eb00000000: 0x7f 0x45 0x4c 0x46 0x0f 0x05 0xb1 0x48 0x12eb00000008: 0x51 0x90 0x90 0x90 0x90 0x90 0x90 0x05 0x12eb00000010: 0x02 0x00 0x3e 0x00 0x5e 0xb2 0x0e 0xb8 0x12eb00000018: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ^-- it might start here => offset 0x19 0x12eb00000020: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x12eb00000028: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x12eb00000030: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x12eb00000038: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x12eb00000040: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x12eb00000048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 What the hexdump tells us is that the zeroing might start at ''0x12eb00000019'', but it's definitely happening at ''0x12eb0000001c''. Therefore, we want to watch the offset ''+0x1c'' to be sure. 
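The ambiguity between ''0x19'' and ''0x1c'' is easy to pin down mechanically: bytes at offsets ''0x19''--''0x1b'' are zero even in the intact binary, so they prove nothing; only ''0x1c'' (which should be ''0xeb'', the start of the ''jmp short'') is conclusive. A quick sketch, with the byte values transcribed from the intact binary and the broken dump above:

```python
# Bytes at file offsets 0x18..0x1f, transcribed from the dumps:
intact   = bytes([0x01, 0x00, 0x00, 0x00, 0xeb, 0x12, 0x00, 0x00])  # original file
observed = bytes([0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])  # seen in GDB

base = 0x18
# First offset where a byte that should be non-zero was zeroed -- the
# earliest point where the corruption is provable:
provable = next(base + i for i, (e, o) in enumerate(zip(intact, observed))
                if e != 0x00 and o == 0x00)
print(hex(provable))  # 0x1c
```

Any offset before that could have been zeroed too; we just can't tell zeros from zeros.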
Also, this isn't something we want to debug from user space, since we have no clear clue which function is responsible. We could theoretically find out using ''ftrace'' and ''kprobes'', but that would be unnecessarily daunting. Let's use QEMU and watch memory accesses instead.

===[ Physical vs. Linear Memory ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before we start QEMU, what's our strategy? What approach should we take? Stepping through instructions one by one would be too time-consuming. And if we find an interesting function to break on, it might be too late, and we'd have to restart the whole process.

We could set a watchpoint on memory access, but there's a catch! Watchpoints track a *virtual* (linear) memory address. That means we won't be able to catch all accesses, because the kernel changes virtual memory mappings based on context. For example, kernel space has a different virtual mapping than a user space program.

Here's an extremely simplified description of what happens when an executable binary is loaded into memory on x86-64:

1. After boot, the kernel enables long mode and paging (CR0 is set and the MMU is enabled). From that point on, the CPU operates on virtual (linear) memory addresses. This means addresses are translated from virtual to physical by the MMU.

2. When ''load_elf_binary'' loads parts of the binary into memory and parses them, it builds a new process image based on the ELF headers (= it maps the ''PT_LOAD'' segments of the binary into the process's virtual address space).

3. The binary is effectively stored in physical memory (and has a physical address), but that address differs from the virtual address used by the kernel, which in turn differs from the process's virtual address. So we now have at least three sets of memory addresses: physical memory, kernel working memory, and user space process memory. And all of them will almost certainly differ from each other.
(Also, there can be different addresses when memory is shared, but we won't go into that here.)

4. And this is where the fun part starts. Memory mappings are governed by the MMU. When the CPU works with an address, it uses a virtual address, but when the data needs to be stored to or fetched from memory, the CPU delegates the operation to the MMU, which translates the virtual address to the corresponding physical address. This means the x86 CPU debug registers ''DR0''--''DR3'' compare the virtual (linear) address, not the physical address.

In summary, all this means that when we set a watchpoint, it triggers only when the CPU reads or writes the specified *VIRTUAL* address, and that address can differ depending on the process context (= kernel/user). As a result, we won't be notified in all cases! (Also, we shouldn't ignore operations that can bypass the CPU entirely, such as DMA writes to memory.)

NOTE: Maybe you've heard of the ''maintenance packet Qqemu.PhyMemMode:1'', so why not use it? Because it only switches reads and writes to physical memory; watchpoints still use virtual (linear) addresses.

===[ Thanks for the Memory ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

That's grim. How can we watch the data of interest when the virtual address can change during execution? Well, actually, there's one dirty trick to watch a physical address in an emulator. QEMU has a command that translates a physical memory address of the emulated program to the virtual memory address of the running QEMU process on the host: ''gpa2hva'' (guest physical address to host virtual address). You can probably see where this is going -- we'll translate any physical addresses we're interested in to their corresponding host addresses and then set a watchpoint in QEMU's host virtual address space.

    LINUX KERNEL <----- VIRTUAL HW <----- QEMU <----- HOST HW
     (GDB STUB)                            ^
         ^                                 |
         |                                 |
     gdb-guest                          gdb-host

Don't worry about that for now.
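By the way, the virtual-to-physical split is easy to poke at from user space, too: Linux exposes each process's page translations in ''/proc/<pid>/pagemap'' as one 64-bit record per virtual page (bit 63 = page present, bits 0--54 = page frame number, per the kernel's pagemap documentation). A small illustrative sketch; the sample record value is made up:

```python
PAGE_SIZE = 4096

def translate(pagemap_entry, vaddr):
    """Decode one /proc/<pid>/pagemap record (64 bits per virtual page):
       bit 63    -- page present in RAM
       bits 0-54 -- page frame number (PFN)
    Returns the physical address backing vaddr, or None if not present."""
    if not (pagemap_entry >> 63) & 1:      # page not resident in RAM
        return None
    pfn = pagemap_entry & ((1 << 55) - 1)  # mask out the flag bits
    return pfn * PAGE_SIZE + (vaddr % PAGE_SIZE)

# Made-up example record: a present page backed by physical frame 0x3305
entry = (1 << 63) | 0x3305
print(hex(translate(entry, 0x12EB00000019)))  # 0x3305019
```

(On recent kernels, reading real PFNs from pagemap requires privileges; unprivileged reads get a zeroed PFN.)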
Let's start small by creating emulated physical memory that we can easily access. Start QEMU with ''memory-backend-file'', attach GDB (we'll call it gdb-guest), and set a breakpoint on ''load_elf_binary'' (see part 1 for the full QEMU + GDB setup [ref10]):

$ echo 'elf64-fixme' | cpio --quiet -H newc -o | gzip -5 -n > ./initrd.gz
$ qemu-img create -f qcow2 snap.qcow2 1G
$ qemu-system-x86_64 -accel tcg -smp 1 -S -monitor stdio \
    -object memory-backend-file,id=mem,size=512M,mem-path=/dev/shm/mem,share=on \
    -machine memory-backend=mem -gdb tcp:127.0.0.1:1234 \
    -hda snap.qcow2 \
    -kernel vmlinuz-6.1.0-35-amd64 -initrd initrd.gz \
    -append 'nopti nokaslr console=tty0 console=ttyS0,115200 rdinit=/elf64-fixme'
$ gdb -ex 'file /temp/elf/vmlinux-6.1.0-35-amd64' \
      -ex 'target remote 127.0.0.1:1234' \
      -ex 'hbreak load_elf_binary' \
      -ex c

NOTE: When using QEMU snapshots in a diskless virtual machine, we need to add a qcow2 storage device [ref2] to store the snapshots (unless we want to use file-based migration [ref11], which I don't). That's what ''snap.qcow2'' is for.

NOTE: When booting the Linux kernel, it can sometimes be useful to make it more deterministic (at least for initial debugging) by disabling randomization (like KASLR -- Kernel Address Space Layout Randomization [ref12]) and mitigations (like PTI -- Page Table Isolation [ref13]). In this case, we won't need it much, since we'll be debugging the kernel part close to the user space program. (But it won't hurt.)

Wait for the breakpoint to trigger in gdb-guest, then locate all instances of the binary in the emulated kernel's physical memory. We can use ''rafind2'' from the radare2 toolkit [ref14]:

$ i="$(xxd -ps elf64-fixme)"
$ rafind2 -x "$i" /dev/shm/mem
0x3305000
0x6000ca0

The binary appears at two physical locations, and we need to watch both, since we don't yet know which one is being copied or shredded.
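If ''rafind2'' isn't at hand, the same scan is a few lines of Python. Note the nice property we're relying on here: with ''memory-backend-file'', an offset into the backing file *is* the guest physical address. A sketch (paths are the ones from this setup):

```python
def find_all(haystack, needle):
    """Yield every offset of needle in haystack. Offsets into the
    memory-backend file are guest physical addresses."""
    pos = haystack.find(needle)
    while pos != -1:
        yield pos
        pos = haystack.find(needle, pos + 1)

# Usage against the files from this setup:
#   needle = open("elf64-fixme", "rb").read()
#   mem    = open("/dev/shm/mem", "rb").read()
#   for off in find_all(mem, needle):
#       print(hex(off))
```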
We could translate both addresses into virtual ones by adding ''page_offset_base'' (= the direct mapping of all physical memory [ref15]) and then watch those addresses. But this runs into the same problem with linear addresses we discussed in [[#Physical vs. Linear Memory]]. What we really want is to watch access to the physical addresses. ===[ From Physical Guest to Virtual Host ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ It's time for us to implement the "dirty trick" we talked about at the beginning of [[#Thanks for the Memory]]. First, we need a way to translate the physical addresses of the emulated kernel into the virtual addresses of the QEMU process. Luckily, we can do that with the QEMU monitor command ''gpa2hva'' [ref16]: (qemu) gpa2hva 0x3305000 Host virtual address for 0x3305000 (mem) is 0x7ff787304000 (qemu) gpa2hva 0x6000ca0 Host virtual address for 0x6000ca0 (mem) is 0x7ff789fffca0 Now we can start a new GDB instance and attach it to the QEMU process (we'll call it "gdb-host"). Then, set watchpoints on the addresses ''0x7ff787304000'' and ''0x7ff789fffca0''. Don't forget to add the offset ''0x1c'', since the "zeroing" starts roughly there. $ gdb -p "$(pgrep qemu)" -n -q (gdb-host) awatch -l *(0x1c + 0x7ff787304000) Hardware access (read/write) watchpoint 1: *0x7ff787304000 (gdb-host) awatch -l *(0x1c + 0x7ff789fffca0) Hardware access (read/write) watchpoint 2: *0x7ff789fffca0 (gdb-host) c NOTE: Watch out for any special configuration in gdb-host. For example, the ''dashboard'' plugin [ref17] can cause an infinite loop in GDB! ===[ We Have to Go Derper ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Remember: when doing weird things, don't be afraid to go all the way. What if I told you that QEMU has a "stop the emulation" function called ''vm_stop()'' [ref18], and that GDB can inject itself into the debuggee process to invoke its functions? What would you do? int vm_stop(RunState state) We won't go into the ''RunState'' data type here. 
Let's just say we can wake up gdb-guest by calling the function with the argument ''0'' (= ''RUN_STATE_DEBUG'') [ref19].

(gdb-host) Thread 3 "qemu-system-x86" hit Hardware access (read/write) watchpoint 1: -location *(0x1c + 0x7ff787304000)
(gdb-host) call (int) vm_stop(0)
(gdb-host) c

Calling ''vm_stop(0)'' notifies the debugger connected to QEMU's GDB stub (gdb-guest) with a ''SIGTRAP'' signal. When gdb-guest receives the signal, it pauses, allowing us to collect the stacktrace, registers, and other data:

(gdb-guest) Program received signal SIGTRAP, Trace/breakpoint trap.

===[ Innit, automate ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Noice! ''vm_stop(0)'' works as expected, but every access to those addresses (even simple memory inspections in gdb-guest) halts gdb-host. For each such access, we have to switch from gdb-guest to gdb-host and hit ''continue''. That's tedious, but fortunately, we can easily automate ''vm_stop'' in gdb-host:

(gdb-host) Thread 3 "qemu-system-x86" hit Hardware access (read/write) watchpoint 1: -location *(0x1c + 0x7ff787304000)
(gdb-host) commands 1 2
call (int) vm_stop(0)
c
end
(gdb-host) c

''commands 1 2'' means: execute the following commands for breakpoints or watchpoints numbered ''1'' and ''2'' (= the ones we defined earlier).

After hitting ''continue'' in gdb-host, we also need to run ''continue'' in gdb-guest. But before that, save a VM snapshot -- it will come in handy later:

(gdb-guest) monitor savevm load_elf_binary
(gdb-guest) c

The moment we run ''continue'' in gdb-guest, every access to those addresses still triggers the watchpoints in gdb-host. On its own, that wouldn't be very useful, since gdb-host only sees internal QEMU functions, not the guest kernel we're debugging. That's exactly what the automated ''vm_stop(0)'' fixes: gdb-host silently forwards the notification to gdb-guest, where the actual debugging happens, and continues. Now we can step through gdb-guest uninterrupted until we find a new address to watch.
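If you find yourself repeating this dance, the whole gdb-host side can live in a GDB command file. A sketch; the two host addresses come from ''gpa2hva'' and will differ on every run:

```
# watch-phys.gdb -- run as: gdb -p "$(pgrep qemu)" -n -q -x watch-phys.gdb
awatch -l *(0x1c + 0x7ff787304000)
awatch -l *(0x1c + 0x7ff789fffca0)
commands 1 2
  call (int) vm_stop(0)
  continue
end
continue
```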
===[ Red Herring ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Time to focus on what we should be focusing on. If we lose sight of our goal, we'll be burning time on unnecessary side quests. Let's see an example... Our first stop is the ''memcpy'' function:

(gdb-guest) bt 1
#0 memcpy () at arch/x86/lib/memcpy_64.S:40
(gdb-guest) disassemble
Dump of assembler code for function memcpy:
...
   0xffffffff81a36d82 <+18>: rep movs QWORD PTR es:[rdi],QWORD PTR ds:[rsi]
=> 0xffffffff81a36d85 <+21>: mov ecx,edx
...
(gdb-guest) info registers rdi rsi
rdi 0xffff888005fffeb8
rsi 0xffff888003305050

''REP'' repeats a string instruction (e.g., ''movs'') until ''RCX'' reaches ''0''. Each iteration decrements ''RCX'' and increments or decrements (depending on the direction flag ''DF'') the values in ''RDI'' and/or ''RSI'' by the element size (''QWORD'' -- 8 bytes) [ref20]. In this case, it copies 8-byte chunks from ''0xffff888003305050 - size'' to ''0xffff888005fffeb8 - size''.

The source address ''0xffff888003305050'' corresponds to the physical address ''0x3305000'', after subtracting the number of bytes already copied and the page offset:

src_phys = RSI - page_offset_base - size
         = 0xffff888003305050 - 0xffff888000000000 - 0x50
         = 0x3305000

So what is copied by the ''memcpy'' function?
(gdb-guest) x/80xb $rdi - 0x50 0xffff888005ffefe8: 0xf1 0x2c 0x00 0x00 0x00 0x00 0x00 0x00 0xffff888005ffeff0: 0x52 0x00 0x00 0x81 0x00 0x00 0x00 0x00 0xffff888005ffeff8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xffff888005fff000: 0x00 0x00 0x00 0x00 0xeb 0x12 0x00 0x00 <-- 0xffff888005fff008: 0x18 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xffff888005fff010: 0x18 0x00 0x00 0x00 0xeb 0x12 0x00 0x00 0xffff888005fff018: 0x0f 0x05 0xb0 0x3c 0x0f 0x05 0x38 0x00 0xffff888005fff020: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xffff888005fff028: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xffff888005fff030: 0x57 0x4f 0x52 0x4b 0x49 0x4e 0x47 0x0a (gdb-guest) x/80xb $rsi - 0x50 0xffff888003305000: 0x7f 0x45 0x4c 0x46 0x0f 0x05 0xb1 0x48 0xffff888003305008: 0x51 0x90 0x90 0x90 0x90 0x90 0x90 0x05 0xffff888003305010: 0x02 0x00 0x3e 0x00 0x5e 0xb2 0x0e 0xb8 0xffff888003305018: 0x00 0x00 0x00 0x00 0xeb 0x12 0x00 0x00 <-- 0xffff888003305020: 0x18 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xffff888003305028: 0x18 0x00 0x00 0x00 0xeb 0x12 0x00 0x00 0xffff888003305030: 0x0f 0x05 0xb0 0x3c 0x0f 0x05 0x38 0x00 0xffff888003305038: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xffff888003305040: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xffff888003305048: 0x57 0x4f 0x52 0x4b 0x49 0x4e 0x47 0x0a Even though it looks suspicious and seems worth investigating, it's a trap. Just look at it -- no zeros we're hunting for. It's simply copying the ELF program headers into a buffer. Here's the backtrace: (gdb-guest) bt #0 memcpy #1 _copy_to_iter #2 copy_page_to_iter #3 shmem_file_read_iter #4 __kernel_read #5 kernel_read #6 elf_read #7 load_elf_phdrs <--- #8 load_elf_binary ... It might be related, but first let's find the zeroing. We can return to this later since we have the snapshot. ===[ No BRK for You Tonight ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Leaving ''memcpy'' behind, what's next? (gdb-guest) c Program received signal SIGTRAP, Trace/breakpoint trap. 
set_brk (start=start@entry=20800526614553, end=end@entry=20800526614554, prot=prot@entry=6) at fs/binfmt_elf.c:113 Now it's getting interesting. A watchpoint triggered inside ''set_brk'' [ref21], and its arguments look a bit strange: (gdb-guest) info registers rdi rsi rdx rdi 0x12eb00000019 ; start rsi 0x12eb0000001a ; end rdx 0x6 ; prot Could the "program break" (brk [ref22]) be mangling our data? Frankly, I doubt it. In extreme cases, ''brk'' should only allocate or deallocate heap memory. I'd be surprised if it triggered zeroing in the kernel. Next please: (gdb-guest) c Program received signal SIGTRAP, Trace/breakpoint trap. copy_page () at arch/x86/lib/copy_page_64.S:20 ===[ Prepare Your Diddly Hole ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ''copy_page'' sounds like the kernel is preparing our binary for execution: (gdb-guest) disassemble Dump of assembler code for function copy_page: 0xffffffff819e4d60 <+0>: xchg ax,ax 0xffffffff819e4d62 <+2>: mov ecx,0x200 0xffffffff819e4d67 <+7>: rep movs QWORD PTR es:[rdi],QWORD PTR ds:[rsi] (gdb-guest) x/80xb $rsi-(0x200*8) 0xffff888003305000: 0x7f 0x45 0x4c 0x46 0x0f 0x05 0xb1 0x48 0xffff888003305008: 0x51 0x90 0x90 0x90 0x90 0x90 0x90 0x05 0xffff888003305010: 0x02 0x00 0x3e 0x00 0x5e 0xb2 0x0e 0xb8 0xffff888003305018: 0x01 0x00 0x00 0x00 0xeb 0x12 0x00 0x00 0xffff888003305020: 0x18 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xffff888003305028: 0x18 0x00 0x00 0x00 0xeb 0x12 0x00 0x00 0xffff888003305030: 0x0f 0x05 0xb0 0x3c 0x0f 0x05 0x38 0x00 0xffff888003305038: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xffff888003305040: 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xffff888003305048: 0x57 0x4f 0x52 0x4b 0x49 0x4e 0x47 0x0a (gdb-guest) x/80xb $rdi-(0x200*8) 0xffff8880029fe000: 0x7f 0x45 0x4c 0x46 0x0f 0x05 0xb1 0x48 0xffff8880029fe008: 0x51 0x90 0x90 0x90 0x90 0x90 0x90 0x05 0xffff8880029fe010: 0x02 0x00 0x3e 0x00 0x5e 0xb2 0x0e 0xb8 0xffff8880029fe018: 0x01 0x00 0x00 0x00 0xeb 0x12 0x00 0x00 0xffff8880029fe020: 0x18 0x00 0x00 
0x00 0x00 0x00 0x00 0x00
0xffff8880029fe028: 0x18 0x00 0x00 0x00 0xeb 0x12 0x00 0x00
0xffff8880029fe030: 0x0f 0x05 0xb0 0x3c 0x0f 0x05 0x38 0x00
0xffff8880029fe038: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffff8880029fe040: 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffff8880029fe048: 0x57 0x4f 0x52 0x4b 0x49 0x4e 0x47 0x0a

The binary is being copied correctly, one-to-one.

NOTE: ''copy_page_64()'' copies ''0x200'' QWORDs, which equals one 4 KiB page (''0x200 * 8 = 0x1000 = 4096''; who would have guessed). Execution stopped at the instruction following the ''rep movs'', so the copying is complete. Therefore, ''rsi'' and ''rdi'' now point one page past the region of interest. That means we need to subtract the page size (''0x1000'') from the addresses.

Do we have a new physical address?

(gdb-guest) p/x $rsi - (0x200 * 8) - page_offset_base
$1 = 0x3305000
(gdb-guest) p/x $rdi - (0x200 * 8) - page_offset_base
$2 = 0x29fe000

Yes, we do: ''0x29fe000''.

NOTE: We could also use the QEMU monitor command ''gva2gpa'':

(qemu) gva2gpa 0xffff8880029fe000
gpa: 0x29fe000

NOTE: Don't forget to subtract the correct page size, ''0x1000''. (I forgot several times and wondered why the offset didn't match.)

We can (and should) verify both addresses by searching for the data in the physical memory file, as we did earlier in [[#Thanks for the Memory]]:

$ rafind2 -x "$i" /dev/shm/mem
0x29fe000   <-- the new physical address <--.
0x3305000   --> copying the binary to ------'
0x6000ca0

===[ Watching a New Address on the Host ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now compute the virtual address of ''0x29fe000'' so that we can monitor it in gdb-host:

(qemu) gpa2hva 0x29fe000
Host virtual address for 0x29fe000 (mem) is 0x7ff7869fd000

What's left is to set a watchpoint for the new address in gdb-host. Again, don't forget the offset!

NOTE: gdb-host runs continuously (thanks to our "command" script). We can interrupt it with ''CTRL-C'', then set a new watchpoint and its corresponding "command" script.
(gdb-host) ^C
(gdb-host) awatch -l *(0x1c + 0x7ff7869fd000)
(gdb-host) commands
call (int) vm_stop(0)
c
end
(gdb-host) c

===[ From Heroes to Zeroes ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Back to gdb-guest to find out what's rewriting our data:

(gdb-guest) c
Program received signal SIGTRAP, Trace/breakpoint trap.
clear_user_rep_good () at arch/x86/lib/clear_page_64.S:142
(gdb-guest) disassemble
Dump of assembler code for function clear_user_rep_good:
...
   0xffffffff819e49ce <+14>: rep stos QWORD PTR es:[rdi],rax
=> 0xffffffff819e49d1 <+17>: and edx,0x7
(gdb-guest) info registers rdi rax
rdi 0x12eb00000ff9
rax 0x0

Ha! That looks promising. ''rep stos'' is effectively a memset implementation. It stores ''RCX'' QWORDs of the ''RAX'' value into the memory address at ''RDI'', while ''rep'' increments ''RDI'' and decrements ''RCX''. Not only does it write a lot of zeros, but it's also called from a function named ''clear_user'':

(gdb-guest) bt
#0 clear_user_rep_good
#1 __clear_user
#2 clear_user
#3 padzero <---
#4 load_elf_binary
...

Before we investigate it, let's make sure there are no other writes:

(gdb-guest) c
Program received signal SIGTRAP, Trace/breakpoint trap.
0x000012eb00000006 in ?? ()
(gdb-guest) x/20i $rip
=> 0x12eb00000006: mov cl,0x48
   0x12eb00000008: push rcx
   0x12eb00000009: nop
   0x12eb0000000a: nop
   0x12eb0000000b: nop
   0x12eb0000000c: nop
   0x12eb0000000d: nop
   0x12eb0000000e: nop
   0x12eb0000000f: add eax,0x3e0002
   0x12eb00000014: pop rsi
   0x12eb00000015: mov dl,0xe
   0x12eb00000017: mov eax,0x1
   0x12eb0000001c: add BYTE PTR [rax],al
   0x12eb0000001e: add BYTE PTR [rax],al
   0x12eb00000020: add BYTE PTR [rax],al
   0x12eb00000022: add BYTE PTR [rax],al
   0x12eb00000024: add BYTE PTR [rax],al
   0x12eb00000026: add BYTE PTR [rax],al
   0x12eb00000028: add BYTE PTR [rax],al
   0x12eb0000002a: add BYTE PTR [rax],al

Yup, that's our user space binary with zeroed data. It really looks like ''clear_user'' is the one we're after.
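To make the ''rep stos'' semantics concrete, here's a tiny model of what the instruction does with the registers we just inspected (a sketch for illustration, not the kernel's code):

```python
def rep_stos_qword(mem, rdi, rax, rcx):
    """Model of `rep stos QWORD PTR es:[rdi],rax` (DF=0): store RCX copies
    of the 8-byte RAX value at RDI, advancing RDI by 8 each iteration."""
    for _ in range(rcx):
        mem[rdi:rdi + 8] = rax.to_bytes(8, "little")
        rdi += 8
    return rdi  # RDI ends up just past the written region

buf = bytearray(b"\xff" * 24)
end = rep_stos_qword(buf, rdi=8, rax=0, rcx=2)  # zero 16 bytes at offset 8
print(buf.hex())  # ffffffffffffffff00000000000000000000000000000000
print(end)        # 24
```

This also explains the ''rdi'' value we saw (''0x12eb00000ff9''): it has already been advanced past the zeroed region by the time the watchpoint fires.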
===[ Going Back ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Let's set a breakpoint on the ''clear_user'' function and roll back using our snapshot from [[#From Physical Guest to Virtual Host]]: (gdb-guest) hbreak clear_user (gdb-guest) monitor loadvm load_elf_binary (gdb-guest) c NOTE: When we load the snapshot, QEMU immediately restores all its states (vCPUs, registers, memory, etc.) => the instruction pointer (''RIP'') points to the first instruction of ''load_elf_binary'', where we took the snapshot. HOWEVER! GDB isn't aware of this yet, so it still holds the old data because it caches values like registers. To view the correct registers right after ''loadvm'', flush the cache with: ''maintenance flush register-cache'' [ref23]. (gdb) info registers rip rip 0xfff0 (gdb) monitor loadvm snap (gdb) info registers rip rip 0xfff0 (gdb) monitor info registers ... RIP=ffffffff813e2c80 ... (gdb) maintenance flush register-cache Register cache flushed. (gdb) info registers rip rip 0xffffffff813e2c80 0xffffffff813e2c80 ===[ Inlining ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Now it's time for my favorite game: figuring out what arguments we're dealing with when a function has been inlined. The prototype for ''clear_user'' is [ref24]: static __always_inline unsigned long clear_user(void __user *to, unsigned long n) When we hit the breakpoint and follow the kernel calling convention, we'll see a rather odd ''n'' argument, such as: Breakpoint 2.7, clear_user (n=4071, to=0x12eb00000019) at arch/x86/include/asm/uaccess_64.h:123 (gdb-guest) info registers rdi rsi rdi 0x12eb00000019 20800526614553 ; to rsi 0x12eb0000001a 20800526614554 ; n The ''to'' argument looks legit, but ''n'' looks completely bonkers. That's because ''clear_user'' was inlined into its caller and is not a callable function. When a function is inlined, the compiler treats it more like a C macro -- injecting the code at each call site [ref25] [ref26]. 
So there is no calling convention, no prologue/epilogue, no return, and so on. Some registers like ''RSI'' may remain intact, but we cannot rely on that. We have to look at the disassembly and the corresponding registers to make sense of the arguments:

(gdb-guest) disassemble
Dump of assembler code for function padzero:
ffffffff813e17a5 <+5>:   mov    rax,rdi       ; rax = 0x12eb00000019
ffffffff813e17a8 <+8>:   and    eax,0xfff     ; rax = 0x12eb00000019 & 0xfff = 0x19
...
ffffffff813e17af <+15>:  mov    ecx,0x1000
ffffffff813e17b4 <+20>:  sub    rcx,rax       ; rcx = 0xfe7
...
ffffffff813e17d6 <+54>:  xor    eax,eax       ; rax = 0
ffffffff813e17d8 <+56>:  call   0xffffffff819e49c0 <clear_user_rep_good>
...

(gdb-guest) disassemble clear_user_rep_good
...
ffffffff819e49c8 <+8>:   shr    rcx,0x3       ; rcx = 0xfe7 >> 3 = 0x1fc
...
ffffffff819e49ce <+14>:  rep stos QWORD PTR es:[rdi],rax

The actual arguments to ''clear_user'' are in ''RDI'' (= ''to'') and ''RCX'' (= ''n''):

(gdb-guest) info registers rdi rcx
rdi            0x12eb00000019      20800526614553 ; to
rcx            0xfe7               4071           ; n

It's really zeroing the data in our binary, starting from offset ''0x19'' (= ''0x12eb00000019 - 0x12eb00000000'') up to the end of the page.

NOTE: We could get the same information by setting a breakpoint directly on ''rep stos'' and deducing the rest from the arguments it consumes -- it writes the value in ''RAX'' (= ''0'') to the memory at ''RDI'', repeated ''RCX'' times in QWORD units.

===[ Up to the Frame ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even though we've successfully verified what is happening and where, we still don't know why. Since we're in the "zeroing" function, let's walk up the backtrace to see exactly where we are:

(gdb-guest) bt
#0 clear_user
#1 padzero
#2 load_elf_binary
...
(gdb-guest) frame 2
#2 0xffffffff813e3625 in load_elf_binary (bprm=0xffff888006000c00) at fs/binfmt_elf.c:1245

(gdb-guest) x/20i $rip-40
ffffffff813e3604:  call   0xffffffff813e1910 <set_brk>   <-.
ffffffff813e3609:  test   eax,eax                          |
ffffffff813e360b:  jne    ...                              |
ffffffff813e3611:  mov    rcx,QWORD PTR [rsp+0x18]         |
ffffffff813e3616:  cmp    rbx,rcx                          |
ffffffff813e3619:  je     ...                              |
ffffffff813e361b:  mov    rdi,QWORD PTR [rsp+0x30]         |
ffffffff813e3620:  call   0xffffffff813e17a0 <padzero>   <-'

We can pinpoint the exact location fairly easily, since there's only one place in ''load_elf_binary'' where ''set_brk'' is followed by ''padzero'' [ref27]:

	retval = set_brk(elf_bss, elf_brk, bss_prot);
	if (retval)
		goto out_free_dentry;
	if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {  // <----
		retval = -EFAULT; /* Nobody gets to see this, but.. */
		goto out_free_dentry;
	}

''padzero'' is called only if ''elf_bss'' (statically allocated variables) and ''elf_brk'' (heap) are not equal. We already know their values from our earlier inspection of ''set_brk'':

elf_bss = 0x12eb00000019
elf_brk = 0x12eb0000001a

From these two names and their values, we can already infer a lot. If you know what the ''.bss'' section is [ref28], it's an immediate red flag -- it holds statically allocated variables that should be initialized to 0 (at least for C programs on Linux). This not only explains the zeroing but also confirms that the starting offset is ''+0x19'', just as we suspected at the beginning. One mystery solved. Now the question is: why do ''elf_bss'' and ''elf_brk'' differ? We don't want that condition to trigger.

===[ Back to the Beginning ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We need to determine what ''elf_bss'' and ''elf_brk'' actually are -- whether they live in memory (for example, on the stack) or in registers. We also want to know what values are written to them, i.e., what logical data they hold => how they map to the ELF headers.
We know that ''set_brk'' takes both ''elf_bss'' and ''elf_brk'' as arguments, so let's find out where they come from by disassembling the instructions before ''call <set_brk>'':

(gdb-guest) x/30i $rip-100
ffffffff813e35cb:  mov    rax,QWORD PTR [rsp+0x40]
ffffffff813e35d0:  mov    rcx,QWORD PTR [rsp+0x18]
...
ffffffff813e35e6:  lea    rdi,[rax+r10*1]   ; elf_bss
ffffffff813e35ea:  lea    rsi,[rax+rcx*1]   ; elf_brk
...
ffffffff813e3604:  call   0xffffffff813e1910 <set_brk>

So both are on the stack:

elf_bss = [rsp+0x40] + r10
elf_brk = [rsp+0x40] + [rsp+0x18]

Reverse execution would be nice here, since we could simply ''reverse-step'' and ''reverse-continue'' until we found what we're looking for. But with snapshots, we can manage without it. Let's set a breakpoint at ''0xffffffff813e35cb: mov rax,QWORD PTR [rsp+0x40]'' and go back in time by loading the snapshot:

(gdb-guest) hbreak *0xffffffff813e35cb
(gdb-guest) monitor loadvm load_elf_binary
(gdb-guest) c
Breakpoint 3, 0xffffffff813e35cb in load_elf_binary (bprm=0xffff888006000c00) at fs/binfmt_elf.c:1230

(gdb-guest) x/1gx $rsp+0x40
0xffffc90000013e20: 0x0000000000000000
(gdb-guest) x/1gx $rsp+0x18
0xffffc90000013df8: 0x000012eb0000001a
(gdb-guest) p/x $rsp+0x18
$5 = 0xffffc90000013df8

(gdb-guest) watch -l *0xffffc90000013df8
Hardware watchpoint 4: -location *0xffffc90000013df8
(gdb-guest) monitor loadvm load_elf_binary
(gdb-guest) c

(gdb-guest) x/10i $rip-30
  ffffffff813e31fc:  test   esi,esi
  ffffffff813e31fe:  cmove  rdx,rdi
  ffffffff813e3202:  add    rcx,rax
  ffffffff813e3205:  mov    QWORD PTR [rsp+0x38],rdx
  ffffffff813e320a:  cmp    QWORD PTR [rsp+0x18],rcx
  ffffffff813e320f:  jae    ...
  ffffffff813e3211:  mov    DWORD PTR [rsp+0x58],ebx
  ffffffff813e3215:  mov    QWORD PTR [rsp+0x18],rcx   <---
=>ffffffff813e321a:  movzx  eax,WORD PTR [rbp+0xd8]

(gdb-guest) info registers rcx
rcx            0x12eb0000001a      20800526614554

I don't wanna cause any panic, but we're lost.

===[ Where the Hell Are We?!
]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is one of those cases where having the kernel source code around is a very good idea:

cd /usr/src
apt-get source linux=6.1.137-1

NOTE: We need the exact version with all patches applied (see [ref10]).

(gdb-guest) directory /usr/src/linux-6.1.137
(gdb-guest) list *0xffffffff813e3215-20
1213    k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;      // 0x12eb00000019
1214
1215    if (k > elf_bss)                                 // if (0x12eb00000019 > 0)
1216        elf_bss = k;                                 // elf_bss = 0x12eb00000019
1217    if ((elf_ppnt->p_flags & PF_X) && end_code < k)
1218        end_code = k;
1219    if (end_data < k)
1220        end_data = k;
1221    k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;       // 0x12eb0000001a
1222    if (k > elf_brk) {                               // if (0x12eb0000001a > 0)
1223        bss_prot = elf_prot;
1224        elf_brk = k;                                 // elf_brk = 0x12eb0000001a
1225    }

When we step through the code and inspect the values, we get:

elf_bss = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;  // 0x12eb00000019
elf_brk = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;   // 0x12eb0000001a

Now we know that ''elf_bss'' is a combination of two values from the program header: ''p_vaddr + p_filesz'' (and likewise, ''elf_brk'' is ''p_vaddr + p_memsz''). This is another bug that could have been caught by using our readelf tool [ref10], where the mismatch would have been clearly visible:

$ ./read_elf $f
...
p_filesz = 0000000000000001 (1)
p_memsz  = 0000000000000002 (2)   <--- p_memsz != p_filesz

But where's the fun in that?

===[ OUTRO ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Both time-travel debugging (or at least VM snapshots) and physical memory watchpoints are extremely useful and can eliminate a lot of trouble. Moreover, if record/replay worked in QEMU as intended, it would be ideal: deterministic execution combined with time travel. Until then, VM snapshots are still an excellent capability. Nonetheless, QEMU is not only a great virtual machine, it is also a great tool for reverse engineering and system analysis.
It has quirks and bugs, but because it is feature-rich and open source, there is usually a workaround.

Up to this point, we've talked about vanilla QEMU, but there are also QEMU forks such as QIRA [ref29] and PANDA [ref30]. QIRA is a GUI for timeless debugging, though it's kinda dead (geohot works on it when he feels like it [ref31]). PANDA, on the other hand, is actively developed and focused on software analysis.

While I'm on the topic of time-travel debuggers, I'll mention ''rr'' (record/replay) [ref32] for low-level Linux x86 user space. It's really good for real debugging (it was originally developed for debugging Mozilla Firefox [ref33]), but be careful -- ''rr'' is NOT sandboxed!

And don't forget: Hack The Planet! They're trashing the flow of data!

[[/html/2025/2025-09-11--touching_small_elfs-p1-broken_tools.html|Part 1: Understanding Small ELFs and Fixing Broken Tools]]
[[/html/2025/2025-10-06--touching_small_elfs-p2-segfaults_everywhere.html|Part 2: ELF Magic Gone Wrong: Debugging SEGFAULTs (Examples of ELF Failures)]]

===[ References ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

> [ref1]
https://en.wikipedia.org/wiki/Time_travel_debugging
> [ref2]
https://www.qemu.org/docs/master/system/replay.html
> [ref3]
https://www.qemu.org/docs/master/devel/replay.html
> [ref4]
https://wiki.qemu.org/Features/record-replay
> [ref5]
https://gitlab.com/qemu-project/qemu/-/issues/2634
> [ref6]
https://sourceware.org/gdb/current/onlinedocs/gdb.html/Connecting.html#index-add-new-commands-for-external-monitor
> [ref7]
https://sourceware.org/pipermail/gdb-patches/1999-August/000778.html
> [ref8]
https://research.h4x.cz/html/2025/2025-10-06--touching_small_elfs-p2-segfaults_everywhere.html
> [ref9]
https://www.youtube.com/watch?v=3Q25sogi-xo
  * It’s RAAAAAAW Supercut (2 Million Subscribers Special) | Hell’s Kitchen
> [ref10]
https://research.h4x.cz/html/2025/2025-09-11--touching_small_elfs-p1-broken_tools.html
> [ref11]
https://qemu.readthedocs.io/en/v10.0.3/devel/migration/index.html
> [ref12]
https://www.kernel.org/doc/html/v6.4/security/self-protection.html#kernel-address-space-layout-randomization-kaslr
> [ref13]
https://www.kernel.org/doc/html/next/x86/pti.html
> [ref14]
https://book.rada.re/tools/rafind2/intro.html
> [ref15]
https://www.kernel.org/doc/html/v6.4/arch/x86/x86_64/mm.html
> [ref16]
https://github.com/qemu/qemu/commit/e9628441df3a7aa0ee83601a0cc9111b91e2319a
> [ref17]
https://github.com/cyrus-and/gdb-dashboard
> [ref18]
https://github.com/qemu/qemu/blob/v10.1.1/system/cpus.c#L724
> [ref19]
https://github.com/qemu/qemu/blob/v10.1.1/qapi/run-state.json
> [ref20]
https://www.felixcloutier.com/x86/rep%3Arepe%3Arepz%3Arepne%3Arepnz
> [ref21]
https://elixir.bootlin.com/linux/v6.1.137/source/fs/binfmt_elf.c#L112
> [ref22]
https://www.man7.org/linux/man-pages/man2/brk.2.html
> [ref23]
https://sourceware.org/gdb/current/onlinedocs/gdb.html/Maintenance-Commands.html
> [ref24]
https://elixir.bootlin.com/linux/v6.1.137/source/arch/x86/include/asm/uaccess_64.h#L121
> [ref25]
https://www.kernel.org/doc/local/inline.html
> [ref26]
https://gcc.gnu.org/onlinedocs/gcc/Inline.html
> [ref27]
https://elixir.bootlin.com/linux/v6.1.137/source/fs/binfmt_elf.c#L1242
> [ref28]
https://en.wikipedia.org/wiki/.bss
> [ref29]
https://github.com/geohot/qira
> [ref30]
https://github.com/panda-re/panda/blob/dev/panda/docs/manual.md
> [ref31]
https://www.youtube.com/watch?v=QleTEw0hKXQ
  * George Hotz | Programming | Improving and running QIRA from scratch! | Part3
> [ref32]
https://rr-project.org/
> [ref33]
https://github.com/rr-debugger/rr/wiki/Recording-Firefox