.-----------------------------------------------------------------------------.
| Kprobes, Linux ELF loader, and C templating |
'-----------------------------------------------------------------------------'
updated: 2025-06-20
When using kprobes, beware of tracing kernel functions by symbol name! Some
tools may fail to attach even though the function is clearly traceable:
# bpftrace -e 'kprobe:load_elf_binary { printf ("hit\n") }'
Attaching 1 probe...
[WARN] libbpf: prog 'kprobe_load_elf_binary_1': BPF program load failed:
Invalid argument
[WARN] libbpf: prog 'kprobe_load_elf_binary_1': failed to load: -22
cannot attach kprobe, Cannot assign requested address
ERROR: Error attaching probe: kprobe:load_elf_binary
'bpftrace' doesn't fail because the 'load_elf_binary' function is
untraceable. It fails because the symbol has two different addresses:
$ grep load_elf_binary /proc/kallsyms
ffffffff813e2c80 t load_elf_binary
ffffffff813e58d0 t load_elf_binary
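If you want to see the ambiguity from a program's point of view, here is a
minimal C sketch that collects every address '/proc/kallsyms' lists for one
name. It assumes the usual '<address> <type> <name> [module]' line format;
'kallsyms_lookup_all' is a hypothetical helper, not a kernel or libbpf API.
Without enough privileges (see 'kptr_restrict'), the addresses may read as
zeros.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan a kallsyms-formatted stream ("<hex address> <type> <name> [module]")
 * and store up to 'max' addresses whose name equals 'name' exactly.
 * Returns the total number of matches, which may exceed 'max'.
 * Hypothetical helper for illustration, not a kernel or libbpf API.
 */
size_t kallsyms_lookup_all(FILE *in, const char *name,
                           uint64_t *addrs, size_t max)
{
    char line[1024];
    char sym[512];
    size_t found = 0;

    assert(in != NULL && name != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        uint64_t addr;
        char type;

        if (sscanf(line, "%" SCNx64 " %c %511s", &addr, &type, sym) != 3)
            continue;               /* not a symbol line */
        if (strcmp(sym, name) != 0)
            continue;               /* exact name match only */
        if (found < max)
            addrs[found] = addr;
        found++;
    }
    return found;
}
```

Called on 'fopen("/proc/kallsyms", "r")' with "load_elf_binary", it should
report two matches on a kernel like the one above. Any count above one means
that attaching by name is ambiguous.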
A symbol with multiple addresses is not an anomaly. We can even see it
implemented twice in the debug binary:
$ objdump -M intel-mnemonic -d /usr/lib/debug/boot/vmlinux-$(uname -r) \
| grep -A7 '<load_elf_binary>'
ffffffff813e2c80 <load_elf_binary>:
ffffffff813e2c80: e8 1b 2d c9 ff call ffffffff810759a0 <__fentry__>
ffffffff813e2c85: 41 57 push r15
ffffffff813e2c87: 41 56 push r14
ffffffff813e2c89: 41 55 push r13
ffffffff813e2c8b: 41 54 push r12
ffffffff813e2c8d: 55 push rbp
ffffffff813e2c8e: 48 89 fd mov rbp,rdi
--
ffffffff813e58d0 <load_elf_binary>:
ffffffff813e58d0: e8 cb 00 c9 ff call ffffffff810759a0 <__fentry__>
ffffffff813e58d5: 41 57 push r15
ffffffff813e58d7: 41 56 push r14
ffffffff813e58d9: 41 55 push r13
ffffffff813e58db: 41 54 push r12
ffffffff813e58dd: 55 push rbp
ffffffff813e58de: 53 push rbx
And the code really does differ (compare the seventh instruction of each). So
which one is correct? Well, it depends: do we want to trace the loader for
64-bit ELF binaries or the one for 32-bit ELF binaries? That's right: one is
the 64-bit ELF loader, and the other loads 32-bit ELF binaries.
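Which of the two loaders a program goes through depends on its ELF header. A
rough C sketch using the constants from glibc's '<elf.h>': it reads the
'EI_CLASS' byte of 'e_ident' to tell a 32-bit ('ELFCLASS32') file from a
64-bit ('ELFCLASS64') one. On an x86-64 kernel with 32-bit support, the
32-bit ones are what should end up in the compat copy. Treat it as a
heuristic: the loaders check more than the class (e.g. 'e_machine'), and
'elf_class_of'/'elf_class_of_file' are hypothetical names.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Classify an ELF identification block: returns ELFCLASS32 or ELFCLASS64,
 * or ELFCLASSNONE if 'ident' is too short, lacks the ELF magic, or has an
 * unknown class. Hypothetical helper for illustration.
 */
int elf_class_of(const unsigned char *ident, size_t len)
{
    assert(ident != NULL);
    if (len < EI_NIDENT || memcmp(ident, ELFMAG, SELFMAG) != 0)
        return ELFCLASSNONE;
    if (ident[EI_CLASS] == ELFCLASS32 || ident[EI_CLASS] == ELFCLASS64)
        return ident[EI_CLASS];
    return ELFCLASSNONE;
}

/* Read the first EI_NIDENT bytes of 'path'; -1 if it can't be opened. */
int elf_class_of_file(const char *path)
{
    unsigned char ident[EI_NIDENT];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    n = fread(ident, 1, sizeof ident, f);
    fclose(f);
    return elf_class_of(ident, n);
}
```

For example, 'elf_class_of_file("/bin/ls")' should return 'ELFCLASS64' on a
typical x86-64 distribution.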
Now, why is that? Why do they both have the same name? Let's ignore the symbol
name collision for now and focus on a more interesting question: why are there
two 'load_elf_binary' implementations?
If we look closely at the Linux kernel source code, 'load_elf_binary' [ref1]
is defined only once -- in 'fs/binfmt_elf.c'. So where does the second one
come from? Have you ever wondered how Linux on x86-64 executes 32-bit binaries?
Where is the 32-bit ELF loader implemented? As you've probably guessed, it's
the same 'load_elf_binary' function in 'fs/binfmt_elf.c'. But how?
If you've seen the function, you've probably noticed that the ELF structures
it uses are oddly generic, like 'struct elf_phdr' [ref2]. So how does the
kernel handle different address widths, for example 'Elf64_Addr' versus
'Elf32_Addr' [ref3]? Structure layouts are fixed at compile time, so a single
compiled function can't detect the ELF class on the fly and switch between
them.
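To make the width problem concrete, here is a small C sketch comparing
glibc's 'Elf32_Phdr' and 'Elf64_Phdr' from '<elf.h>' (the userspace
counterparts of the kernel's 'elf32_phdr' and 'elf64_phdr'). Not only are the
address fields wider, the field order differs too, so the same source line
compiles to different offsets. 'print_phdr_layouts' is a hypothetical helper.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdio.h>

/* The address type itself is 4 bytes wide in ELF32 and 8 bytes in ELF64. */
static_assert(sizeof(Elf32_Addr) == 4, "Elf32_Addr is 32-bit");
static_assert(sizeof(Elf64_Addr) == 8, "Elf64_Addr is 64-bit");

/* Print where two program header fields live in each class, plus sizes. */
void print_phdr_layouts(void)
{
    printf("%-8s %6s %6s\n", "field", "ELF32", "ELF64");
    printf("%-8s %6zu %6zu\n", "p_flags",
           offsetof(Elf32_Phdr, p_flags), offsetof(Elf64_Phdr, p_flags));
    printf("%-8s %6zu %6zu\n", "p_vaddr",
           offsetof(Elf32_Phdr, p_vaddr), offsetof(Elf64_Phdr, p_vaddr));
    printf("%-8s %6zu %6zu\n", "sizeof",
           sizeof(Elf32_Phdr), sizeof(Elf64_Phdr));
}
```

With the standard ELF layouts it prints 24 and 4 for 'p_flags', 8 and 16 for
'p_vaddr', and sizes of 32 and 56 bytes.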
Well, the 'fs/binfmt_elf.c' file is, de facto, a C template. The native ELF
structures, like the aforementioned 'elf_phdr', are defined by C macros in
'include/linux/elf.h' [ref4]. But there's also a second file,
'fs/compat_binfmt_elf.c', which redefines these macros and symbolic constants
[ref5], such as:
--------------------------[ fs/compat_binfmt_elf.c ]---------------------------
/*
* 32-bit compatibility support for ELF format executables and core dumps.
...
* This file is used in a 64-bit kernel that wants to support 32-bit ELF.
* asm/elf.h is responsible for defining the compat_* and COMPAT_* macros
* used below, with definitions appropriate for 32-bit ABI compatibility.
*
* We use macros to rename the ABI types and machine-dependent
* functions used in binfmt_elf.c to compat versions.
*/
...
#undef elf_phdr
#define elf_phdr elf32_phdr
-------------------------------------------------------------------------------
And at the end of the file, it actually *INCLUDES* the 'fs/binfmt_elf.c'
file!
--------------------------[ fs/compat_binfmt_elf.c ]---------------------------
/*
* We share all the actual code with the native (64-bit) version.
*/
#include "binfmt_elf.c"
-------------------------------------------------------------------------------
Behold! This is one of the greatest C hacks in real-world code: C templating!
Cute, right? (By the way, 'fs/binfmt_elf.c' is not x86-specific; the same
trick is used on arm64, for example.)
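The trick is easy to reproduce in userspace. A self-contained single file
can't '#include' a second one, so this C sketch keeps the "template" body in
a macro instead of a separate '.c' file. The renaming works the same way: the
generic names in the body ('elf_phdr', 'first_vaddr') are replaced according
to whatever '#define's are active where the body is expanded. All names are
hypothetical; the types are glibc's from '<elf.h>'.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/*
 * The "template": written once against generic names, like the code in
 * fs/binfmt_elf.c. Returns p_vaddr of the program header stored in 'buf'.
 */
#define PHDR_TEMPLATE                                               \
    uint64_t first_vaddr(const unsigned char *buf)                  \
    {                                                               \
        elf_phdr ph;                                                \
        assert(buf != NULL);                                        \
        memcpy(&ph, buf, sizeof ph); /* buf may be unaligned */     \
        return (uint64_t)ph.p_vaddr;                                \
    }

/* Native instance: generic names map to the 64-bit types. */
#define elf_phdr    Elf64_Phdr
#define first_vaddr first_vaddr64
PHDR_TEMPLATE
#undef elf_phdr
#undef first_vaddr

/* Compat instance: same body, 32-bit types (cf. compat_binfmt_elf.c). */
#define elf_phdr    Elf32_Phdr
#define first_vaddr first_vaddr32
PHDR_TEMPLATE
#undef elf_phdr
#undef first_vaddr
```

Unlike this sketch, the kernel doesn't rename 'load_elf_binary' for the
compat copy, which is exactly where the duplicate symbol comes from.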
The remaining question is: how does the kernel get away with one symbol name
pointing to two different addresses?
Simple: all the functions in the file are declared static. For example:
------------------------------[ fs/binfmt_elf.c ]------------------------------
static int load_elf_binary(struct linux_binprm *bprm);
-------------------------------------------------------------------------------
That makes it a local symbol (hence the lowercase 't' in '/proc/kallsyms').
Local symbols with identical names can coexist in multiple object files
because they are not globally visible. So there are actually two independent
'load_elf_binary' functions: one in 'fs/binfmt_elf.o' and one in
'fs/compat_binfmt_elf.o'.
Back to kprobes.
Even though 'bpftrace' fails, not all tools do. For instance, 'perf probe'
handles this correctly by creating kprobes for both addresses:
# perf probe load_elf_binary
Added new events:
probe:load_elf_binary (on load_elf_binary)
probe:load_elf_binary (on load_elf_binary)
You can now use it in all perf tools, such as:
perf record -e probe:load_elf_binary -aR sleep 1
# perf probe --list
probe:load_elf_binary (on load_elf_binary@fs/binfmt_elf.c)
probe:load_elf_binary (on load_elf_binary@fs/binfmt_elf.c)
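The '--list' output is misleading: both probes look identical and point to
'fs/binfmt_elf.c' (which is technically true for both, since the compat
loader is compiled from that same file). The actual kprobe attachment,
however, is correct: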
# cat /sys/kernel/debug/kprobes/list
ffffffff813e2c80 k load_elf_binary+0x0 [DISABLED][FTRACE]
ffffffff813e58d0 k load_elf_binary+0x0 [DISABLED][FTRACE]
Now, there's one thing I don't know: which address corresponds to the 32-bit
ELF loader, and which to the 64-bit one?
It likely depends on the object file link order. I suppose the native loader
from 'binfmt_elf.o' is linked first, so the first (lower) address probably
belongs to the 64-bit loader. Fortunately, it's easy enough to test:
# bpftrace -e 'k:$1 { printf ("hit\n") }' 0xffffffff813e2c80
stdin:1:1-6: WARNING: 0xffffffff813e2c80 is not traceable (either
non-existing, inlined, or marked as "notrace"); attaching to it
will likely fail.
Attaching 1 probe...
hit
hit
hit
NOTE: The bpftrace warning is still misleading, even in the current version
(v0.22.1). It always prints this message when attaching to a raw address,
regardless of whether the region is actually traceable.
We looked specifically at 'load_elf_binary', but plenty of other symbols have
multiple addresses. Just check '/proc/kallsyms'.
On my 6.1.137-amd64 kernel, 398 function names (text symbols of type 't' or
'T') occur more than once:
$ awk '$2 == "t" || $2 == "T" {print $3}' /proc/kallsyms \
| sort | uniq -c | grep -v '^ *1 ' | wc -l
398
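The same count can be reproduced in C if you'd rather not depend on the
exact shell pipeline. A minimal sketch mirroring the 'awk | sort | uniq -c'
logic above: keep 't'/'T' entries, sort the names, and count the names that
occur more than once. 'count_duplicate_text_symbols' is a hypothetical
helper; on another kernel or configuration the number will differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings */
static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/*
 * Count names that occur more than once among the text symbols ('t' or 'T')
 * of a kallsyms-formatted stream; the C counterpart of the awk pipeline.
 * Returns the count, or -1 if memory runs out.
 */
long count_duplicate_text_symbols(FILE *in)
{
    char line[1024], name[512], type;
    char **names = NULL;
    size_t n = 0, cap = 0, i, run;
    long dups = 0;

    assert(in != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        /* skip the address, then read the type letter and the name */
        if (sscanf(line, "%*s %c %511s", &type, name) != 2)
            continue;
        if (type != 't' && type != 'T')
            continue;
        if (n == cap) {                         /* grow the name array */
            size_t new_cap = cap ? cap * 2 : 1024;
            char **tmp = realloc(names, new_cap * sizeof *names);
            if (tmp == NULL) {
                dups = -1;
                goto out;
            }
            names = tmp;
            cap = new_cap;
        }
        names[n] = malloc(strlen(name) + 1);
        if (names[n] == NULL) {
            dups = -1;
            goto out;
        }
        strcpy(names[n++], name);
    }

    if (n > 0)
        qsort(names, n, sizeof *names, cmp_names);
    /* after sorting, equal names are adjacent: one run = one name */
    for (i = 0; i < n; i += run) {
        run = 1;
        while (i + run < n && strcmp(names[i], names[i + run]) == 0)
            run++;
        if (run > 1)
            dups++;
    }
out:
    for (i = 0; i < n; i++)
        free(names[i]);
    free(names);
    return dups;
}
```

Fed with 'fopen("/proc/kallsyms", "r")', it should return the same number
the pipeline prints on the same machine.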
So we have to be careful about what we trace: attaching by name, we might
end up tracing the wrong function, or (as with 'bpftrace') nothing at all.
For completeness, here are some checks to determine if a function is (not)
traceable:
grep -w FUNC /proc/kallsyms # Must be here
grep -w FUNC /sys/kernel/tracing/available_filter_functions # Must be here
grep -w FUNC /sys/kernel/debug/kprobes/blacklist # Must NOT be here
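The three checks can also be scripted. A C sketch, assuming root privileges
and tracefs mounted at '/sys/kernel/tracing' (as in the commands above).
Since the three files use different line formats, it matches whole words
like 'grep -w' rather than parsing fields. 'line_has_word', 'file_has_word',
and 'looks_traceable' are hypothetical helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* grep -w word constituents: letters, digits and underscore */
static bool is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * True if 'word' occurs in 'line' as a whole word, in the sense of grep -w.
 * Like grep -w, "foo" also matches "foo.isra.0", since '.' is not a word
 * character.
 */
bool line_has_word(const char *line, const char *word)
{
    size_t len = strlen(word);
    const char *p = line;

    assert(len > 0);
    while ((p = strstr(p, word)) != NULL) {
        bool left_ok = (p == line) || !is_word_char(p[-1]);
        bool right_ok = !is_word_char(p[len]);

        if (left_ok && right_ok)
            return true;
        p++;                        /* try the next occurrence */
    }
    return false;
}

/* 1 if some line of 'path' contains 'word', 0 if not, -1 if unreadable */
int file_has_word(const char *path, const char *word)
{
    char line[1024];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = line_has_word(line, word);
    fclose(f);
    return found;
}

/*
 * The three checks from the text. Returns 1 if 'func' passes all of them,
 * 0 if it fails one, -1 if a file could not be read (e.g. not root).
 */
int looks_traceable(const char *func)
{
    int in_kallsyms = file_has_word("/proc/kallsyms", func);
    int in_filter = file_has_word(
        "/sys/kernel/tracing/available_filter_functions", func);
    int blacklisted = file_has_word(
        "/sys/kernel/debug/kprobes/blacklist", func);

    if (in_kallsyms < 0 || in_filter < 0 || blacklisted < 0)
        return -1;
    return in_kallsyms && in_filter && !blacklisted;
}
```

Note that these checks say nothing about duplicate addresses: run as root,
'looks_traceable("load_elf_binary")' would presumably return 1 on the system
above, yet 'bpftrace' still couldn't attach to it by name.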
Hack the planet!
[ref1] https://elixir.bootlin.com/linux/v6.10.14/source/fs/binfmt_elf.c#L819
[ref2] https://elixir.bootlin.com/linux/v6.10.14/source/fs/binfmt_elf.c#L825
[ref3] https://www.man7.org/linux/man-pages/man5/elf.5.html
[ref4] https://elixir.bootlin.com/linux/v6.10.14/source/include/linux/elf.h#L53
[ref5] https://elixir.bootlin.com/linux/v6.10.14/source/fs/compat_binfmt_elf.c#L35