.-----------------------------------------------------------------------------.
| Kprobes, Linux ELF loader, and C templating |
'-----------------------------------------------------------------------------'
updated: 2025-06-20
When using kprobes, beware of tracing kernel functions by symbol name! Some
tools fail to attach even though the function is clearly traceable:
# bpftrace -e 'kprobe:load_elf_binary { printf ("hit\n") }'
Attaching 1 probe...
[WARN] libbpf: prog 'kprobe_load_elf_binary_1': BPF program load failed:
Invalid argument
[WARN] libbpf: prog 'kprobe_load_elf_binary_1': failed to load: -22
cannot attach kprobe, Cannot assign requested address
ERROR: Error attaching probe: kprobe:load_elf_binary
'bpftrace' does not fail because 'load_elf_binary' is untraceable. It fails
because the symbol name resolves to two different addresses:
$ grep load_elf_binary /proc/kallsyms
ffffffff813e2c80 t load_elf_binary
ffffffff813e58d0 t load_elf_binary
A symbol with multiple addresses is not an anomaly. We can even see it
implemented twice in the debug binary:
$ objdump -M intel-mnemonic -d /usr/lib/debug/boot/vmlinux-$(uname -r) \
| grep -A7 '<load_elf_binary>'
ffffffff813e2c80 <load_elf_binary>:
ffffffff813e2c80: e8 1b 2d c9 ff call ffffffff810759a0 <__fentry__>
ffffffff813e2c85: 41 57 push r15
ffffffff813e2c87: 41 56 push r14
ffffffff813e2c89: 41 55 push r13
ffffffff813e2c8b: 41 54 push r12
ffffffff813e2c8d: 55 push rbp
ffffffff813e2c8e: 48 89 fd mov rbp,rdi
--
ffffffff813e58d0 <load_elf_binary>:
ffffffff813e58d0: e8 cb 00 c9 ff call ffffffff810759a0 <__fentry__>
ffffffff813e58d5: 41 57 push r15
ffffffff813e58d7: 41 56 push r14
ffffffff813e58d9: 41 55 push r13
ffffffff813e58db: 41 54 push r12
ffffffff813e58dd: 55 push rbp
ffffffff813e58de: 53 push rbx
And the code really differs: the last instruction shown is 'mov rbp,rdi' in
the first listing and 'push rbx' in the second. So, which one is correct?
Well, it depends. Do we want to trace the function that loads 64-bit ELF
binaries or the one that loads 32-bit ones? Yes, one is the 64-bit ELF loader,
and the other is the 32-bit ELF loader.
Now, why is that? Why do they both have the same name? Let's ignore the symbol
name collision for now and focus on a more interesting question: why are there
two 'load_elf_binary' implementations?
If we look closely at the Linux kernel source code, 'load_elf_binary' [ref1]
is defined only once -- in 'fs/binfmt_elf.c'. So where does the second one
come from? Have you ever wondered how Linux on x86-64 executes 32-bit binaries?
Where is the 32-bit ELF loader implemented? As you've probably guessed, it's
the same 'load_elf_binary' function in 'fs/binfmt_elf.c'. But how?
If you've seen the function, you've probably noticed that the ELF structures
are oddly generic, like 'struct elf_phdr' [ref2]. So how does the kernel
handle different address widths, for example between 'Elf64_Addr' and
'Elf32_Addr' [ref3]? Structure layouts are fixed at compile time, so a single
compiled copy of the function cannot handle both ELF classes on the fly.
Well, the 'fs/binfmt_elf.c' file is, de facto, a C template. The native ELF
structures, like the aforementioned 'elf_phdr', are defined by C macros in
'include/linux/elf.h' [ref4]. But there's also a second file,
'fs/compat_binfmt_elf.c', which redefines these macros and symbolic constants
[ref5], such as:
--------------------------[ fs/compat_binfmt_elf.c ]---------------------------
/*
* 32-bit compatibility support for ELF format executables and core dumps.
...
* This file is used in a 64-bit kernel that wants to support 32-bit ELF.
* asm/elf.h is responsible for defining the compat_* and COMPAT_* macros
* used below, with definitions appropriate for 32-bit ABI compatibility.
*
* We use macros to rename the ABI types and machine-dependent
* functions used in binfmt_elf.c to compat versions.
*/
...
#undef elf_phdr
#define elf_phdr elf32_phdr
-------------------------------------------------------------------------------
And at the end of the file, it actually *INCLUDES* the 'fs/binfmt_elf.c'
file!
--------------------------[ fs/compat_binfmt_elf.c ]---------------------------
/*
* We share all the actual code with the native (64-bit) version.
*/
#include "binfmt_elf.c"
-------------------------------------------------------------------------------
Behold! This is one of the great C hacks in real-world code: C templating!
Cute, right? (By the way, 'fs/binfmt_elf.c' is not x86-specific; the same
trick is used on arm64, for example.)
The remaining question is: how does the kernel get away with one symbol name
pointing to two different addresses?
Simple: when we look at all the functions in the file, we see that they are
declared as static. For example:
------------------------------[ fs/binfmt_elf.c ]------------------------------
static int load_elf_binary(struct linux_binprm *bprm);
-------------------------------------------------------------------------------
That makes it a local symbol. Local symbols with identical names can exist in
multiple object files because they are not visible outside their own object
file, so the linker never sees a conflict. There are actually two independent
'load_elf_binary' functions: one in 'fs/binfmt_elf.o' and one in
'fs/compat_binfmt_elf.o'.
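The whole trick can be reproduced in miniature outside the kernel. The sketch
below is hypothetical throughout: 'template.c', 'compat.c', 'demo_addr',
'load_demo', and the entry-point names are made up for illustration. It
assumes 'cc' and 'nm' are installed, and unlike the kernel it keeps the
native defaults inside the template instead of a separate header.

```shell
# Hypothetical miniature of the binfmt_elf.c / compat_binfmt_elf.c pair.
# Usage: build_template_demo DIR   (DIR must exist and be writable)
build_template_demo() {
    dir=$1

    # The "template": written once against the generic name demo_addr.
    cat > "$dir/template.c" <<'EOF'
#include <stdint.h>
#ifndef demo_addr            /* native defaults, used when built directly */
#define demo_addr  uint64_t
#define demo_entry native_entry
#endif
static int load_demo(void) { return (int)sizeof(demo_addr); }
int demo_entry(void) { return load_demo(); }
EOF

    # The "compat" wrapper: redefine the names, then include the template.
    cat > "$dir/compat.c" <<'EOF'
#define demo_addr  uint32_t
#define demo_entry compat_entry
#include "template.c"
EOF

    # -O0 keeps load_demo from being inlined away, so each object keeps
    # its own local ('t') symbol with that name.
    cc -O0 -c "$dir/template.c" -o "$dir/template.o" &&
    cc -O0 -c "$dir/compat.c"   -o "$dir/compat.o"   &&
    nm "$dir/template.o" "$dir/compat.o" | grep -w load_demo
}
```

Running 'build_template_demo "$(mktemp -d)"' should list one local
't load_demo' per object file. Only the entry points need distinct names,
because they are the only global symbols.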
Back to kprobes.
Even though 'bpftrace' fails, not all tools do. For instance, 'perf probe'
handles this correctly by creating kprobes for both addresses:
# perf probe load_elf_binary
Added new events:
probe:load_elf_binary (on load_elf_binary)
probe:load_elf_binary (on load_elf_binary)
You can now use it in all perf tools, such as:
perf record -e probe:load_elf_binary -aR sleep 1
# perf probe --list
probe:load_elf_binary (on load_elf_binary@fs/binfmt_elf.c)
probe:load_elf_binary (on load_elf_binary@fs/binfmt_elf.c)
The '--list' output cannot tell the two probes apart, since both functions
come from the same source file, but the actual kprobe attachment is correct:
# cat /sys/kernel/debug/kprobes/list
ffffffff813e2c80 k load_elf_binary+0x0 [DISABLED][FTRACE]
ffffffff813e58d0 k load_elf_binary+0x0 [DISABLED][FTRACE]
Now, there's one thing I don't know: which address corresponds to the 32-bit
ELF loader, and which to the 64-bit one?
It likely depends on the object file link order. I suppose the native ELF
loader functions from 'binfmt_elf.o' come first, so the first address is
probably for the 64-bit ELF. Fortunately, it's easy enough to test: attach to
one address directly and execute a few programs of a known ELF class:
# bpftrace -e 'k:$1 { printf ("hit\n") }' 0xffffffff813e2c80
stdin:1:1-6: WARNING: 0xffffffff813e2c80 is not traceable (either
non-existing, inlined, or marked as "notrace"); attaching to it
will likely fail.
Attaching 1 probe...
hit
hit
hit
NOTE: The bpftrace warning is still misleading, even in the current version
(v0.22.1). It always prints this message when attaching to a raw address,
regardless of whether the region is actually traceable.
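To repeat the test for every address behind an ambiguous name, the
invocations can be generated from kallsyms. A small sketch follows; the
function name 'probe_cmds' is made up, and it only prints commands in the
same 'k:$1' form used above without running anything:

```shell
# Print one bpftrace command per address of FUNC. Nothing is executed.
# Usage: probe_cmds FUNC [KALLSYMS_FILE]   (default: /proc/kallsyms)
probe_cmds() {
    # q carries a literal single quote into awk; "$1" inside the awk
    # string is plain text, so bpftrace still sees its own $1 parameter.
    awk -v f="$1" -v q="'" '
        ($2 == "t" || $2 == "T") && $3 == f {
            print "bpftrace -e " q "k:$1 { printf (\"hit\\n\") }" q " 0x" $1
        }' "${2:-/proc/kallsyms}"
}
```

Run it as root so that kallsyms shows real addresses, then paste each
printed command into its own terminal and execute a binary of each ELF class.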
We looked specifically at 'load_elf_binary', but there is a plethora of
symbols with multiple addresses. Just check '/proc/kallsyms'.
On my 6.1.137-amd64 kernel, there are 398 such function symbols:
$ awk '$2 == "t" || $2 == "T" {print $3}' /proc/kallsyms \
| sort | uniq -c | grep -v '^ *1 ' | wc -l
398
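The count says how many names are ambiguous, but not which ones. A minimal
sketch of a helper that also prints the addresses behind each duplicated name
follows. The function name 'dup_text_symbols' is made up, and the optional
file argument exists only so the logic can be tried on a saved copy of
kallsyms. Read the live file as root, since unprivileged reads typically show
zeroed addresses.

```shell
# Print every text symbol name that occurs more than once, followed by
# all of its addresses. Reads /proc/kallsyms unless a file is given.
dup_text_symbols() {
    awk '$2 == "t" || $2 == "T" {
             count[$3]++
             addrs[$3] = addrs[$3] " " $1
         }
         END {
             for (sym in count)
                 if (count[sym] > 1)
                     print sym addrs[sym]
         }' "${1:-/proc/kallsyms}" | sort
}
```

For the two kallsyms entries shown earlier, 'dup_text_symbols' would print a
line of the form 'load_elf_binary ffffffff813e2c80 ffffffff813e58d0'.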
We have to be careful about what we are tracing. We might end up tracing the
wrong functions.
For completeness, here are some checks to determine if a function is (not)
traceable:
grep -w FUNC /proc/kallsyms # Must be here
grep -w FUNC /sys/kernel/tracing/available_filter_functions # Must be here
grep -w FUNC /sys/kernel/debug/kprobes/blacklist # Must NOT be here
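The three checks can be rolled into one helper. This is only a sketch: the
function name 'kprobe_check' and the optional path arguments are
hypothetical, and it uses the same loose word matching as the 'grep -w'
commands above. The tracefs and debugfs files usually require root.

```shell
# Usage: kprobe_check FUNC [KALLSYMS [FILTER_FUNCS [BLACKLIST]]]
# Returns 0 if FUNC passes all three checks, 1 otherwise.
kprobe_check() {
    func=$1
    kallsyms=${2:-/proc/kallsyms}
    filter=${3:-/sys/kernel/tracing/available_filter_functions}
    blacklist=${4:-/sys/kernel/debug/kprobes/blacklist}

    if ! grep -qwF -- "$func" "$kallsyms"; then
        echo "$func: not in kallsyms"; return 1
    fi
    if ! grep -qwF -- "$func" "$filter"; then
        echo "$func: not in available_filter_functions"; return 1
    fi
    if grep -qwF -- "$func" "$blacklist"; then
        echo "$func: on the kprobes blacklist"; return 1
    fi

    # Passing the checks does not make the name unambiguous, so report
    # how many kallsyms lines matched.
    n=$(grep -cwF -- "$func" "$kallsyms")
    echo "$func: looks traceable ($n kallsyms entries)"
}
```

A result with more than one kallsyms entry is the cue to attach by address
rather than by name.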
Hack the planet!
[ref1] https://elixir.bootlin.com/linux/v6.10.14/source/fs/binfmt_elf.c#L819
[ref2] https://elixir.bootlin.com/linux/v6.10.14/source/fs/binfmt_elf.c#L825
[ref3] https://www.man7.org/linux/man-pages/man5/elf.5.html
[ref4] https://elixir.bootlin.com/linux/v6.10.14/source/include/linux/elf.h#L53
[ref5] https://elixir.bootlin.com/linux/v6.10.14/source/fs/compat_binfmt_elf.c#L35