Quest for the Smallest Possible ELF x86-64 Executable on Linux ~ Fanda Uchytil

===[ INTRO ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The smallest possible executable x86-64 ELF on Linux is 57 bytes.

This is quite a claim! How will we get out of this one? Will we be able to defend our honour? Let's put on our delving suit and delve into the depths to find out.

       ▄▀████▀▄
      ██▄▀██▀▄██
      █▀▀▀▀▀▀▀▀█        Those are pretty strong words,
    ▀▄█ █▀██▀█ █▄▀
     ▀█ █▄██▄█ █▀
      ██▄▄▀▀▄▄██        for a 7-year-old girl!
       █▀▄▀▀▄▀█
      █▀▄▀▀▀▀▄▀█
     ▀ ██▀██▀██ ▀
     ██▀▄█▀▀█▄▀██
      ▀██▄██▄██▀

===[ Officer I Swear It's a Working Example ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's start with a "working" example and build our understanding from there. The following hexdump is the annotated xxd output of a 57-byte x86-64 binary that's a legit executable for the Linux ELF loader:

--------------------------[ 57-byte_elf_x86-64.xxd ]---------------------------
00000000: 7f45 4c46 0000 0000 0000 0000 0000 0000  .ELF............
;         ^-------^                 e_entry
;      e_ident[EI_MAG]        v-----------------v
00000010: 0200 3e00 0000 0000 0000 0000 0000 0000  ..>.............
;         ^.-^ ^-.^
;          |     '--- e_machine = x86-64
;       e_type = ET_EXEC
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 3800 01                   ......8..
;                        ^.-^ ^^
;          e_phentsize ---'   '--- e_phnum
-------------------------------------------------------------------------------

Convert it into a real binary:

    grep -v '^;' 57-byte_elf_x86-64.xxd | xxd -r > 57-byte_elf_x86-64
    chmod 755 57-byte_elf_x86-64

And test it by running it (in a virtual machine, of course, we are not animals):

    $ ./57-byte_elf_x86-64
    Segmentation fault (core dumped)

Pretty good, right?! I'm actually proud of it. It's not just an ordinary segfault, it's a good one (unlike those in [ref1]), because this one didn't crash the ''execve(2)'' syscall... Don't look at me like that! Fully working examples are for normies who are slaves to the system!

So, let's see how it works...

===[ ELF Structure ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

How can a 57-byte ELF64 work when the ELF64 header itself is 64 bytes? That should be the minimum, correct? Well, evil always finds a way, kids.

When shrinking anything down, the main question is always: which fields are needed and which can be removed? In the case of an ELF executable, we need to look at two places:

  - the ELF64 structure [ref7], and
  - the Linux ELF loader [ref4].

Let's start with ELF headers; they define the structure of a program. All ELF binaries must have exactly one ELF header (= ''Elf64_Ehdr''). It's the only header that must start at offset 0, and it defines the most basic characteristics of an ELF binary (e.g., ELF type, architecture, entry point, offset of the first program header, ...). Here is the ELF64 header with annotations on the crucial fields that the Linux ELF loader requires (more on that later):

-------------------------[ ELF64 header (from elf.h) ]-------------------------
typedef struct {
    unsigned char e_ident[16];    // the first 4 bytes = ELF magic
    uint16_t      e_type;         // are we an executable?
    uint16_t      e_machine;      // the architecture
    uint32_t      e_version;
    Elf64_Addr    e_entry;        // the start address
    Elf64_Off     e_phoff;        // the program header offset
    Elf64_Off     e_shoff;
    uint32_t      e_flags;
    uint16_t      e_ehsize;
    uint16_t      e_phentsize;    // size of the program header structure
    uint16_t      e_phnum;        // number of entries in the program header
    uint16_t      e_shentsize;
    uint16_t      e_shnum;
    uint16_t      e_shstrndx;
} Elf64_Ehdr;
-------------------------------------------------------------------------------

Only the fields with a comment are necessary for the Linux loader to run an ELF binary (more on that in [[#Trim it! Trim it harder!]]).

ELF is not just for executable code, it's a container for data that describe executable code. Its structure (apart from instructions) can also contain symbols, relocations, memory, and so on.
In this article, we'll focus only on the executable part of ELF64 => ''e_type'' is either ''ET_EXEC'' (= static executable) or ''ET_DYN'' (= shared/dynamic executable).

When we set ''e_type'' to one of the executable types, the kernel ELF loader requires one more data structure -- the program header:

------------------------[ Program Header (from elf.h) ]------------------------
typedef struct {
    uint32_t   p_type;
    uint32_t   p_flags;
    Elf64_Off  p_offset;
    Elf64_Addr p_vaddr;
    Elf64_Addr p_paddr;
    uint64_t   p_filesz;
    uint64_t   p_memsz;
    uint64_t   p_align;
} Elf64_Phdr;
-------------------------------------------------------------------------------

The program header tells the kernel how to load a binary into memory. For example, the ELF loader in Linux kernel 6.12 [ref4] recognizes the following types:

    PHDR TYPE                 DESCRIPTION
    --------------------------------------------------------------------------
    PT_LOAD                   how to map a binary into memory
    PT_GNU_STACK              enables an executable userspace stack
    PT_GNU_PROPERTY           special GNU properties (hardening, ISA specs, ...)
    PT_INTERP                 indicates the dynamic linker (the ELF interpreter)
    PT_LOPROC...PT_HIPROC     range for processor-specific segment types

NOTE: To execute code from our binary, it must be loaded in working memory. The only two types that do this are ''PT_LOAD'', which describes how the binary should be mapped into memory, and ''PT_INTERP'', which tells the kernel that the binary needs to load a dynamic linker to resolve all dependent libraries before execution.

Now that we have some idea of how an executable ELF is structured, we want to know what we can remove for it to stay a valid executable.

===[ Trim it! Trim it harder! ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As mentioned in the previous section, the minimum size of an ELF64 binary should hypothetically be 64 bytes, since that's the size of the ELF header structure. But what about the program header, which adds another 56 bytes?
We also said that it is needed for an executable; well, not just needed, but enforced by the ELF loader. Which fields does the loader actually require, and in which structure?

The ELF loader is mostly defined in the ''load_elf_binary'' function [ref5], and the first thing it does is check whether the ''e_ident'' field has the 4-byte ELF magic (= ''\x7fELF''):

    if (memcmp(elf_ex->e_ident, ELFMAG, SELFMAG) != 0)   // e_ident == "\177ELF"
        goto out;

''e_ident'' is an array of 16 bytes (= ''unsigned char e_ident[16]''), but only the first 4 bytes are important. The remaining 12 bytes can be used for whatever (wink wink, nudge nudge, say no more).

Then the loader checks whether the binary is executable, and if not, it GTFOs:

    if (elf_ex->e_type != ET_EXEC && elf_ex->e_type != ET_DYN)
        goto out;

That means the ''e_type'' field is also mandatory. Let's do a quick check of where we are in the ELF header structure and how many bytes we currently need:

-----------------------------------[ elf.h ]-----------------------------------
typedef struct {
    unsigned char e_ident[16];
    uint16_t      e_type;         <-- ET_EXEC || ET_DYN
    uint16_t      e_machine;
    uint32_t      e_version;
    Elf64_Addr    e_entry;
    Elf64_Off     e_phoff;
    Elf64_Off     e_shoff;
    uint32_t      e_flags;
    uint16_t      e_ehsize;
    uint16_t      e_phentsize;
    uint16_t      e_phnum;
    uint16_t      e_shentsize;
    uint16_t      e_shnum;
    uint16_t      e_shstrndx;
} Elf64_Ehdr;
-------------------------------------------------------------------------------

We are now at 18 bytes (we need the full 16 bytes of ''e_ident'' and 2 bytes of ''e_type'').

Next, the loader checks the architecture (''e_machine''), which in our case must be equal to ''0x003e'' (= ''EM_X86_64''), defining the x86-64 architecture:

    if (!elf_check_arch(elf_ex))   // e_machine == EM_X86_64
        goto out;

It's another 2 bytes => 20 bytes so far.
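
The running byte count can be double-checked against the userspace ''elf.h''. The following is only a sketch: the macro ''EHDR_END_OF'' and the two helper functions are hypothetical names made up for this illustration, and it assumes a Linux system whose ''<elf.h>'' provides ''Elf64_Ehdr''.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: offset of the first byte *after* a header field. */
#define EHDR_END_OF(field) \
    (offsetof(Elf64_Ehdr, field) + sizeof(((Elf64_Ehdr *)0)->field))

/* The complete ELF64 header is 64 bytes long. */
static_assert(sizeof(Elf64_Ehdr) == 64, "unexpected Elf64_Ehdr size");

/* Header bytes covered once e_ident and e_type have been checked. */
static size_t bytes_after_type_check(void)
{
    return EHDR_END_OF(e_type);
}

/* Header bytes covered once e_machine has been checked as well. */
static size_t bytes_after_arch_check(void)
{
    return EHDR_END_OF(e_machine);
}
```

The same macro can be pointed at any later field to see how far into the header a given check reaches.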

NOTE: We can ignore some checks, such as ''elf_check_fdpic'', as they don't apply to the x86 architecture (but they may on other architectures, where ''e_ident[EI_OSABI]'' can be checked, so be cautious).

And """finally""" (for the ELF header), the loader calls ''load_elf_phdrs'', which reads the whole program header table from the file:

    elf_phdata = load_elf_phdrs(elf_ex, bprm->file);
    if (!elf_phdata)
        goto out;

We need to be extra careful here: if this call fails, the whole loading process fails! When we look into the function, it's obvious that we need at least one program header record:

    if (elf_ex->e_phentsize != sizeof(struct elf_phdr))
        goto out;

    /* Sanity check the number of program headers and their total size. */
    size = sizeof(struct elf_phdr) * elf_ex->e_phnum;
    if (size == 0 || size > 65536 || size > ELF_MIN_ALIGN)
        goto out;

It then reads all records in the program header from the file (which might be a problem):

    /* Read in the program headers */
    retval = elf_read(elf_file, elf_phdata, size, elf_ex->e_phoff);

Let's go back to the ELF header and assess where we are now.
From the conditions above, we can see that the loader requires a correct ''e_phentsize'' (which must be exactly ''sizeof (struct elf_phdr)'' => 56 bytes) and a nonzero ''e_phnum'' (it must also be below the relevant limits, but we can ignore that since we want the bare minimum):

-----------------------------------[ elf.h ]-----------------------------------
typedef struct {
    unsigned char e_ident[16];    // 16
    uint16_t      e_type;         // 2
    uint16_t      e_machine;      // 2
    uint32_t      e_version;      // 4
    Elf64_Addr    e_entry;        // 8
    Elf64_Off     e_phoff;        // 8
    Elf64_Off     e_shoff;        // 8
    uint32_t      e_flags;        // 4
    uint16_t      e_ehsize;       // 2
    uint16_t      e_phentsize;    // 2
    uint16_t      e_phnum;        // 2   <--- WE ARE HERE
    uint16_t      e_shentsize;
    uint16_t      e_shnum;
    uint16_t      e_shstrndx;
} Elf64_Ehdr;
-------------------------------------------------------------------------------

None of the following fields are read by the kernel loader, so we're at 58 bytes for the ELF header. That's without the program header.

That said, what about the program header? We know that at least one ''sizeof (struct elf_phdr)'' entry is required, which is 56 bytes for a 64-bit executable. Does that bring the total to ''58 + 56 = 114''?

Well, nobody said that we cannot overlay the ELF and program headers [ref3]. Moreover, we don't even need a valid program header entry, because the kernel fails only on invalid ''PT_LOAD'' records. Therefore, we can set ''e_phoff'' to zero, which points straight to the beginning of the ELF header and reduces the total number of bytes back to 58.

===[ One Byte Less ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So far, we have a 58-byte binary and no less, but the binary at the beginning is only 57 bytes long. What gives?

We can trim one more byte. ''e_phnum'' is the last field in the ELF header that is needed, it's a two-byte integer, and we can remove one of those bytes. Two properties of x86 allow this: little-endian encoding and page-granularity allocation.

Little-endian stores the least significant byte at the smallest address [ref6]. What that means is that if we have a two-byte number with the value 1 (= ''0x0001''), its bytes will be stored in memory in reverse order as ''01 00''. This is one of the reasons why little-endian exists -- it makes type casting straightforward: ''(char) 01'', ''(short) 01 00'', ''(int) 01 00 00 00'', ... All of these represent the same value, 1, but with different type widths.

That's nice, but ''e_phnum'' is a two-byte data type (= ''uint16_t''), so that means two bytes are read, right? Correct. And this is where page-size-granularity allocation comes into play. For the sake of efficiency, allocations are done on a page-size basis. So, when anything is read into memory, even if it's just one byte, one full page is allocated (most of the time, 4096 bytes on x86). The data are copied there, and the rest (e.g., 4095 bytes) is zeroed out. That means that when we read ''(uint16_t) e_phnum'' from process memory, it reads an implicit zero from the page. Here, have a picture:

    FILE                        MMAP()    MEMORY
    00000000: 4861 636b 2074    ----->    ffff0000: 4861 636b 2074
    00000006: 6865 2070 6c61              ffff0006: 6865 2070 6c61
    0000000c: 6e65 7421 0a                ffff000c: 6e65 7421 0a00
                                          ffff0012: 0000 0000 0000
                                          ffff0018: 0000 0000 0000
                                          ...
                                          ffff0ffa: 0000 0000 0000
                                          ffff1000:

NOTE: Unfortunately, we cannot use the same trick for the program header, because the loader reads it directly from the file (as we saw in [[#Trim it! Trim it harder!]]). Therefore, if the structure is trimmed in the file, the loader will fail to read the required size. There may be some trick to this, but I didn't find one.

When we trim one byte from ''e_phnum'', we still get a valid ELF64 executable, and the final size is 57 bytes. Except that it still segfaults!

===[ Good, bad... I'm the guy with the segfault ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is more than one type of segmentation fault [ref1][ref2]. More precisely, some segfaults are recoverable.
Therefore, there is such a thing as a "good" or "bad" segfault. In this case, a "bad" segfault is one that causes the ''execve(2)'' syscall to fail completely (see [ref2]). On the other hand, a "good" segfault, for our purposes, is one that occurs in user space => on an already loaded binary that tries to execute code but fails for some reason. The following strace output is such a "good" segfault, which occurs in the 57-byte binary from the beginning:

-------------------------------[ good segfault ]-------------------------------
$ strace -i ./57-byte_elf_x86-64
[00007ffff7e7fad7] execve("./57-byte_elf_x86-64", ["./57-byte_elf_x86-64"], 0x7fffffffea58 /* 1 var */) = 0
[0000000000000000] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
[????????????????] +++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
-------------------------------------------------------------------------------

The ''execve()'' succeeds (= returns 0), and then the process faults at address 0 because ''e_entry'' is exactly that -- zero.

Why is that good? Because the binary is successfully running as a process and then tries to execute user space instructions, but that is where it fails -- it's unable to execute anything because nothing is mapped there. Why is this useful? Well, now we at least have the theoretical possibility to jump somewhere else in process memory that might be executable -- and if we wish hard enough, there will be such a place...

===[ Summary of 57-byte ELF64 ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We've traveled a long way, let's summarize and finally look at some assembly code:

NOTE: The following is ''nasm'' code [ref25], and it's very simple to read: ''db = 1 byte, dw = 2 bytes, dd = 4 bytes, dq = 8 bytes''; the meanings are explained in the comments. The ''USABLE?'' column tells you whether a field is free for our own use (''yes'') or pinned to a value by the Linux ELF loader (''-'').

--------------------------[ 57-byte_elf_x86-64.nasm ]--------------------------
BITS 64                             ;                                USABLE?
    db 0x7F, "ELF"                  ; e_ident[EI_MAG]                -
    times 12 db 0x00                ; e_ident[...]
    dw 0x0002                       ; e_type      = ET_EXEC          -
    dw 0x003e                       ; e_machine   = x86-64           -
    dd 0x00000001                   ; e_version                      yes
    dq 0x0000000000000000           ; e_entry     = stack            yes
    dq 0x0000000000000000           ; e_phoff     = 0                -
    dq 0x0000000000000000           ; e_shoff                        yes
    dd 0x00000000                   ; e_flags                        yes
    dw 0x0040                       ; e_ehsize                       yes
    dw 0x0038                       ; e_phentsize = sizeof (phdr)    -
    db 0x01                         ; e_phnum     = 1 entry          -
-------------------------------------------------------------------------------

If we stopped here, we would have a 57-byte ELF64 that the Linux kernel executes correctly, but unfortunately it still fails to execute any user code. And we would really like it to execute something! Otherwise, the executable is more or less useless.

So, what can we do with it? First, what is at our disposal (= what does the kernel map for every process by default)?

    $ gdb -q -ex 'file ./execve_wrapper' -ex 'catch exec' -ex "run ./57-byte_elf_x86-64"
    ...
    (gdb) info proc map
            Start Addr           End Addr       Size     Offset  Perms  objfile
        0x7ffff7ff9000     0x7ffff7ffd000     0x4000        0x0  r--p   [vvar]
        0x7ffff7ffd000     0x7ffff7fff000     0x2000        0x0  r-xp   [vdso]
        0x7ffffffde000     0x7ffffffff000    0x21000        0x0  rw-p   [stack]

Eh, not much. Not surprising, since we didn't load anything into memory. Look where ''e_phoff'' points -- right at the beginning of the binary. That means there is no ''PT_LOAD'' record to load anything into memory. So when the binary is executed, no code from it is actually loaded.

We either need to somehow make a ''PT_LOAD'' record that will load our code, or we need to use regions that are mapped by the kernel. (There might be some possibilities in ''PT_INTERP'', but I didn't go there.)

===[ Cursed PT_LOAD ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section is super frustrating for me. I have a long-standing battle with the ''PT_LOAD'' type, and I'm systematically losing.
So far, I have been unable to make the smallest ELF64 with a valid ''PT_LOAD''. The smallest size "I" was able to get is a 73-byte, self-contained, fully valid ELF64, and that's only thanks to the tmp.out article by lm978 [ref8] (btw, this is superb work; lm978 was able to make a fully valid 73-byte ELF64 "Hello, world!"; Respect+!).

Sadly, when I apply the technique, it adds 16 bytes to the base 57-byte binary, and that version already exists. (It also needs either more than 4 GiB of memory or enabled memory overcommit, which I don't see as a problem.) Well, here it is for completeness' sake, and if you want to know more (as you should), read the OG article [ref8]:

BITS 64                             ;                                USABLE?
    db 0x7F, "ELF"                  ; e_ident[EI_MAG]                -
_code:
    mov dil, 66                     ; 40 B7 42 ; return value
    mov al, 0x3c                    ; B0 3C    ; sys_exit
    syscall                         ; 0F 05
    db 0x00                         ; e_ident[EI_PAD]                yes
_phdr:
    db 0x01                         ; e_ident[EI_PAD]                yes
    db 0x00                         ; e_ident[EI_PAD]                yes
    db 0x00                         ; e_ident[EI_PAD]                yes
    db 0x00                         ; e_ident[EI_PAD]                yes
    dw 0x0003                       ; e_type      = executable       -
    dw 0x003e                       ; e_machine   = x86-64           -
    dd 0x0000000c                   ; e_version   = ELF version      yes
    dq 0x0000000c00000000           ; e_entry                        -
    dq _phdr                        ; e_phoff                        -
    dq 0x0038000000000000           ; e_shoff                        yes
    dd 0x00000001                   ; e_flags                        yes
    dw 0x0000                       ; e_ehsize                       yes
    dw 0x0038                       ; e_phentsize                    -
    db 0x01                         ; e_phnum                        -

    ; phdr padding -- explained in the "One Byte Less" section
    db 0x00
    db 0x00
    db 0x00
    db 0x00
    db 0x00
    db 0x00
    db 0x00
    db 0x00
    db 0x00
    db 0x00
    db 0x00

    ; jump padding
    db 0x00
    db 0x00
    db 0x00
    jmp short _code

So what other options are there? We already know that ''PT_LOAD'' is needed when we want to load any part of the binary into memory (which is usually the case). But we are under no obligation to load anything from the program. Actually, let's double down! We're edgy kids -- we rebel against such oppressive doctrines as ''PT_LOAD''! It's just so...so crypto-fascist.

At the beginning, I said that we need at least one program header record. That's right, *a* record. It doesn't need to be ''PT_LOAD''.
It doesn't even need to be one of those the kernel recognizes. (In some cases, it might be better if it's unrecognizable by the kernel, as in the 57-byte example. If it's skipped, it doesn't have side effects, right?)

In the previous section, we looked at the memory mapping of the 57-byte program. There was one executable region: ''vdso''. What if we point ''e_entry'' there? There could be some interesting instructions to execute...

===[ vDSO and ROP ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

No! Just NO! There are too many problems with it. (I like vdso, but not in this case. If you've ever played with it, you probably know what I mean.)

Virtual Dynamically-linked Shared Object (a.k.a. vDSO) is a small subset of kernel space exported to user space for faster access to some code (like ''gettimeofday'') [ref29]. It's a virtual ELF file (that's good), and unlike its predecessor ''vsyscall'', it was built with ASLR in mind, so its base address changes every run (that's bad). And I don't just mean it changes when ASLR is enabled, that's today's standard. What I mean is that it doesn't even have the same base address on different kernels when ASLR is disabled! That's a very big problem if we want to have at least the illusion of portability.

-------[ With ASLR disabled: vDSO changes on different kernel versions ]-------
$ uname -r ; setarch "$(uname -m)" -R grep vdso /proc/self/maps
6.1.0-37-amd64
7ffff7fc8000-7ffff7fca000 r-xp 00000000 00:00 0                  [vdso]

$ uname -r ; setarch "$(uname -m)" -R grep vdso /proc/self/maps
6.12.73+deb13-amd64
7ffff7fc5000-7ffff7fc7000 r-xp 00000000 00:00 0                  [vdso]
-------------------------------------------------------------------------------

On top of that, the vDSO structure and code may also (and "often" do) change between major kernel versions. This makes it difficult to use it as a reliable entry point across kernel versions.

Well, that sucks! What other techniques can we use?
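
For anyone who wants to poke at the vDSO from a normal process instead of grepping ''/proc/self/maps'', here is a minimal sketch. It assumes glibc's ''getauxval(3)'' and the ''AT_SYSINFO_EHDR'' auxiliary vector entry, through which the kernel passes the vDSO base address; the function names ''vdso_base'' and ''vdso_is_elf'' are made up for this example.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Base address of the vDSO mapping in this process, or 0 if there is none. */
static uintptr_t vdso_base(void)
{
    return (uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* Returns 1 if a vDSO is mapped and starts with the ELF magic, otherwise 0. */
static int vdso_is_elf(void)
{
    uintptr_t base = vdso_base();

    if (base == 0)
        return 0;
    return memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```

Printing ''vdso_base()'' on two different kernels (under ''setarch -R'') should show the same kind of drift as the listing above, which is exactly why a hard-coded ''e_entry'' into the vDSO is so fragile.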

===[ Sacrificial Alter ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is one technique that would solve our problem, but it requires a lot of outsourcing. I wouldn't care much, except for one thing that bothers me. But let's start from the beginning.

What other regions are mapped? Well, well, well. Isn't that the legendary stack?! It is! But it's not for free -- there are two problems. The first one is that the stack is not executable by default since 2020 (kernel >= 5.8). Fortunately, this is not a big issue, as we can define ''PT_GNU_STACK'' in the program header. It wants a 3-byte sacrifice, but it's doable. The second problem, which is more serious, is ASLR once again. We'll deal with it, but in a cowardly way.

Let's start with ''PT_GNU_STACK''. On kernels < 5.8, the stack is executable by default, and we don't need to do anything to execute code from the stack. We can point the first program header to the beginning of the file and be done with it (exactly like we did in [[#Summary of 57-byte ELF64]]). The first program header record points to offset 0, which is the same as the ELF magic. The ''phdr->p_type'' is nonsense, so the kernel ignores it, and the stack remains executable. That's nice, as it gives us a 57-byte binary (as shown at the beginning).

(Un)fortunately, Kees Cook said: "no fun allowed" in 2020 [ref10], and since kernel version >= 5.8 [ref11], the stack is non-executable by default. Therefore, we need to set ''PT_GNU_STACK'' with the correct permissions (= ''phdr->p_flags'') explicitly. This will add 3 bytes to our binary (but it's still a 60-byte binary, which is fine by me).

First, we want to set ''phdr->p_type'' to ''PT_GNU_STACK''. It's defined as ''0x6474e551'' in the kernel source code [ref12]:

    #define PT_LOOS         0x60000000              /* OS-specific */
    #define PT_GNU_STACK    (PT_LOOS + 0x474e551)   // => 0x6474e551

NOTE: ''PT_GNU_STACK'' exists since Linux 2.6.6 [ref9].
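
The constant, and the way it ends up in the file, can be sanity-checked with a few lines of C. This is a sketch under the assumption that the userspace ''<elf.h>'' agrees with the kernel definition quoted above; ''put_le32'' is a hypothetical helper written just for this illustration.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* Assumption: userspace <elf.h> matches the kernel's PT_GNU_STACK. */
static_assert(PT_GNU_STACK == PT_LOOS + 0x474e551, "PT_GNU_STACK mismatch");
static_assert(PT_GNU_STACK == 0x6474e551, "unexpected PT_GNU_STACK value");

/* Store a 32-bit value in little-endian byte order, as x86-64 ELF does. */
static void put_le32(unsigned char out[4], uint32_t value)
{
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}
```

Feeding ''PT_GNU_STACK'' through ''put_le32'' yields the bytes ''51 e5 74 64'', so in a hexdump the type shows up as the ASCII-ish string ''Q.td''.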

Second, we need to set the executable bit (= ''PF_X'') in ''phdr->p_flags''; the loader checks if it's set:

------------------------[ load_elf_binary: stack mode ]------------------------
for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++)
    switch (elf_ppnt->p_type) {
    case PT_GNU_STACK:
        if (elf_ppnt->p_flags & PF_X)
            executable_stack = EXSTACK_ENABLE_X;
        else
            executable_stack = EXSTACK_DISABLE_X;
        break;
-------------------------------------------------------------------------------

NOTE: From the condition above, the loader checks only if ''PF_X'' is set, and the ''PF_X'' flag is defined as 1 [ref26], which means it's the lowest bit => ''p_flags'' can be any odd number, and the stack will be read-write-executable [ref13].

No more fields are required for ''PT_GNU_STACK''.

Finally, we have to place it somewhere in the binary. The first eligible place is at the 5th position, right after the ELF magic ''\x7fELF''. There is enough space for both ''phdr->p_type'' (4 bytes) and ''phdr->p_flags'' (4 bytes), and those ELF header fields are not read by the kernel (see the ''REQUIRED'' column):

-------------------[ prototype_of_60-byte_elf_x86-64.nasm ]--------------------
BITS 64                             ;                                REQUIRED
    db 0x7F, "ELF"                  ; e_ident[EI_MAG]                yes
phdr:
    dd 0x6474e551                   ; phdr->p_type  = PT_GNU_STACK   -
    dd 0x00000001                   ; phdr->p_flags = RWX            -
    db 0x00                         ; e_ident[EI_PAD]                -
    db 0x00                         ; e_ident[EI_PAD]                -
    db 0x00                         ; e_ident[EI_PAD]                -
    db 0x00                         ; e_ident[EI_PAD]                -
    dw 0x0002                       ; e_type      = ET_EXEC          yes
    dw 0x003e                       ; e_machine   = x86-64           yes
    dd 0x00000001                   ; e_version                      -
    dq 0x0000000000000000           ; e_entry     = stack            yes
    dq phdr - $$                    ; e_phoff     = 4                yes
    dq 0x0000000000000000           ; e_shoff                        -
    dd 0x00000000                   ; e_flags                        -
    dw 0x0040                       ; e_ehsize                       -
    dw 0x0038                       ; e_phentsize = sizeof (phdr)    yes
    dw 0x0001                       ; e_phnum     = 1 entry          yes
    dw 0x0000                       ; e_shentsize => padding         -
-------------------------------------------------------------------------------

Right!
But it's still not working, because ASLR randomizes the address of the stack, so the loader jumps into an unmapped memory region and segfaults.

===[ Problem with Personality ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ASLR is a pain (and I didn't figure out how to solve it). When we look at the kernel source code, we can see that there are two ways to disable ASLR on the fly [ref16]. One is global for the whole system through the variable ''randomize_va_space'' (this variable is set when we write to ''/proc/sys/kernel/randomize_va_space'' [ref14] or pass ''norandmaps'' as a kernel boot parameter [ref15]). And the other one is through the personality [ref17] of a process:

    const int snapshot_randomize_va_space = READ_ONCE(randomize_va_space);
    if (!(current->personality & ADDR_NO_RANDOMIZE) && snapshot_randomize_va_space)
        current->flags |= PF_RANDOMIZE;

    setup_new_exec(bprm);

    /* Do this so that we can load the interpreter, if need be.  We will
       change some of these later */
    retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
                             executable_stack);

Both are problematic, because they require an action outside of the binary, and that's lame! Regrettably, I have no choice but to invoke the rule that ASLR must be disabled either via ''personality(2)'' or globally. (Shame? Such a mundane thought never crossed my mind!)

===[ Outsourcing Code to the Stack ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's continue our stack adventure. We still have one unsolved problem. We already know that, even though we don't map anything from the binary directly into memory (as we don't have any ''PT_LOAD'' in the binary), we can use the user stack. We can already make it executable, but how can we store data there when we have no code? (The chicken-and-egg problem.)

We actually need two things:

  - a way to push our code onto the stack, and
  - the address of that code, so we can execute it.

There are several options we can use to get our data onto the user stack: 1.
number of arguments (= ''argc''), 2. program name and its arguments (= ''argv''), 3. environment variables (= ''env''), 4. filename (= ''auxv[AT_EXECFN]'').

Most of them require outside effort, like running the program with special environment variables or special arguments, etc. I don't want any more unnecessary "user interaction" (I still have trauma caused by ASLR). Fortunately, the kernel is kind enough and grants my wish.

Number 4, ''filename'', differs from ''argv[0]''. It is created at program execution. It's one of the first records put on the user stack of a process and is derived from ''bprm->filename'' (=> ''bprm->exec'' [ref20]), so it's the path by which the program was executed. The kernel puts it on the stack for ''auxv[AT_EXECFN]'' [ref24], which points to it [ref19]. Unlike argc, argv, and envp, it's not a stable ABI and might disappear in the future, but I doubt it will; I have never seen it missing or in a different position on x86-64.

Here is a simple layout of the x86-64 stack which is set up by the kernel:

----------------------[ Stack layout (x86-64 ; no ASLR) ]----------------------
    (lower memory addresses)
    ...
    auxv                  AT_EXECFN = ptr @ ----.        [ref19]
    argc                                        |
    argv                                        |
    env                                         |
    filename\0            bprm->exec <----------'        [ref20]
    0x0000000000000000    8 bytes (sizeof (void *))      [ref21]
    0x7FFFFFFFF000                                       [ref22] [ref23]
-------------------------------------------------------------------------------

From the layout, we can see that the most control we have is through the filename, because it ends at a deterministic position -- 8 bytes from the bottom of the stack. This is perfect, as it's a known fixed address, and the bonus is that when we use a negative offset (= from a higher address to a lower one), it automatically trims out any path prefix => only the last part of the filename is relevant.
E.g., it doesn't matter if we run ''/bin/cat'' or ''./cat'', the ''cat'' string will always be at position: ''STACK-BOTTOM - 8 - strlen ("cat") - 1'' (-1 because of the ''\0'' filename string termination).

Ok, the remaining stack ingredient is the address of STACK-BOTTOM without ASLR. The user stack (re)location is "finalized" in the ELF loader [ref22] and it uses ''STACK_TOP'' as the reference [ref23] for the stack bottom :). That means the bottom of the user stack without ASLR is at ''0x7FFFFFFFF000'' on x86-64.

===[ The Printable Code ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We need one more piece -- runnable code. Let's (not) overcomplicate our lives; we'll create a simple ''exit(2)'' program that exits with value ''66''.

We are doing shellcode that will be stored in a file name. On typical Unix filesystems (e.g., ext4, xfs, zfs, tmpfs, ...), there are only two rules for a file name we must obey:

  1. no NUL character (''\0''), and
  2. no forward slash (''/'').

Those are reserved for paths (''/'') and string termination (''\0''). A proof of concept can be something like this code:

----------------------------[ poc_shellcode.nasm ]-----------------------------
BITS 64             ; MEANING         C hex string
    mov dil, 0x42   ; arg = 66        \x40\xb7\x42
    mov al, 0x3c    ; syscall exit    \xB0\x3C
    syscall         ; exit (66)       \x0F\x05
-------------------------------------------------------------------------------

We can use the binary output as the name of our binary, or better yet, we can create a symlink with this name pointing to our binary:

    ln -s 60-byte_elf_x86-64 $'\x40\xb7\x42\xB0\x3C\x0F\x05'

The size is 7, so we need to appropriately set ''ehdr->e_entry'' to ''0x7FFFFFFFF000 - 8 - 7 - 1'', and we can run it:

    ./$'\x40\xb7\x42\xB0\x3C\x0F\x05'
    echo $?    # => 66

It correctly exits with 66. But that's lame, and we can do better!

Well, sir, would you care for a small portion of printable characters in your filename, with a hint of self-modifying code before the main course?
Well, I do, my good sir -- I indeed do:

-------------------------[ printable_shellcode.nasm ]--------------------------
    ; constructing instruction 'syscall' (= 0F 05)
    sub ax, 0x7270      ; 0x0000 - 0x7270 = 0x8d90
    sub ax, 0x474f      ; 0x8d90 - 0x474f = 0x4641
    sub ax, 0x4132      ; 0x4641 - 0x4132 = 0x050f => 0F 05
    push rax
    pop rsi             ; rsi = 0F 05

    ; constructing syscall value for 'exit'
    push byte 0x54
    pop rax
    xor al, 0x68        ; eax = 0x3c => syscall exit

    push byte 66        ; return value
    pop rdi

    ; modify the next instruction so it becomes 'syscall'
    xor word [rel _syscall], si
_syscall:
    dd 0                ; a placeholder
-------------------------------------------------------------------------------

How does it work? We are very constrained by the number of instructions we can use. Printable ASCII codes range from 0x20 (space) to 0x7e (tilde), and anything outside this range is considered a non-printable/special character. We want only instruction opcodes that are in the printable range. The main goal is the possibility of writing it on a keyboard without problems.

The number of eligible instructions on x86-64 is very small, but the set of fully usable instructions with arguments and registers is even smaller. I took a known alphanumeric opcode table [ref27] and the x86-64 opcode table [ref28] and iterated from that. For example, there is no ''mov'', so we have to be creative and use multiple ''sub ax'' (it results in printable opcodes ''f-'') to get the value we want. In the code above, we want to get 0x050f. Both bytes are unprintable. Also, when we use ''sub'', we go backward from zero and underflow to our desired value, and we must use values whose bytes result in printable characters:

    sub ax, 0x7270      ; f-pr
    sub ax, 0x474f      ; f-OG
    sub ax, 0x4132      ; f-2A

Then we load it into ''rsi'', because we need it there for the last instruction: ''xor word [rel _syscall], si''. This instruction is a real treat; it modifies the code after it, but it's not that simple.
Look at how the instruction really looks:

    xor [RIP + 0x7], si         ; 66 31 35 00 00 00 00

Yeah, correct! Those are the infamous NUL bytes. The exact ones that are
forbidden. And moreover, there are four of them. How do we get out of this
predicament? Well, do you remember the stack layout from
[[#Outsourcing Code to the Stack]]? It looks like this:

    ...
    filename\0
    0x0000000000000000      8 bytes (sizeof (void *))
    0x7FFFFFFFF000

We have 8 NUL bytes from ''sizeof (void *)'', plus 1 NUL byte from the filename
termination at our disposal. It would be a waste not to use them... We can take
the ''xor'' instruction and trim its tail of NUL bytes, and when executed,
they'll be there. The file name then ends with ''f15'' (= 66 31 35); the first
four NUL bytes behind it become the displacement, and the two NUL bytes after
those are the word that the ''xor'' turns into ''0F 05''.

And if you are asking why I didn't simply use ''push 0x3c'', as it's also a
printable and valid file name character: I could, but ''0x3c'' is the ''<''
character, which is used for file descriptor 0 redirection in the shell. We
could run it in quotes or escape it (''\<''), and it would run fine, but why
not make it sexier when we can? Am I right, boiz?!

When we build the shellcode, we get these 25 printable characters:

    f-prf-OGf-2AP^jTX4hjB_f15

This can be used as a filename without any worries (it could be made prettier,
but that's homework for a good reader).


===[ 60-byte Frankenstein's Monster ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now we've got every part we need:

  - the trimmed ELF structure,
  - the executable stack for kernel versions >= 5.8,
  - the address of the stack,
  - the code we want to run,
  - the place to store the code,
  - the printable filename.

At last, let's put it all together. Prepare a big roll of duct tape. We'll take
the prototype from [[#Too Much to Sacrifice]] and fill in the entry point
(''ehdr->e_entry''). Our code is stored in the file name, and that is stored on
the stack near its bottom.
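
Before the duct tape comes out, the arithmetic behind the printable file name
can be double-checked with a few lines of shell. This is only a sketch, not
part of the build: the variable names are made up for this illustration, and it
assumes (like the comments in the listing) that ''ax'' is zero when the code
starts.

```shell
#!/bin/sh
# Sanity checks for the printable shellcode.
# All variable names here are hypothetical, chosen for this sketch only.
LC_ALL=C
export LC_ALL

name='f-prf-OGf-2AP^jTX4hjB_f15'

# Assumption: ax == 0 at entry. Three 16-bit subtractions wrap around
# and must end at 0x050f (stored in memory as 0F 05 = 'syscall').
ax=$(( (0 - 0x7270) & 0xffff ))
ax=$(( (ax - 0x474f) & 0xffff ))
ax=$(( (ax - 0x4132) & 0xffff ))
syscall_word=$(printf '0x%04x' "$ax")

# 0x54 xor 0x68 must give the syscall number of exit (0x3c).
exit_nr=$(printf '0x%02x' $(( 0x54 ^ 0x68 )))

# Every byte of the name must be in 0x20..0x7e, and '/' is forbidden.
rest=$(printf '%s' "$name" | tr -d ' -~')
case $name in
    */*) printable=no ;;
    *)   if [ -z "$rest" ]; then printable=yes; else printable=no; fi ;;
esac

# Length of the name = length of the code that ends up on the stack.
name_len=${#name}

echo "syscall word: $syscall_word"
echo "exit number:  $exit_nr"
echo "printable:    $printable"
echo "name length:  $name_len"
```

It should report ''0x050f'', ''0x3c'', ''yes'' and ''25''; the last number is
the code length that goes into the entry point calculation.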
The code length is 25 bytes, and when we plug it into the equation from
[[#Outsourcing Code to the Stack]], we get:

    e_entry = STACK-BOTTOM - 8 - code_length - 1 =
            = 0x7FFFFFFFF000 - 8 - 1 - 25 =
            = 0x7FFFFFFFEFDE

NOTE: nasm allows for simple equations, so I just rearranged it and kept it
like that, because then it can be easily edited without recalculating it again.

The final code will look like this:

--------------------------[ 60-byte_elf_x86-64.nasm ]--------------------------
BITS 64
                                    ;                               REQUIRED
        db 0x7F, "ELF"              ; e_ident[EI_MAG]                 yes
phdr:   dd 0x6474e551               ; phdr->p_type  = PT_GNU_STACK    -
        dd 0x00000001               ; phdr->p_flags = PF_X            -
        db 0x00                     ; e_ident[EI_PAD]                 -
        db 0x00                     ; e_ident[EI_PAD]                 -
        db 0x00                     ; e_ident[EI_PAD]                 -
        db 0x00                     ; e_ident[EI_PAD]                 -
        dw 0x0002                   ; e_type      = ET_EXEC           yes
        dw 0x003e                   ; e_machine   = x86-64            yes
        dd 0x00000001               ; e_version                       -
        dq 0x7FFFFFFFF000-8-1 -25   ; e_entry     = stack             yes
        dq phdr - $$                ; e_phoff     = 4                 yes
        dq 0x0000000000000000       ; e_shoff                         -
        dd 0x00000000               ; e_flags                         -
        dw 0x0040                   ; e_ehsize                        -
        dw 0x0038                   ; e_phentsize = sizeof (phdr)     yes
        dw 0x0001                   ; e_phnum     = 1 entry           yes
        dw 0x0000                   ; e_shentsize => padding          -
-------------------------------------------------------------------------------

Build it, make it executable, and marvel at our glorious hexdump:

    nasm -f bin 60-byte_elf_x86-64.nasm -o 60-byte_elf_x86-64
    chmod 755 60-byte_elf_x86-64
    xxd 60-byte_elf_x86-64

--------------------------[ 60-byte_elf_x86-64.xxd ]---------------------------
00000000: 7f45 4c46 51e5 7464 0100 0000 0000 0000  .ELFQ.td........
00000010: 0200 3e00 0100 0000 deef ffff ff7f 0000  ..>.............
00000020: 0400 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 4000 3800 0100 0000            ....@.8.....
-------------------------------------------------------------------------------

NOTE: Conversion from xxd hexdump to binary:

    xxd -r 60-byte_elf_x86-64.xxd > 60-byte_elf_x86-64

Now, let's implant the code into the file name.
It's a file name, so we'll use a symlink:

    ln -s 60-byte_elf_x86-64 f-prf-OGf-2AP^jTX4hjB_f15

Disable ASLR and run it:

    echo 0 > /proc/sys/kernel/randomize_va_space
    ./f-prf-OGf-2AP^jTX4hjB_f15
    echo $?     # => exits with 66

NOTE: If you don't want to disable ASLR for the whole system, it can be
disabled per program using a tool from util-linux [ref18]:
''setarch "$(uname -m)" -R -- ./f-prf-OGf-2AP^jTX4hjB_f15'' (it sets the
''ADDR_NO_RANDOMIZE'' personality flag).

NOTE: If we run a kernel older than 5.8, we could instead use the
''57-byte_elf_x86-64'' binary ([[#Summary of 57-byte ELF64]]), fill in the
correct ''e_entry'', and then it would run the same as the 60-byte binary.


===[ Is This a Happy Ending or a Sad Ending? ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So, kids, what have we learned?

We found out that a Linux executable program needs two headers: the ELF header
and the program header. The program header needs to exist, but it doesn't need
to contain a valid record.

The ELF header can be trimmed down to 57 bytes, as it seems to be the limit of
what the kernel is willing to tolerate for an x86-64 binary to be executed.
It's "limited" by the non-zero value in the number of program header records
(''ehdr->e_phnum'').

The kernel does not need ''PT_LOAD'' to correctly load and execute a program,
but it wants at least one program header record. This limits the file size to
''ehdr->e_phoff + sizeof (Elf64_Phdr)'', as those records are read from the
file and the loader checks the size of every record it reads (= it must match
''sizeof (Elf64_Phdr)'').

If we have an executable user stack, we can store the program instructions in a
filename, which is then stored on the stack by the kernel and can be used as
the entry point (''ehdr->e_entry'').

And that's all I have, guys. Be kind, and HACK THE PLANET! as they're trashing
our rights, man! They're trashing the flow of data!

.---------------------------------. ,,-. / \ ..( \ | Marge, I'm confused.
Is this | ( / | a happy ending or a sad ending? | ( ) / / ( ) //''---------------------------------' ( / _//\_ / ( ) / \ .------------------------------. ( ( | | / \ ( ~ ~ .. ) | (.)(.) | It's an ending, that's enough. | (.)(.) ( ) C _---_) \ \ _- C) | | __| '------------------------------''\\ (__ | | \__/ \ '--- | /___ | \ | /____\ / OoooO | \ / \


===[ References ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

> [ref1] https://research.h4x.cz/html/2025/2025-10-06--touching_small_elfs-p2-segfaults_everywhere.html
> [ref2] https://research.h4x.cz/html/2025/2025-10-24--touching_small_elfs-p3-broken_time-machine.html#bug_4_chefs_kiss
> [ref3] https://research.h4x.cz/html/2025/2025-09-11--touching_small_elfs-p1-broken_tools.html
> [ref4] https://elixir.bootlin.com/linux/v6.12.57/source/fs/binfmt_elf.c
> [ref5] https://elixir.bootlin.com/linux/v6.12.57/source/fs/binfmt_elf.c#L819
> [ref6] https://en.wikipedia.org/wiki/Endianness
> [ref7] https://www.man7.org/linux/man-pages/man5/elf.5.html
> [ref8] https://tmpout.sh/3/22.html
> [ref9] https://www.kernel.org/doc/Documentation/userspace-api/ELF.rst
> [ref10] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/arch/x86/include/asm/elf.h?h=v5.8&id=122306117afe4ba202b5e57c61dfbeffc5c41387
> [ref11] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/include/asm/elf.h?h=v5.8#n282
> [ref12] https://elixir.bootlin.com/linux/v6.12.57/source/include/uapi/linux/elf.h#L39
> [ref13] https://elixir.bootlin.com/linux/v6.12.57/source/fs/binfmt_elf.c#L926
> [ref14] https://elixir.bootlin.com/linux/v6.12.57/source/kernel/sysctl.c#L1904
> [ref15] https://elixir.bootlin.com/linux/v6.12.57/source/mm/memory.c#L160
> [ref16] https://elixir.bootlin.com/linux/v6.12.57/source/fs/binfmt_elf.c#L1007
> [ref17] https://www.man7.org/linux/man-pages/man2/personality.2.html
> [ref18] https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/tree/sys-utils/setarch.c
> [ref19] https://elixir.bootlin.com/linux/v6.12.57/source/fs/binfmt_elf.c#L261
> [ref20] https://elixir.bootlin.com/linux/v6.12.57/source/fs/exec.c#L1959
> [ref21] https://elixir.bootlin.com/linux/v6.12.57/source/fs/exec.c#L295
> [ref22] https://elixir.bootlin.com/linux/v6.12.57/source/fs/binfmt_elf.c#L1015
> [ref23] https://elixir.bootlin.com/linux/v6.12.57/source/arch/x86/include/asm/page_64_types.h#L80
> [ref24] https://www.man7.org/linux/man-pages/man3/getauxval.3.html
> [ref25] https://www.nasm.us/doc/nasmdoci.html
> [ref26] https://elixir.bootlin.com/linux/v6.12.57/source/include/uapi/linux/elf.h#L247
> [ref27] https://dl.packetstormsecurity.net/papers/shellcode/alpha.pdf
> [ref28] http://ref.x86asm.net/geek64.html
> [ref29] https://www.man7.org/linux/man-pages/man7/vdso.7.html