.-----------------------------------------------------------------------------.
| Ghetto Reversing, Hacking, Patching and Incidentally Fixing a Bug in FBReader |
'-----------------------------------------------------------------------------'
updated: 2024-09-27
Learning objectives:
- Reverse engineering Linux ELF binaries using 'objdump'.
- Finding 'dlopen(3)' calls using 'strace'.
- Crafting a patch using 'nasm'.
- Binary patching with 'dd', including potential pitfalls.
- Some notes on symbols and (C++) name manglings.
- Inspecting the binary representation of a C++ constructor.
Debian FBReader has been a persistent thorn in my side. Its longstanding bug
causes hyphens to appear after each word, not just within ebook texts, but also
in the status bar and help pages:
The first reasonable step is always research: it could be a known bug for
which someone has already found a solution. If we search for ''fbreader
hyphens bug'', we'll find that there is a relevant bug report from 2022 [ref1]:
Most of the time, fbreader draws hyphens after each word in any book including
its built-in help page. IIRC this started after switching to GTK. I cannot
find any useful patterns in reproducibility, or guess why does it happen (I
also tried looking at lsof and the config dir and check and change the app
settings). I can reproduce it on two testing/sid systems, including one with
an empty config.
Luckily for us, in the same thread there is a solution to the problem from
Siarhei Abmiotka [ref2] (good work!). The fix is straightforward: we just need
to add the missing initialization of 'flags'. Here is the complete patch:
-----------------------------------[ patch ]-----------------------------------
--- fbreader.orig/zlibrary/ui/src/gtk/view/ZLGtkPaintContext.cpp
+++ fbreader/zlibrary/ui/src/gtk/view/ZLGtkPaintContext.cpp
@@ -54,6 +54,7 @@ ZLGtkPaintContext::ZLGtkPaintContext() {
myFontDescription = 0;
myAnalysis.lang_engine = 0;
myAnalysis.level = 0;
+ myAnalysis.flags = 0;
myAnalysis.language = 0;
myAnalysis.extra_attrs = 0;
myString = pango_glyph_string_new();
-------------------------------------------------------------------------------
The primary issue is that the 'myAnalysis' member is not zero-initialized by
default, so any field that is not explicitly assigned contains "random"
garbage. In 'flags', this garbage frequently switches "the hyphening" on.
We could end here and patch the source file, build it, and start using the
fixed version of FBReader. But going through the FBReader's build process is a
tedious journey of failing dependencies and fixing broken code. I propose a far
more entertaining approach. What if we patch the binary? It's just one
initialization, so how hard could it be?
Before we delve into binary hacking, we need to know a few things about
symbols. Simply put, a symbol is an alias for an address in a binary program.
For instance, if the 'printf' function starts at address '0x12540', when
the linker encounters a reference to the 'printf' symbol, it knows exactly
where to fix the call -- at that very address '0x12540'.
In ELF binaries, symbols are stored in simple tables containing properties such
as name, offset, size, and more. During the build process, a program typically
uses one symbol table. This table is often removed (stripped) after the build
to reduce the binary's size (this operation is typically done by the 'strip'
command when build finishes).
In addition to the symbol table used for build, a library typically has a
second symbol table that contains information about the symbols it exports.
This table is crucial for the dynamic linker when loading a binary into memory,
as it enables the linker to resolve external references. For instance, the
dynamic symbol table of the C standard library (libc) contains symbols such as
'printf'. When mapping out a program and its libraries into memory, the
linker also populates special tables such as PLT (Procedure Linkage Table) and
GOT (Global Offset Table) with the addresses of the symbols referenced in the
code. This enables the code to know where to jump when calling a symbol. The
dynamic symbol table is essential for working libraries and is almost never
stripped (btw, it should be a red flag when the table is missing).
(ELF structure, relocations, linking and loading are beyond the scope of this
article. For those interested in learning more, please refer to [ref3]
[ref4] [ref5] for further details.)
Let's get practical and explore the exported symbols from libc using
'objdump' from the 'binutils' package [ref6]. ('objdump' is an
incredibly useful tool that lets us analyze symbols and disassemble code, which
makes it a surprisingly good choice for quick reverse engineering on the Linux
command line.)
$ objdump --dynamic-syms /lib/x86_64-linux-gnu/libc.so.6
...
000000000007dbf0  w   DF .text    000000000000010f  GLIBC_2.2.5 fgetc
000000000009ba30 g    DF .text    0000000000000073  GLIBC_2.2.5 envz_strip
^                ^       ^        ^                             ^
|                |       |        |                             |
offset           symbol  section  size                          symbol name
                 type
Most of the time, all we need are symbol names. The extra information, like
offset and size, is useful when manually navigating through a binary, but we
don't need it here, as we'll be working exclusively with symbols.
(For more details, please refer to the description of 'objdump --syms' in
[ref6], the documentation is actually pretty good.)
Some programming languages, such as C++, Go, and Rust, encode names of
functions, classes, structures, etc. because symbol names might collide. This
process is called name mangling. A great example of name collision is function
overloading in C++. Let's look at this simple example:
---------------------------------[ test.cpp ]----------------------------------
int func () { return 1; }
int func (int i, char c) { return 2; }
int main (int i, char **a) { return 0; }
-------------------------------------------------------------------------------
If compilers didn't perform name mangling, the code would generate two symbols
with the same name, 'func', but with completely different behavior. The linker
would be utterly confused about which function to use in each object file.
Therefore, the compiler takes every function definition, including its argument
types (and, depending on the compiler, the return type), and encodes it so that
it has a unique name.
When we compile the source code above, we get a binary with two mangled
symbols:
$ g++ test.cpp -o ./test
$ objdump --syms ./test | grep 'func'
0000000000001129 g F .text 000000000000000b _Z4funcv
0000000000001134 g F .text 0000000000000011 _Z4funcic
As we can see, the symbols are unique but strange-looking. We can still see the
name 'func' but it also has mysterious prefixes and suffixes. Name mangling
follows certain rules [ref7], although unfortunately, it is not standardized
(intentionally [ref8]) and the output can differ even between compilers of the
same language (see [ref9] for different C++ mangling outputs). Here is a
sample illustration of the GNU C++ compiler's symbol mangling:
.-- mangling prefix '_Z'
|    .-- func name     .---- complete object constructor 'C1'
|    |                 | .--- no arguments (void)
|    |                /| |
v    v                vv v
_ZN17ZLGtkPaintContextC1Ev --> ZLGtkPaintContext::ZLGtkPaintContext()
  ^^                    ^
  ||                    '-- end of nested-name
  |'--- name length '17'
  '--- beginning of nested-name
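
To make the layout above less mysterious, here is a minimal shell sketch that
peels the same symbol apart by hand. It is not a demangler: it assumes exactly
one length-prefixed name after '_ZN', and the function name
'split_nested_name' is made up for this illustration.

```shell
# Sketch: split a simple mangled name of the form _ZN<len><name><rest>.
# Only a single length-prefixed component is handled (an assumption).
split_nested_name() {
    sym=$1
    rest=${sym#_ZN}          # drop the '_Z' prefix and the 'N' marker
    len=${rest%%[!0-9]*}     # leading decimal digits = length of the name
    rest=${rest#"$len"}      # drop the length itself
    name=$(printf '%s' "$rest" | cut -c1-"$len")
    tail=$(printf '%s' "$rest" | cut -c"$((len + 1))"-)
    printf '%s %s\n' "$name" "$tail"
}

split_nested_name _ZN17ZLGtkPaintContextC1Ev
```

The second word of the output is whatever follows the name: the constructor
marker, the end of the nested name, and the (empty) argument list.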
Like most reverse engineering tools, objdump can automatically demangle
symbols. This is super cool! Remember the patch from the beginning? From it, we
know that the source code is written in C++ (the '.cpp' suffix is a good
indicator). That means the symbol names are mangled.
Before we use objdump on fbreader, let's see how the output looks for a C++
library like 'libstdc++.so':
1. Raw symbol names:
$ objdump --dynamic-syms /lib/x86_64-linux-gnu/libstdc++.so.6 \
| grep -m 4 '\.text' \
| sed -r 's/^.*GLIBC[^ ]+ //'
_ZNKSbIwSt11char_traitsIwESaIwEE8capacityEv
_ZNSsC1Ev
_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE10_M_disposeEv
_ZNSt9money_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEED0Ev
2. Demangled symbol names:
$ objdump --dynamic-syms --demangle /lib/x86_64-linux-gnu/libstdc++.so.6 \
| grep -m 4 '\.text' \
| sed -r 's/^.*GLIBC[^ ]+ //'
std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::capacity() const
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string()
std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::_M_dispose()
std::money_get<char, std::istreambuf_iterator<char, std::char_traits<char> > >::~money_get()
Excellent! Now we can look for the exact name, which in our case would be
'ZLGtkPaintContext::ZLGtkPaintContext()'. So let's finally find that damn
symbol and its code!
(NOTE: We could try to encode the name by hand and search for the mangled
symbol, but it's hard to do it correctly. For example, the symbol name
'ZLGtkPaintContext::ZLGtkPaintContext()' will probably be encoded by GCC as
'_ZN17ZLGtkPaintContextC1Ev', but we don't want to search for it like that,
as it can result in a false negative if the name is encoded differently.)
The typical places where the symbol could be located are:
1. The 'fbreader' binary itself.
2. Libraries that the 'fbreader' binary uses.
3. Plugins that are loaded on-the-fly by mechanisms like 'dlopen' [ref10].
(Btw, these plugins are just shared objects/libraries that haven't been loaded
yet when the program starts.)
Let's see what's waiting for us in the 'fbreader' binary:
$ objdump --syms --demangle /usr/bin/fbreader
SYMBOL TABLE:
no symbols
Well, if it were that easy, I wouldn't be writing this article. If you're
familiar with Linux binaries, this isn't surprising -- Linux distributions'
binaries are typically stripped of symbols, after all. Before we fall into
the rabbit hole of searching for patterns in the fbreader binary, let's take a
closer look at the libraries involved (especially since the patch is in a path
starting with 'fbreader/zlibrary/...').
One way to figure out what libraries fbreader depends on is to use the 'ldd'
command. This utility reads the dynamic section of an ELF binary and
recursively determines all the linked libraries:
$ ldd /usr/bin/fbreader
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fcf1f200000)
libzltext.so.0.13 => /lib/libzltext.so.0.13 (0x00007fcf1f6ac000)
libzlcore.so.0.13 => /lib/libzlcore.so.0.13 (0x00007fcf1f5eb000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcf1f01f000)
...
Let's make a one-liner that reads all the libraries, lists their symbols, and
searches for the 'ZLGtkPaintContext' symbol:
$ ldd /usr/bin/fbreader \
| awk '{print $3}' \
| xargs -I {} objdump --demangle --dynamic-syms {} \
| grep ZLGtkPaintContext
Absolutely muffin! What the hack?!
It is still possible that the symbol is loaded dynamically via 'dlopen'. The
'dlopen(3)' function [ref10] is often used to load dynamic shared objects
(aka shared library, aka plugins, aka addons).
The easiest way to determine if a program uses 'dlopen' is to 'strace' it.
We won't directly see the 'dlopen' call, as it's not a syscall, but rather a
function from the 'libc.so' library (historically, it was within the dynamic
linker library 'libdl.so'). What 'dlopen' does is open a library (shared
object), map it to memory, and initialize it. Therefore, if we strace such a
program, we should see two syscalls: 'open(2)' and 'mmap(2)'. Knowing this,
we can trace only for the 'open(2)' syscalls ('-e trace=open,openat') and
ask for a stack trace ('-k'). This should give us a clue if it's indeed an
instance of 'dlopen' being called:
$ strace -f -e trace=open,openat -k -o ./strace.out /usr/bin/fbreader
$ less ./strace.out
...
openat(AT_FDCWD, "/usr/lib/zlibrary/ui/zlui-gtk.so", O_RDONLY|O_CLOEXEC) = 3
...
> /usr/lib/x86_64-linux-gnu/libc.so.6(dlopen+0x69) [0x854e9]
...
> /usr/bin/FBReader() [0x344da]
...
NOTE: Ironically, FBReader actually printed this to the console: ''loading
/usr/lib/zlibrary/ui/zlui-gtk.so''.
Let's investigate further by searching for the desired
'ZLGtkPaintContext::ZLGtkPaintContext' symbol within the 'zlui-gtk.so'
shared object:
(Remember, the exported symbol names reside in the ELF's dynamic symbol table.
And the binary was compiled with a C++ compiler, so we'll need to demangle all
symbols to identify the correct one.)
$ objdump --demangle --dynamic-syms /usr/lib/zlibrary/ui/zlui-gtk.so \
| grep -o '\<ZLGtkPaintContext::ZLGtkPaintContext.*'
ZLGtkPaintContext::ZLGtkPaintContext()
Yes! YES! Now, show us the code:
$ objdump -M intel --disassemble='ZLGtkPaintContext::ZLGtkPaintContext()' \
--demangle /usr/lib/zlibrary/ui/zlui-gtk.so
0000000000026b90 <ZLGtkPaintContext::ZLGtkPaintContext()@@Base>:
26b90: 53 push rbx
26b91: 48 89 fb mov rbx,rdi
26b94: e8 77 33 ff ff call 19f10 <ZLPaintContext::ZLPaintContext()@plt>
...
26c5c: c3 ret
Great! Now we know *where* to hack, next we need to find out *what* exactly we
are hacking.
Let's quickly recap the patch:
@@ -54,6 +54,7 @@ ZLGtkPaintContext::ZLGtkPaintContext() {
+ myAnalysis.flags = 0;
What we ultimately aim to achieve is initializing 'myAnalysis.flags' to zero.
To do so, we need to find a location where we can access the 'myAnalysis'
object. In C++ classes, this typically occurs in the constructor, as it's where
member objects are initialized. (This conclusion is also supported by the
patch: its hunk header shows 'ZLGtkPaintContext::ZLGtkPaintContext()', where
the class name and the method name are identical, which is how C++ spells a
constructor.)
Let's look at the code of the 'ZLGtkPaintContext::ZLGtkPaintContext' symbol
(the constructor):
---------------[ ZLGtkPaintContext::ZLGtkPaintContext()@@Base ]----------------
0000000000026b90 <ZLGtkPaintContext::ZLGtkPaintContext()@@Base>:
26b90: 53 push rbx
26b91: 48 89 fb mov rbx,rdi
26b94: e8 77 33 ff ff call 19f10 <ZLPaintContext::ZLPaintContext()@plt>
26b99: 48 8b 05 68 03 01 00 mov rax,QWORD PTR [rip+0x10368]
26ba0: c6 83 8a 00 00 00 00 mov BYTE PTR [rbx+0x8a],0x0
26ba7: 48 c7 83 a0 00 00 00 mov QWORD PTR [rbx+0xa0],0x0
26bae: 00 00 00 00
26bb2: 48 83 c0 10 add rax,0x10
26bb6: 48 c7 43 20 00 00 00 mov QWORD PTR [rbx+0x20],0x0
26bbd: 00
26bbe: 48 89 03 mov QWORD PTR [rbx],rax
26bc1: 31 c0 xor eax,eax
26bc3: 66 89 83 88 00 00 00 mov WORD PTR [rbx+0x88],ax
26bca: 48 c7 83 a8 00 00 00 mov QWORD PTR [rbx+0xa8],0x0
26bd1: 00 00 00 00
26bd5: 48 c7 83 b0 00 00 00 mov QWORD PTR [rbx+0xb0],0x0
26bdc: 00 00 00 00
26be0: 48 c7 43 28 00 00 00 mov QWORD PTR [rbx+0x28],0x0
26be7: 00
26be8: 48 c7 43 30 00 00 00 mov QWORD PTR [rbx+0x30],0x0
26bef: 00
26bf0: 48 c7 43 38 00 00 00 mov QWORD PTR [rbx+0x38],0x0
26bf7: 00
26bf8: 48 c7 43 48 00 00 00 mov QWORD PTR [rbx+0x48],0x0
26bff: 00
26c00: c6 43 58 00 mov BYTE PTR [rbx+0x58],0x0
26c04: 48 c7 43 60 00 00 00 mov QWORD PTR [rbx+0x60],0x0
26c0b: 00
26c0c: 48 c7 43 68 00 00 00 mov QWORD PTR [rbx+0x68],0x0
26c13: 00
26c14: e8 d7 2b ff ff call 197f0 <pango_glyph_string_new@plt>
26c19: 48 c7 43 78 00 00 00 mov QWORD PTR [rbx+0x78],0x0
26c20: 00
26c21: 48 89 43 70 mov QWORD PTR [rbx+0x70],rax
26c25: 48 c7 83 80 00 00 00 mov QWORD PTR [rbx+0x80],0x0
26c2c: 00 00 00 00
26c30: 48 c7 83 90 00 00 00 mov QWORD PTR [rbx+0x90],0x0
26c37: 00 00 00 00
26c3b: 48 c7 83 98 00 00 00 mov QWORD PTR [rbx+0x98],0x0
26c42: 00 00 00 00
26c46: 48 c7 83 b8 00 00 00 mov QWORD PTR [rbx+0xb8],0xffffffffffffffff
26c4d: ff ff ff ff
26c51: c7 83 c0 00 00 00 00 mov DWORD PTR [rbx+0xc0],0x0
26c58: 00 00 00
26c5b: 5b pop rbx
26c5c: c3 ret
-------------------------------------------------------------------------------
Nice! And the code isn't too large, we can easily analyze it. At first glance,
there is an emerging pattern: most instructions set zero to memory locations at
'rbx' plus some offset. This looks like the initialization we're searching
for. Now, we need to identify where the 'myAnalysis' instance begins...
Alternatively, we could find all missing initializations and set them to zero.
(I believe it's safe to assume that fbreader is not OpenSSL [ref11] [ref12]
and that we can safely initialize all objects in the constructor.)
Locating uninitialized bytes is kinda straightforward. To begin, we collect all
'mov' instructions that reference 'rbx', then sort them by offset to
identify any gaps in the sequence. By considering the size of each data type,
we can pinpoint which offset is missing.
Before we start, here is a quick reference of the operand size names that
'objdump' uses:
NAME NUM_BYTES
-----------------
QWORD 8
DWORD 4
WORD 2
BYTE 1
Let's run the following command, which lists all assignments through 'rbx'
(i.e., through the 'this' pointer), and construct a table with missing
initializations:
$ objdump -M intel --disassemble='ZLGtkPaintContext::ZLGtkPaintContext()' \
--demangle /usr/lib/zlibrary/ui/zlui-gtk.so \
| grep -Eo '\<[A-Z]+ PTR \[rbx.*' \
| sort -t + -k 2 \
| vim -
-----------------------[ Table: Missing initialization ]-----------------------
QWORD PTR [rbx],rax
# 24 uninitialized bytes
QWORD PTR [rbx+0x20],0x0
QWORD PTR [rbx+0x28],0x0
QWORD PTR [rbx+0x30],0x0
QWORD PTR [rbx+0x38],0x0
# 8 uninitialized bytes
QWORD PTR [rbx+0x48],0x0
# 8 uninitialized bytes
BYTE PTR [rbx+0x58],0x0
# 7 uninitialized bytes
QWORD PTR [rbx+0x60],0x0
QWORD PTR [rbx+0x68],0x0
QWORD PTR [rbx+0x70],rax
QWORD PTR [rbx+0x78],0x0
QWORD PTR [rbx+0x80],0x0
WORD PTR [rbx+0x88],ax
BYTE PTR [rbx+0x8a],0x0
# 5 uninitialized bytes
QWORD PTR [rbx+0x90],0x0
QWORD PTR [rbx+0x98],0x0
QWORD PTR [rbx+0xa0],0x0
QWORD PTR [rbx+0xa8],0x0
QWORD PTR [rbx+0xb0],0x0
QWORD PTR [rbx+0xb8],0xffffffffffffffff
DWORD PTR [rbx+0xc0],0x0
# (foreshadowing: 4 uninitialized bytes)
-------------------------------------------------------------------------------
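
The subtraction behind the table can also be scripted. The sketch below is a
helper under stated assumptions, not part of the original workflow:
'to_pairs' and 'find_gaps' are hypothetical names, the input must already be
sorted by offset, and the total object size ('0xc8' here) is passed in by
hand.

```shell
# Turn lines such as "QWORD PTR [rbx+0x20],0x0" into "<offset> <size>" pairs.
to_pairs() {
    awk '{
        size = 1                                  # BYTE
        if ($1 == "QWORD") size = 8
        else if ($1 == "DWORD") size = 4
        else if ($1 == "WORD") size = 2
        off = "0x0"                               # plain [rbx] has no offset
        if (match($3, /\+0x[0-9a-f]+/)) off = substr($3, RSTART + 1, RLENGTH - 1)
        print off, size
    }'
}

# Read sorted "<offset> <size>" pairs and print every range no write covers.
find_gaps() {
    total=$1
    end=0
    while read -r off size; do
        if [ "$(($off))" -gt "$end" ]; then
            printf '0x%02x-0x%02x %d\n' "$end" "$(($off))" "$(($off - end))"
        fi
        end=$(($off + $size))
    done
    if [ "$(($total))" -gt "$end" ]; then
        printf '0x%02x-0x%02x %d\n' "$end" "$(($total))" "$(($total - end))"
    fi
}

to_pairs <<'EOF' | find_gaps 0xc8
QWORD PTR [rbx],rax
QWORD PTR [rbx+0x20],0x0
QWORD PTR [rbx+0x28],0x0
QWORD PTR [rbx+0x30],0x0
QWORD PTR [rbx+0x38],0x0
QWORD PTR [rbx+0x48],0x0
BYTE PTR [rbx+0x58],0x0
QWORD PTR [rbx+0x60],0x0
QWORD PTR [rbx+0x68],0x0
QWORD PTR [rbx+0x70],rax
QWORD PTR [rbx+0x78],0x0
QWORD PTR [rbx+0x80],0x0
WORD PTR [rbx+0x88],ax
BYTE PTR [rbx+0x8a],0x0
QWORD PTR [rbx+0x90],0x0
QWORD PTR [rbx+0x98],0x0
QWORD PTR [rbx+0xa0],0x0
QWORD PTR [rbx+0xa8],0x0
QWORD PTR [rbx+0xb0],0x0
QWORD PTR [rbx+0xb8],0xffffffffffffffff
DWORD PTR [rbx+0xc0],0x0
EOF
```

Each output line is a half-open range followed by its length in bytes. For
this input the script prints six ranges, the last one being '0xc4-0xc8 4'.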
Cool, cool. Most gaps can be computed simply by subtracting the offsets.
However, there remains one particular gap: how many bytes are after the last
'mov DWORD PTR [rbx+0xc0],0x0' instruction? This could be either zero or
multiple bytes. The answer to this question lies in identifying the class
instantiation (allocation).
If we search for 'ZLGtkPaintContext::ZLGtkPaintContext' and examine all
occurrences, we'll likely find the instantiation of the class. In my case, it
is in the 'ZLGtkLibraryImplementation::createContext' method:
0000000000024620 <ZLGtkLibraryImplementation::createContext()@@Base>:
24620: 41 54 push r12
24622: bf c8 00 00 00 mov edi,0xc8
24627: e8 d4 59 ff ff call 1a000 <operator new(unsigned long)@plt>
2462c: 49 89 c4 mov r12,rax
2462f: 48 89 c7 mov rdi,rax
24632: e8 39 5f ff ff call 1a570 <ZLGtkPaintContext::ZLGtkPaintContext()@plt>
24637: 4c 89 e0 mov rax,r12
2463a: 41 5c pop r12
2463c: c3 ret
The function first allocates '0xc8' bytes; 'operator new' returns a pointer to
heap memory. Subsequently, it calls the
'ZLGtkPaintContext::ZLGtkPaintContext()' class constructor with the newly
allocated memory pointer as its first argument. (NOTE: In C++, every non-static
member function receives the address of the current object, denoted by the
keyword 'this', as a hidden first argument, which is passed in 'rdi' on x86-64
Linux.)
Now that we know the 'ZLGtkPaintContext' class uses 0xc8 bytes overall, and
the last initialized entry is at offset 0xc0 (which is a 'DWORD', or 4 bytes),
we can perform simple algebra to determine how many potentially uninitialized
bytes are after 'rbx+0xc0':
overall_size - (last_rbx_offset + last_rbx_datatype_len) =
= 0xc8 - (0xc0 + 4) =
= 4
Here is the table summarizing the intervals for the "missing" offsets:
<start ; end)   DATA TYPES
0x08 - 0x20     3 * QWORD
0x40 - 0x48     1 * QWORD
0x50 - 0x58     1 * QWORD
0x59 - 0x60     7 * BYTE = 1 * DWORD + 1 * WORD + 1 * BYTE
0x8b - 0x90     5 * BYTE = 1 * DWORD + 1 * BYTE
0xc4 - 0xc8     1 * DWORD
Currently, we know what needs to be patched, but we still need to determine the
exact location and what code to add. We can't simply append our code at the end
of the constructor. (In fact, in my build, it might be possible to inject code
just before the function ends, as there are sufficient padding bytes available.
So, we could initialize the 'flags' there. Alternatively, we could create a
trampoline [ref13] and perform the initialization elsewhere. However, let's not
do that, as there is a simpler solution.)
A notable aspect of the 'ZLGtkPaintContext::ZLGtkPaintContext' binary code is
its use of wide instructions for setting values to zero. For example, look at
the byte difference between these line pairs:
--------------------------[ instruction_width.nasm ]---------------------------
BITS 64
mov qword [rbx+0x20],0x0
mov [rbx+0x20],rax
mov dword [rbx+0x20],0x0
mov [rbx+0x20],eax
mov word [rbx+0x20],0x0
mov [rbx+0x20],ax
mov byte [rbx+0x20],0x0
mov [rbx+0x20],al
-------------------------------------------------------------------------------
(NOTE: NASM does not recognize the 'PTR' keyword, as seen in objdump. Be sure
to omit it when rewriting the code using NASM.)
As we assemble the code, we can then see the opcodes and compare their
respective sizes:
$ nasm -O 0 -f bin instruction_width.nasm -o instruction_width.bin
$ objdump -b binary -M intel-mnemonic,x86-64 -m i386 -D instruction_width.bin
0: 48 c7 43 20 00 00 00 00 mov QWORD PTR [rbx+0x20],0x0
8: 48 89 43 20 mov QWORD PTR [rbx+0x20],rax ; diff: 4
c: c7 43 20 00 00 00 00 mov DWORD PTR [rbx+0x20],0x0
13: 89 43 20 mov DWORD PTR [rbx+0x20],eax ; diff: 4
16: 66 c7 43 20 00 00 mov WORD PTR [rbx+0x20],0x0
1c: 66 89 43 20 mov WORD PTR [rbx+0x20],ax ; diff: 2
20: c6 43 20 00 mov BYTE PTR [rbx+0x20],0x0
24: 88 43 20 mov BYTE PTR [rbx+0x20],al ; diff: 1
As we examine the assembled code, we can immediately see that using registers
instead of literal values makes a significant difference (4 bytes for both
'QWORD' and 'DWORD' instructions)! By rewriting the code in this manner,
we'll gain an enormous amount of extra space to work with. In fact, we'll have
so many spare bytes that we might need to pad some of them with NOPs to prevent
the code from crashing.
It's important to note that there are two instances where non-zero values are
being assigned:
call 197f0 <pango_glyph_string_new@plt>
...
QWORD PTR [rbx+0x70],rax ; 'rax' was set by the call
QWORD PTR [rbx+0xb8],0xffffffffffffffff
Rewrites could look like this:
--------------------------[ patch_protoype_01.nasm ]---------------------------
BITS 64
xor eax, eax               ; zero out RAX
mov QWORD [rbx+0x08], rax  ; offset 0x08-0x10
mov QWORD [rbx+0x10], rax  ; offset 0x10-0x18
mov QWORD [rbx+0x18], rax  ; offset 0x18-0x20
mov QWORD [rbx+0x40], rax  ; offset 0x40-0x48
mov QWORD [rbx+0x50], rax  ; offset 0x50-0x58
mov QWORD [rbx+0x58], rax  ; offset 0x58-0x60 <-- [1]
; call 197f0 <pango_glyph_string_new@plt>
; mov QWORD PTR [rbx+0x70],rax
xor eax, eax               ; zero out RAX again, it holds the return value
mov BYTE [rbx+0x8b], al    ; offset 0x8b-0x8c <-- [2]
mov DWORD [rbx+0x8c], eax  ; offset 0x8c-0x90 <-- [2]
mov QWORD [rbx+0xc0], rax  ; offset 0xc0-0xc8 <-- [1]
-------------------------------------------------------------------------------
In certain situations, we can combine multiple values into a single instruction
that utilizes a large bit range, as demonstrated in [1]. However, in some
cases, like in [2], we need to be careful: the gap starts at the odd offset
'0x8b', right behind the 'WORD' at '[rbx+0x88]' and the 'BYTE' at
'[rbx+0x8a]' that the original code already sets, so we fill it with narrower
writes instead of a single wide one that would spill over its neighbours.
And we can take it even further by leveraging features of the x86 instruction
set and the original code to our advantage. We could prototype a kind of loop
that will effectively zero out the entire allocated memory. The only concern we
should have is ensuring that we avoid those non-zero assignments we previously
discovered.
I have something like this in mind:
--------------------------[ patch_protoype_02.nasm ]---------------------------
BITS 64
push rcx ; Save all registers we are working with
push rdi
push rax
mov ecx, 0xc0 / 8 ; Counter: size of memory divided by QWORD length
xor eax, eax ; Zero out RAX, we'll use it as the zero value
mov rdi, rbx ; RDI is the memory pointer 'stosq' is writing to
rep stosq ; Store RAX at [RDI], RCX times; RDI advances by 8 each time
pop rax ; Restore all registers we were working with
pop rdi
pop rcx
-------------------------------------------------------------------------------
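NOTE: 'stosq' stores the quadword in RAX at the address in RDI and then
advances RDI by 8 (with the direction flag clear) [ref14]; in our example,
this means 8 bytes of zeroes per step. 'rep' is an instruction prefix that
repeats a string instruction the number of times specified in RCX ('rep' can
be used only with specific instructions -- see [ref15]). With a counter of
'0xc0 / 8' = 24, the loop covers offsets 0x00-0xc0; the bytes at 0xc4-0xc8 stay
untouched unless the counter is raised to '0xc8 / 8'.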
When built, the binary code has a size of 19 bytes:
$ nasm -O 0 -f bin patch_protoype_02.nasm -o patch_protoype_02
$ wc -c < patch_protoype_02
19
However, this isn't good enough for me. I'd like to place it near the beginning
(specifically at the address '26ba0'), so that it doesn't overwrite the later
assignments. The code I want to replace is:
26ba0: c6 83 8a 00 00 00 00 mov BYTE PTR [rbx+0x8a],0x0
26ba7: 48 c7 83 a0 00 00 00 mov QWORD PTR [rbx+0xa0],0x0
26bae: 00 00 00 00
Unfortunately, these instructions together take up 18 bytes in total. Since our
prototype code has an extra byte, we would actually be overflowing. This cannot
happen!
There are two widely used techniques to optimize the size of a binary:
1. Using smaller opcodes. For example, the instruction 'mov ecx, 0xc0 / 8' is
represented as 'B918000000' in binary and occupies 5 bytes. Paradoxically, by
splitting it into two instructions, we can make it more compact:
'xor ecx,ecx; mov cl,0xc0 / 8' results in the binary code '31c9b118', which
is only 4 bytes. And there we have the one byte we need.
2. Oooor we can exploit the code to our advantage by identifying and removing
any instructions that the function does not actually need. For instance,
'rdi' is only used once at the beginning, where its value is stored into
'rbx', which serves as the 'this' pointer. Nothing after our injection point
reads 'rdi', so we don't have to preserve it. (One caveat: 'rdi' is a
caller-saved register in the System V AMD64 ABI, so the base-class constructor
called at '26b94' is allowed to clobber it. Dropping the 'mov rdi, rbx' below
relies on that call leaving 'rdi' untouched in this particular build.) In this
case, we can eliminate at least these three instructions:
push rdi ; no need to save it, it's not used afterwards
pop rdi ; and no need to restore it either
mov rdi, rbx ; RDI still has the same value as RBX (see the caveat above)
Optimized code would look like this:
--------------------------[ patch_protoype_02.nasm ]---------------------------
BITS 64
push rcx ; Save all registers we are working with
push rax
mov ecx, 0xc0 / 8 ; Allocated memory divided by QWORD length
xor eax, eax ; Zero out RAX as we'll use it as the value
rep stosq ; Store RAX at [RDI], RCX times; RDI advances by 8 each time
pop rax ; Restore all registers we were working with
pop rcx
-------------------------------------------------------------------------------
Now it has only 14 bytes:
$ nasm -O 0 -f bin patch_protoype_02.nasm -o patch_protoype_02
$ wc -c < patch_protoype_02
14
This is cool, but now it's way too small (or still not small enough), so we
have to pad the rest. There are '18 - 14 = 4' bytes to pad. We could use four
'nop' instructions and be done with it, buuuut there are multi-byte NOP
instructions [ref16] that can also be used. Unfortunately, NASM doesn't
properly support them, therefore we need to create a correct sequence of bytes:
db 0x0F, 0x1F, 0x40, 0x00
(I won't deny it, this code does look much 1337er!)
I would likely attempt to further optimize it or simply go with the first
technique and create an 18-byte-sized binary code, but playtime's over! Let's
patch!
It's time to roll up our sleeves and get started. We don't need any fancy tools
to get the job done. Patching can be accomplished with just "good" old 'dd'
[ref17].
We'll need to be precise here. In addition to rewriting bytes at the correct
offset and rewriting only 18 bytes, we must also ensure that the file isn't
truncated. 'dd' is a very useful tool, but it's definitely not the most
user-friendly. (No wonder, it's from 1974 [ref18].)
Rewriting bytes on the exact position using 'dd' can be tricky. The
old-school approach is to set the output block size ('obs') to 1, which seeks
to the correct position but writes input one byte at a time (i.e., for every
byte, the 'write(2)' syscall is called). While this approach is inefficient,
it's negligible for small inputs. Newer versions of 'dd' offer the "byte"
suffix ('B') for seek/skip options, allowing us to write commands like:
'dd if=i of=o skip=7B' without sacrificing block size.
Let's take 'patch_protoype_02.nasm' and craft the final code that we will
inject:
------------------------------[ inject_me.nasm ]-------------------------------
BITS 64
push rcx ; Save all registers we are working with
push rax
mov ecx, 0xc0 / 8 ; Allocated memory divided by the QWORD length
xor eax, eax ; Zero out RAX, we'll use it as the zero value
rep stosq ; Store RAX at [RDI], RCX times; RDI advances by 8 each time
pop rax ; Restore all registers we were working with
pop rcx
db 0x0F, 0x1F, 0x40, 0x00 ; 4-byte NOP
-------------------------------------------------------------------------------
Build it:
$ nasm -O 0 -f bin inject_me.nasm -o inject_me
$ xxd -ps inject_me
5150b91800000031c0f348ab58590f1f4000
$ wc -c < inject_me
18
And finally, let's inject the binary patch into the library at the offset
'0x26ba0':
$ cp -a /usr/lib/zlibrary/ui/zlui-gtk.so .
$ dd if=inject_me of=zlui-gtk.so seek=$((0x26ba0)) obs=1 conv=notrunc
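NOTE: The '$((...))' construct is shell arithmetic expansion, which allows for
the quick conversion of hexadecimal numbers to decimal values and computation
of offsets on-the-fly, eliminating the need to evaluate them beforehand. Also
keep in mind that 'dd' seeks to a file offset, whereas 'objdump' prints
virtual addresses. The two coincide for this function, but that is not
guaranteed for every ELF file, so compare the 'VMA' and 'File off' columns of
'objdump --section-headers' before patching.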
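
Before touching the real library, the 'dd' invocation can be rehearsed on a
throwaway file. The sketch below assumes a 64-byte dummy file and the offset
0x10 (both arbitrary), and writes the payload with 'printf' octal escapes
instead of NASM; the byte values are what the 'inject_me.nasm' listing should
assemble to, given that '0xc0 / 8' evaluates to 0x18.

```shell
work=$(mktemp -d)

# The 18-byte payload, spelled out in octal so that any POSIX printf works:
# 51 50 b9 18 00 00 00 31 c0 f3 48 ab 58 59 0f 1f 40 00
printf '\121\120\271\030\000\000\000\061\300\363\110\253\130\131\017\037\100\000' \
    > "$work/payload"

# Two identical 64-byte dummy "libraries".
dd if=/dev/zero of="$work/good" bs=1 count=64 2>/dev/null
cp "$work/good" "$work/bad"

# Same command shape as above, once with and once without conv=notrunc.
dd if="$work/payload" of="$work/good" seek=$((0x10)) obs=1 conv=notrunc 2>/dev/null
dd if="$work/payload" of="$work/bad"  seek=$((0x10)) obs=1 2>/dev/null

good_size=$(( $(wc -c < "$work/good") ))
bad_size=$(( $(wc -c < "$work/bad") ))
# Hex dump of the 18 bytes at offset 0x10 in the correctly patched file.
patched=$(od -An -tx1 -j16 -N18 "$work/good" | tr -d ' \n')

echo "with notrunc: $good_size bytes, without: $bad_size bytes"
echo "patched bytes: $patched"

rm -rf "$work"
```

The copy patched without 'conv=notrunc' ends up shorter than the original,
because everything behind the written bytes is cut off. On a real shared object
that would mean losing most of the file.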
After patching, we should always verify that the resulting binary looks
correct:
$ objdump -M intel --disassemble='ZLGtkPaintContext::ZLGtkPaintContext()' \
--demangle ./zlui-gtk.so
...
26b99: 48 8b 05 68 03 01 00 mov rax,QWORD PTR [rip+0x10368]
26ba0: 51 push rcx
26ba1: 50 push rax
26ba2: b9 18 00 00 00 mov ecx,0x18
26ba7: 31 c0 xor eax,eax
26ba9: f3 48 ab rep stos QWORD PTR es:[rdi],rax
26bac: 58 pop rax
26bad: 59 pop rcx
26bae: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
26bb2: 48 83 c0 10 add rax,0x10
...
Oh yeah! Everything looks good. Before we run our test, we need to replace the
original '/usr/lib/zlibrary/ui/zlui-gtk.so' with the modified one. Setting
'LD_LIBRARY_PATH' has no effect because the path to the plugins is hardcoded
into the binary. (Try searching for the '/usr/lib' pattern within the
'/usr/lib/libzlcore.so' library, and then use objdump to find the resulting
offset. You might see something like 'movabs rax,0x62696c2f7273752f', which
represents the '/usr/lib' string as a little-endian 64-bit integer, followed
by the loading of the '/zlibrary/ui' string from the '.data' section.)
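
The byte order of that constant is easy to check in the shell. A small sketch
(the helper name 'le_qword_to_str' is made up here) that prints the eight
bytes of a 64-bit value starting with the least significant one, which is the
order in which x86 stores them in memory:

```shell
# Print the 8 bytes of a 64-bit constant, least significant byte first.
# Assumes every byte is a printable, non-NUL character.
le_qword_to_str() {
    v=$(($1))
    i=0
    out=
    while [ "$i" -lt 8 ]; do
        byte=$((v & 0xff))                 # lowest byte of what is left
        oct=$(printf '%03o' "$byte")       # printf escapes want octal
        out="$out$(printf "\\$oct")"
        v=$((v >> 8))
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

le_qword_to_str 0x62696c2f7273752f
```

Reading the constant from its last hex pair to its first ('2f', '75', '73',
...) gives the same characters.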
# mv /usr/lib/zlibrary/ui/zlui-gtk.so /usr/lib/zlibrary/ui/zlui-gtk.so.OLD
# cp -a zlui-gtk.so /usr/lib/zlibrary/ui/zlui-gtk.so
$ fbreader bpf_performance_tools.epub
Yay! It works as it should! :).
Reverse engineering, hacking, and modding are incredibly fun! They give you
the power to manipulate any code you run. If it seems intimidating at first,
don't
worry -- there's no need to be afraid. As you work through the problem, any
gaps in your knowledge will naturally fill in as you acquire new information.
The process is similar to solving a jigsaw puzzle: daunting at first, but with
each piece falling into place, the big picture emerges.
When starting, make it easy on yourself and choose a project with available
source code. This way, when assembly, binary, or memory contents don't make
sense, you can always refer back to the original code and gain insight into its
goals and how the compiler translates it.
For example, in this case study, it wasn't necessary to have the source code,
nor was it important to know that the 'myAnalysis' structure actually comes
from the Pango library [ref19], since the patch was very straightforward.
However, having a reference point is still helpful, as some compiler
optimizations will inevitably obscure the binary.
One final thing: objdump is excellent for quick RE. I reach for radare2 or
Ghidra only for more complex tasks, as they can be overkill for simple cases
(esp. Ghidra). Most of the time, it's sufficient and way faster to use
objdump (in conjunction with grep). Btw, since 2020-01-13, objdump has the
'--visualize-jumps' parameter that generates ASCII art diagrams showing the
destinations of flow control instructions [ref20]:
$ objdump --visualize-jumps /lib/x86_64-linux-gnu/libc.so.6 | less
...
3583b: 48 85 ff test %rdi,%rdi
3583e: /-- 74 05 je 35845 <bindtextdomain@@GLIBC_2.2.5+0x15>
35840: | 80 3f 00 cmpb $0x0,(%rdi)
35843: /--|-- 75 0b jne 35850 <bindtextdomain@@GLIBC_2.2.5+0x20>
35845: | \-> 48 83 c4 18 add $0x18,%rsp
35849: | c3 ret
3584a: | 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
35850: \----> 48 8d 74 24 08 lea 0x8(%rsp),%rsi
35855: 31 d2 xor %edx,%edx
...
Unfortunately, it doesn't work well without symbols. radare2 and Ghidra are
still better for that job.
[ref1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=965379
* Bug#965379: FBreader: Sometimes draws hyphens after each word
[ref2] https://lists.debian.org/debian-qa-packages/2022/02/msg00074.html
* FBreader source code patch.
* Siarhei Abmiotka
[ref3] https://refspecs.linuxbase.org/elf/elf.pdf
[ref4] http://www.sco.com/developers/devspecs/gabi41.pdf
[ref5] https://man7.org/linux/man-pages/man5/elf.5.html
[ref6] https://sourceware.org/binutils/docs/binutils.html#objdump
[ref7] https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling
[ref8] Annotated C++ Reference Manual (1990): 7.2.1c
* Margaret A. Ellis, Bjarne Stroustrup
* https://en.wikipedia.org/wiki/Name_mangling#Standardized_name_mangling_in_C++
[ref9] https://en.wikipedia.org/wiki/Name_mangling#How_different_compilers_mangle_the_same_functions
[ref10] https://www.man7.org/linux/man-pages/man3/dlopen.3.html
[ref11] https://security-tracker.debian.org/tracker/CVE-2008-0166
[ref12] https://lists.debian.org/debian-security-announce/2008/msg00152.html
* [SECURITY] [DSA 1571-1] New openssl packages fix predictable random number generator
[ref13] https://en.wikipedia.org/wiki/Trampoline_(computing)
[ref14] https://www.felixcloutier.com/x86/stos:stosb:stosw:stosd:stosq
[ref15] https://www.felixcloutier.com/x86/rep:repe:repz:repne:repnz
[ref16] https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
* 3.5.1.9 Using NOPs
* April 2024 ; Document Number: 248966-050US
[ref17] https://www.man7.org/linux/man-pages/man1/dd.1.html
[ref18] https://en.wikipedia.org/wiki/Dd_(Unix)
[ref19] https://docs.gtk.org/Pango/struct.Analysis.html
[ref20] https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=1d67fe3b6e696fccb902d9919b9e58b7299a3205;hp=a4f2b7c5d931f2aa27851b59ae5817a6ee43cfcb