!!! UNFINISHED XXX !!!
-----------------------------------[ code ]------------------------------------
       .---------.   What do you want to solve?
.----->| PROBLEM |   (You have some question you want to answer.)
|      '---------'
|           |
|           v
|     .------------. Try to find out if someone already solved this problem.
|     | (RE)SEARCH | (This can be skipped for a learning experience :))
|     '------------'
|           |
|           +-------> This can generate new questions,
|           |         write them down for later tests.
|           v
|     .------------. Hypothesis is a fancy word for:
+---->| HYPOTHESES | "I think it could work like this..."
|     '------------' "This happens if I do that ..."
|           |
|           v
|    .-------------.
|    | EXPERIMENTS | Most fun lies here!
|    |      &      | Here you tackle the problem(s), you implement your
|    |  ANALYSIS   | hypothesis, executing it, and observing its outcome.
|    '-------------'
| something |
| is wrong  |
'-----------+ If hypothesis or question is wrong, you need to reformulate it.
            |
            v
       .----------.  When you arrive here, you should have the answer you need.
       | SOLUTION |  (You should have at least one solution for your problem.)
       '----------'
-------------------------------------------------------------------------------
In a nutshell: the scientific method is a set of guidelines that lets us
approach any problem systematically.
The term "scientific method" sounds mysterious and complicated, but it is
actually a pretty natural thing for us to do. When we are solving something, we
want to know *the* cause of it. The scientific method gives us a recipe for
tackling unknown problems. (Science is all about conquering unknowns; it needs
good tools to do that, and the scientific method is one of them.)
The scientific method is pretty flexible and we can use only parts of it. There
are some basic principles we should always keep in mind:
1. First, we need to know what we ... want to know :). It typically takes the
form of a question, e.g. "How is a binary executed in Linux?" or "Why the hell
is it not working?!".
2. We start searching for an answer. This is done by using a search engine
(e.g. Google), reading through documentation, looking inside source code,
tracing, asking on forums or, unbelievably, even asking someone in person, ...
3. After the (re)search phase we may have a good idea of what we are dealing
with and how to solve it. This idea is called a "hypothesis".
4. Most often we also want to verify experimentally that the solutions we have
found are correct.
5. Then we take the results of our experiments, look at them, and decide
whether they solve our problem. If not, we can return to research or
experimentation, or we may even have to formulate a new question.
These steps are so natural that we are typically not even aware of them. What
the scientific method adds is a stress on a systematic approach. We should be
the ones who control an experiment, not the other way around.
Working systematically means working effectively. We want to get reliable
results fast, and the method helps us achieve that. I highly recommend learning
it properly.
Thinking hard about a problem and figuring out a good question is crucial if we
want to be effective. You have probably heard the saying "there is no such
thing as a stupid question" ... well, maybe there isn't, but there can be a
poorly chosen question for a specific problem.
Imagine that we have
XXX
1. When experimenting we should always be aware of all variables in a
system[1]. Ideally we want only one variable to change while everything else
stays constant; then we know that this specific change causes this specific
system behavior.
If we have no such luxury and have to deal with multiple variables, we can try
bisection, i.e. dividing the system until we have smaller areas with fewer
variables and more reliable results.
2. Understanding inputs and outputs. XXX
3. Before we start experimenting, we need a relevant environment. Reproduce
the behavior. XXX
4. Documenting our progress. For me personally, this is crucial, because after
a few hours or days of debugging I won't remember what I have already tried.
I also frequently revisit older issues I have already solved, because after a
while some problems start connecting with each other.
I highly recommend writing notes. They do not have to be polished; a few words
or a copy-paste of an input and output is enough, but they have to be in a form
that your future self is able to read.
[1] https://en.wikipedia.org/wiki/System
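The bisection mentioned in point 1 can be sketched in a few lines of shell.
This is only a minimal sketch under assumptions that are not part of the text
above: the candidates (commits, config options, input lines, ...) are numbered
LOW..HIGH, everything before some unknown point behaves well, everything from
that point on behaves badly, and HIGH itself is known to be bad. The function
'is_good' is a hypothetical stand-in for whatever experiment we actually run.

```shell
#!/bin/sh
# bisect LOW HIGH CHECK
# Prints the smallest N in [LOW, HIGH] for which "CHECK N" fails.
# Assumes CHECK succeeds for every N below that point, fails for every N
# from that point on, and that "CHECK HIGH" is already known to fail.
bisect() {
    lo=$1 hi=$2 check=$3
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$check" "$mid"; then
            lo=$(( mid + 1 ))   # mid is still good, the change comes later
        else
            hi=$mid             # mid is already bad, look at mid or earlier
        fi
    done
    echo "$lo"
}

# Hypothetical experiment: pretend everything from step 37 on is broken.
is_good() { [ "$1" -lt 37 ]; }

bisect 1 100 is_good    # prints 37
```

Each round halves the area we still have to look at, so 100 candidates need
about 7 experiments instead of up to 100.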
Suppose we want to know how binaries are executed in Linux.
1. [Question] "How does Linux execute a binary?"
2. [Research] A quick Google search for "How does Linux execute a binary?"
gives us some garbage on how to set the execute bit on a binary/script with
'chmod'. This is not what we want.
We have to reformulate the query: "linux kernel binary execution".
That gives us much better results. One of them is the great "linux-insides":
https://0xax.gitbooks.io/linux-insides/content/SysCall/linux-syscall-4.html
3. [Hypothesis] Now we may have some idea that it is done by the 'execve'
syscall: the kernel opens the file, reads the magic header, checks whether the
type of the binary is known and, if so, loads the binary into memory, sets up
the registers and jumps back to user space.
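The "magic header" part of the hypothesis is easy to peek at from user space.
The sketch below is not how the kernel does it; it only mimics the very first
check for one binary type, ELF, whose files start with the four bytes 0x7f 'E'
'L' 'F'. The helper name 'is_elf' is made up for this example.

```shell
#!/bin/sh
# is_elf FILE
# Succeeds if FILE starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
# Only an illustration of the first check; the kernel does much more.
is_elf() {
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Example: look at the binary used in the experiments of this article.
if is_elf /bin/true; then
    echo "/bin/true: ELF"
else
    echo "/bin/true: something else (a script, or a missing file)"
fi
```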
4. [Question] How can we see it? Searching for "linux tracing" gets us:
https://www.man7.org/linux/man-pages/man1/strace.1.html
strace -- syscall tracer
5. [Experimentation] We could now create a C/ASM binary that calls 'execve',
but the syscall is so ubiquitous that we can just run any program. Using
'strace' is super easy:
------------------------------[ strace example ]-------------------------------
$ strace -vv /bin/true
execve("/bin/true", ["/bin/true"], ["BRM=brk", "ENV2=2"]) = 0
...
-------------------------------------------------------------------------------
'strace' is great, but it only shows us how user space communicates with the
kernel.
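For larger programs the full syscall list gets long. A small wrapper can keep
'strace' focused on our hypothesis by recording only 'execve' calls. This is a
sketch: the function name 'trace_execve' and the log file name are made up,
while '-f', '-e trace=' and '-o' are regular 'strace' options.

```shell
#!/bin/sh
# trace_execve LOGFILE COMMAND [ARGS...]
# Runs COMMAND under strace, follows child processes (-f), records only
# execve calls (-e trace=execve) and writes them to LOGFILE (-o).
trace_execve() {
    log=$1
    shift
    strace -f -e trace=execve -o "$log" "$@"
}

# Usage (needs strace installed):
#   trace_execve execve.log /bin/true
#   cat execve.log
```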
6. [Question] Can we see what the calls inside the kernel look like? By
searching for "linux kernel tracing" we get:
https://www.kernel.org/doc/html/latest/trace/ftrace.html
ftrace -- kernel function tracer
'ftrace' is a bit more complicated to set up, but let's say we have
everything in place and we can just use tracefs ftrace[4].
7. [Controlling variables] If we traced the whole system, we would not know
which execution is ours, because many background processes are executing at
the same time. Instead, we can trace the PID of a shell and call 'exec' inside
that shell, like this:
--------------------------------[ shell exec ]---------------------------------
sh
echo $$ # This is the PID we need to write into the ftrace filter.
exec /bin/true # After the ftrace setup, this triggers 'execve'.
-------------------------------------------------------------------------------
Although the output we get is big, the exec function is near the beginning.
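The ftrace side of this step could look roughly like the sketch below. It is
one possible setup, not necessarily the one used for this article, and it rests
on a few assumptions: tracefs is mounted at /sys/kernel/tracing, the kernel
provides the 'function_graph' tracer, and we are root. The function names
'ftrace_start' and 'ftrace_stop' are made up; the files they write to
('tracing_on', 'current_tracer', 'set_ftrace_pid', 'trace') are regular tracefs
files.

```shell
#!/bin/sh
# Assumption: tracefs is mounted here (older setups may use
# /sys/kernel/debug/tracing). Override TRACEFS if yours differs.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

# ftrace_start PID: trace kernel functions executed on behalf of PID only.
ftrace_start() {
    echo 0 > "$TRACEFS/tracing_on" || return 1    # pause while configuring
    echo function_graph > "$TRACEFS/current_tracer"
    echo "$1" > "$TRACEFS/set_ftrace_pid"         # the PID from 'echo $$'
    echo > "$TRACEFS/trace"                       # clear the old buffer
    echo 1 > "$TRACEFS/tracing_on"
}

# ftrace_stop OUTFILE: stop tracing and save the buffer to OUTFILE.
ftrace_stop() {
    echo 0 > "$TRACEFS/tracing_on" || return 1
    cat "$TRACEFS/trace" > "$1"
    echo nop > "$TRACEFS/current_tracer"          # back to no tracer
}

# Usage, as root, from a second terminal:
#   ftrace_start 12345      # PID printed by 'echo $$' in the first shell
#   (now run 'exec /bin/true' in the first shell)
#   ftrace_stop exec.trace
```

Because 'exec' replaces the shell without changing its PID, filtering on the
shell's PID is enough to catch the 'execve' of '/bin/true'.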
8. [Filtering noise] Let's say we do not want so many unrelated calls, because
they make the trace hard to navigate. We can control it even further by
creating specialized binaries that contain the bare minimum of code.
For example, we can write an assembly program that just exits:
------------------------------[ exit_only.nasm ]-------------------------------
BITS 64
GLOBAL _start
_start:
    mov rax, 0x3c   ; sys_exit (code)
    xor rdi, rdi    ; code=0
    syscall
-------------------------------------------------------------------------------
Now we can run it instead of '/bin/true', and when we trace it we should get
only the relevant data: the '__x64_sys_execve' and '__x64_sys_exit' function
calls and whatever they call (plus garbage like interrupts).
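To turn 'exit_only.nasm' into something we can 'exec', it has to be assembled
and linked first. A minimal sketch, assuming 'nasm' and the GNU linker 'ld'
are installed on an x86-64 Linux machine; the helper name 'build_nasm' is made
up for this example.

```shell
#!/bin/sh
# build_nasm SOURCE.nasm
# Assembles SOURCE.nasm into a 64-bit ELF object and links it into an
# executable named after the source file (exit_only.nasm -> exit_only).
build_nasm() {
    src=$1
    bin=${src%.nasm}
    nasm -f elf64 -o "$bin.o" "$src" && ld -o "$bin" "$bin.o"
}

# Usage (needs nasm and ld installed):
#   build_nasm exit_only.nasm
#   ./exit_only; echo $?    # shows the exit status of the program
```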
[4] XXX
XXX
https://en.wikipedia.org/wiki/Scientific_method
https://www.youtube.com/watch?v=nsnyl8llfH4 (Mark Rober -- 1st place Egg Drop
project ideas- using SCIENCE) XXX