===[ How to create custom assembler ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's say that we are reverse engineering some custom virtual machine and we
want to inject our own code. How do we do it without creating a full-fledged
custom assembler?

One way is to use some of the functionality of 'nasm' [ref1]. It provides
macros, labels, jumps, variables, basic arithmetic and other useful
constructs.

When we want to define our own instruction set, we can use macros and
constants. Macros here behave like C macros: the name of a macro is replaced
by the value of that macro. The following code defines a custom 'cmp'
instruction:

----------------------------------[ cmp.asm ]----------------------------------
%define cmp(a1, a2) db 0x01, a1, a2     ; macro
r1 equ 0x02                             ; constant
cmp (r1, 0x03)                          ; usage
-------------------------------------------------------------------------------

If we assemble it, the hexdump should show '010203':

-----------------------------------[ code ]------------------------------------
$ nasm -f bin ./cmp.asm -o ./cmp
$ xxd ./cmp
00000000: 0102 03                                  ...
-------------------------------------------------------------------------------

This alone is very useful, but if we combine it with labels and math, we get
a very potent tool. For example, we can define opcodes for basic instructions
like 'mov', 'cmp', 'add' and 'jmp'.
We can also easily compute the address for the jump instruction relative to
the start of the section ('$$'):

----------------------------[ simple_ins_set.asm ]-----------------------------
; Opcode: INS ARG1 ARG2
%define add(a1, a2) db 0x01, a1, a2
%define mov(a1, a2) db 0x02, a1, a2
%define cmp(a1, a2) db 0x04, a1, a2
%define jmp(cond, addr) db 0x20, cond, (addr - $$) ; compute relative address

r1 equ 0x02     ; register 1

EQ equ 0x01     ; equal
LT equ 0x02     ; lower than

; CODE
mov (r1, 0x00)
loop1:
add (r1, 1)
cmp (r1, 10)
jmp (LT, loop1)
-------------------------------------------------------------------------------

If we analyze the binary output, we get:

-----------------------------[ annotated hexdump ]-----------------------------
0000: 020200    ; mov (r1, 0x00)
0003: 010201    ; add (r1, 1)
0006: 04020a    ; cmp (r1, 10)
0009: 200203    ; jmp (LT, loop1), where loop1 = 3
-------------------------------------------------------------------------------

That is awesome!

Now, let's say that we have a VM whose jumps operate on the position of an
instruction, not on an address. E.g. if we have 5 instructions which are
3 bytes wide and we want to jump to the third instruction, the jump takes the
argument 3 when instructions are indexed from 1, or 2 when they are indexed
from 0 (as in the listing below). A jump that operates on addresses, like the
one in the code above, would instead take the argument 6, which is the byte
offset of that instruction.

---------------------------[ computing an address ]----------------------------
; Instruction length is fixed. We will be dividing an address by this.
INSTR_LEN equ 3

; This VM only allows jumps to a specific "line". It is a mitigation against
; jumping into the middle of an instruction.
; Equation: address / instruction_length
%define jmp(a1, a2) db 0x04, a1, ((a2 - $$) / INSTR_LEN)
;                                       ^ start of the section (which is 0)
; ...

;; CODE                 INSTRUCTION POINTER (= "ADDRESS")
mov (b, 1)              ; IP = 0
cmp (d, b)              ; IP = 1
jmp (GT, if_d_GT_1)     ; IP = 2    if True: set IP to 6
jmp (LE, if_d_GT_0)     ; IP = 3    if True: set IP to 4
if_d_GT_0:
mov (c, 77)             ; IP = 4
add (d, b)              ; IP = 5
if_d_GT_1:
mov (b, 1)              ; IP = 6
-------------------------------------------------------------------------------

This technique is sufficient if instructions have a fixed size, but if we need
to work with instructions and operands of variable size, we have to use
multi-line macros.
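
To double-check the arithmetic that the index-based 'jmp' macro performs, the
same computation can be sketched in Python. This is only an illustration and
not part of the 'nasm' workflow: 'jump_operand' is a hypothetical helper, and
the label addresses are derived by hand from the listing above (four
instructions precede 'if_d_GT_0', six precede 'if_d_GT_1').

```python
INSTR_LEN = 3  # fixed instruction width in bytes, as in the listing above


def jump_operand(label_addr, section_start=0, instr_len=INSTR_LEN):
    """Mirror ((label - $$) / INSTR_LEN): byte address -> instruction index."""
    offset = label_addr - section_start
    if offset % instr_len:
        # Assumption of this sketch: treat a misaligned label as an error.
        raise ValueError("label is not on an instruction boundary")
    return offset // instr_len


# Byte addresses of the two labels, computed by hand from the listing.
if_d_GT_0 = 4 * INSTR_LEN  # 12
if_d_GT_1 = 6 * INSTR_LEN  # 18

print(jump_operand(if_d_GT_1))  # 6
print(jump_operand(if_d_GT_0))  # 4
```

One design difference: the '/' operator in 'nasm' is integer division, so a
label that does not sit on an instruction boundary would be rounded down
silently, while the sketch raises an error instead.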

===[ Multi-line macros ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Multi-line macros are useful when the parts of an instruction have different
sizes. (They can also overload an instruction.) Example:

-----------------------------------[ code ]------------------------------------
%macro push 1
    db 0xff     ; one byte
    dw %1       ; two bytes
%endmacro

push 0x1111
-------------------------------------------------------------------------------

Output will be: 'ff 11 11'. It does not even need parentheses, BUT it is still
a very good idea to have them there, so we can see that we are using a macro
and not something which will be interpreted as an x86 instruction.

The 'nasm' preprocessor gets very powerful when we start to use conditions,
because we can create a macro whose output depends on the type of its
arguments.

-----------------------------------[ code ]------------------------------------
%macro mov 2            ; My 'mov' takes two arguments (operands)
    db 0xff             ; First part of 'mov' opcode is fixed

    ; Based on type of argument of macro (check if second argument is a NUMBER)
    %ifnum %2
        db 0xaa         ; Second part of 'mov' opcode is a number
        dw %2
    %else
        db 0xbb         ; Second part of 'mov' opcode is a register
        db %2
    %endif
%endmacro

r1 equ 0x00             ; My register

mov (r1, 0x1111)        ; OUTPUT: ffaa 1111
mov (r1, r1)            ; OUTPUT: ffbb 00
-------------------------------------------------------------------------------

Finally, after we define all the macros we need/want, we can save them into a
separate file and include that file wherever we please.

-----------------------------------[ code ]------------------------------------
%include "macros.nasm"
-------------------------------------------------------------------------------
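
The byte layout that the 'push' and 'mov' macros are meant to produce can be
modelled in a few lines of Python, which is handy for checking expected
output by hand. This is a sketch of the intended encoding only, it does not
reproduce how 'nasm' parses macro arguments. The 'Reg' class is a
hypothetical stand-in for the '%ifnum' check, and like the macro as written,
'mov' does not encode its first operand.

```python
import struct


class Reg(int):
    """Marker type for a register operand (hypothetical helper)."""


def push(value):
    """db 0xff followed by dw value: three bytes in total."""
    return b"\xff" + struct.pack("<H", value)  # dw is stored little-endian


def mov(dst, src):
    """Variable-length 'mov'; dst is accepted but not emitted, as in the macro."""
    if isinstance(src, Reg):
        return b"\xff\xbb" + struct.pack("<B", src)  # register form, 3 bytes
    return b"\xff\xaa" + struct.pack("<H", src)      # immediate form, 4 bytes


r1 = Reg(0x00)

print(push(0x1111).hex())     # ff1111
print(push(0x1234).hex())     # ff3412
print(mov(r1, 0x1111).hex())  # ffaa1111
print(mov(r1, r1).hex())      # ffbb00
```

The value 0x1234 is included because 0x1111 hides the byte order: 'dw' stores
the low byte first.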

===[ Limitations ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is important to say that macros are not instructions! The assembler does
not understand them the way it understands native instructions. Therefore it
reports no semantic errors for them and so on. We have to do all the reasoning
about what we write ourselves.

What can be a big problem is that the native disassembler will not work,
because it does not understand our instruction set. If we are working on some
bytecode, we need this feature. Actually, a disassembler would be the first
thing we would write when doing reverse engineering. A simple disassembler is
not that hard to write and can be put together quickly, e.g. in Python [ref3].

Notes about labels: addresses of labels are computed absolutely. Setting
'DEFAULT REL' does not help us, because we are not using native instructions!
We have to do the correct arithmetic on labels ourselves in order to jump to
the correct address in a VM! We can achieve this by:

  - setting the correct 'org' (if the VM operates on fixed addresses), or
  - computing a relative offset from some base label (for example from the
    start of the section: '$$').

Example:

-----------------------------------[ code ]------------------------------------
org 0x20                ; start address = 32

label1:
times 8 db 0            ; fill some space (8 bytes)
label2:

db label1               ; 0x20 = 32 <-- absolute
db label2               ; 0x28 = 40 <-- absolute
db label2 - label1      ; 0x08 = 8  <-- relative
-------------------------------------------------------------------------------
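
As for the disassembler mentioned above, a minimal sketch in Python for the
fixed-width instruction set from 'simple_ins_set.asm' could look like this.
The opcode table is taken from that listing; the function name, the output
format and the 'unk_XX' placeholder for unknown opcodes are assumptions of
this example, not a reference implementation.

```python
# Opcodes as defined in simple_ins_set.asm; every instruction is 3 bytes wide.
OPCODES = {0x01: "add", 0x02: "mov", 0x04: "cmp", 0x20: "jmp"}
INSTR_LEN = 3


def disassemble(code):
    """Yield (address, text) for each fixed-width instruction in `code`."""
    if len(code) % INSTR_LEN:
        raise ValueError("code length is not a multiple of INSTR_LEN")
    for addr in range(0, len(code), INSTR_LEN):
        opcode, arg1, arg2 = code[addr:addr + INSTR_LEN]
        name = OPCODES.get(opcode, "unk_%02x" % opcode)
        yield addr, "%s (0x%02x, 0x%02x)" % (name, arg1, arg2)


# Bytes from the annotated hexdump of simple_ins_set.asm.
program = bytes.fromhex("02020001020104020a200203")

for addr, text in disassemble(program):
    print("%04x: %s" % (addr, text))
# 0000: mov (0x02, 0x00)
# 0003: add (0x02, 0x01)
# 0006: cmp (0x02, 0x0a)
# 0009: jmp (0x02, 0x03)
```

For a variable-length instruction set, the loop would have to read the opcode
first and then decide how many operand bytes follow.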

===[ Summary ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 'nasm' preprocessor is a very useful tool to have in our arsenal. With it,
we can quickly prototype a custom instruction set.

It is good to know that 'nasm' is far from being the only powerful assembler.
For example, 'fasm' has a similar feature set (and there are probably others
like that). You should look at it, maybe you will like its syntax more.

===[ References ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[ref1] https://www.nasm.us/doc/nasmdoc4.html#section-4.1.1
       (Chapter 4: The NASM Preprocessor)
[ref2] XXX https://github.com/fandauchytil/
[ref3] XXX quick and dirty disassembler in python