.-----------------------------------------------------------------------------.
|          Quick and dirty way to create your own assembler in nasm           |
'-----------------------------------------------------------------------------'
updated: 2022-11-08


===[ How to create custom assembler ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's say that we are reverse engineering a custom virtual machine and we
want to inject our own code. How can we do that without writing a full-fledged
custom assembler? One way is to reuse some of the functionality of 'nasm'
[ref1]. It provides macros, labels, jumps, variables, basic arithmetic and
other useful constructs.

When we want to define our own instruction set, we can use macros and
constants. Macros here behave like C macros: the name of a macro is replaced
by the value of that macro. The following code defines a custom 'cmp'
instruction:

----------------------------------[ cmp.asm ]----------------------------------
%define cmp(a1, a2) db 0x01, a1, a2     ; macro
r1 equ 0x02                             ; constant
cmp (r1, 0x03)                          ; usage
-------------------------------------------------------------------------------

If we assemble it, the hexdump should show '010203':

$ nasm -f bin ./cmp.asm -o ./cmp

$ xxd ./cmp
00000000: 0102 03                                   ...

This alone is very useful, but if we combine it with labels and arithmetic, we
get a very potent tool. For example, we can define opcodes for basic
instructions like 'mov', 'cmp', 'add' and 'jmp'. We can also easily compute
the target address of the jump instruction relative to the start of the
section ('$$'):

----------------------------[ simple_ins_set.asm ]-----------------------------
; Opcode: INS ARG1 ARG2
%define add(a1, a2)  db 0x01, a1, a2
%define mov(a1, a2)  db 0x02, a1, a2
%define cmp(a1, a2)  db 0x04, a1, a2
%define jmp(cond, addr) db 0x20, cond, (addr - $$)  ; compute relative address

r1 equ 0x02           ; register 1

EQ equ 0x01           ; equal
LT equ 0x02           ; lower than

; CODE
mov (r1, 0x00)

loop1:
  add (r1, 1)
  cmp (r1, 10)
  jmp (LT, loop1)
-------------------------------------------------------------------------------

If we analyze the binary output, we get:

-----------------------------[ annotated hexdump ]-----------------------------
0000:   020200    ; mov (r1, 0x00)
0003:   010201    ; add (r1, 1)
0006:   04020a    ; cmp (r1, 10)
0009:   200203    ; jmp (LT, loop1), where loop1 = 3
-------------------------------------------------------------------------------

That is awesome! Now, let's say that we have a VM whose jumps operate on the
position of an instruction, not on its address. E.g. if we have 5 instructions
which are 3 bytes wide and we want to jump to the third instruction, we have to
pass the value 3 if instructions are indexed from 1, or the value 2 if they are
indexed from 0 (as in the example below). If the jump operated on an address
instead, as it does in the code above, we would have to pass the value 6.

---------------------------[ computing an address ]----------------------------
; Instruction length is fixed. We will be dividing an address by this.
INSTR_LEN equ 3

; This VM only allows jumps to a specific "line". It is a mitigation against
; jumping into the middle of an instruction.
; Equation: address / instruction_length
%define jmp(a1, a2) db 0x04, a1, ((a2 - $$) / INSTR_LEN)
;                                       ^ start of the section (which is 0)
; ...

;; CODE                 INSTRUCTION POINTER (= "ADDRESS")
mov (b, 1)              ; IP = 0
cmp (d, b)              ; IP = 1
jmp (GT, if_d_GT_1)     ; IP = 2   if True: set IP to 6
jmp (LE, if_d_GT_0)     ; IP = 3   if True: set IP to 4

if_d_GT_0:
  mov (c, 77)           ; IP = 4
  add (d, b)            ; IP = 5

if_d_GT_1:
  mov (b, 1)            ; IP = 6
-------------------------------------------------------------------------------
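
The index arithmetic can also be cross-checked outside the assembler. The
following Python sketch is not part of the original tooling; the helper name
'address_to_index' and its 'base' parameter are made up for illustration. It
assumes zero-based indexing, as in the listing above. Unlike the integer
division in the macro, which simply truncates, it rejects addresses that do
not fall on an instruction boundary.

```python
INSTR_LEN = 3  # fixed instruction width in bytes, as in the listing above


def address_to_index(address, base=0):
    """Convert a byte address into a zero-based instruction index.

    'base' plays the role of '$$' (the start of the section).
    """
    offset = address - base
    if offset < 0 or offset % INSTR_LEN:
        raise ValueError(f"address {address:#x} is not on an instruction boundary")
    return offset // INSTR_LEN


if __name__ == "__main__":
    # The seven instructions of the listing start at addresses 0, 3, ..., 18.
    for address in range(0, 7 * INSTR_LEN, INSTR_LEN):
        print(f"address {address:2d} -> IP {address_to_index(address)}")
```

For the listing above this maps address 12 ('if_d_GT_0') to IP 4 and address
18 ('if_d_GT_1') to IP 6, matching the comments in the code.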

This technique is sufficient if all instructions have a fixed size, but if we
need instructions and operands of variable size, we have to use multi-line
macros.




===[ Multi-line macros ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Multi-line macros are useful when the parts of an instruction have different
sizes. (They can also be used to overload an instruction.)

Example:

%macro push 1
    db 0xff       ; one byte
    dw %1         ; two bytes
%endmacro

push 0x1111

The output will be 'ff 11 11'. The call does not even need parentheses, BUT for
a one-argument macro like this it is still a very good idea to write
'push (0x1111)', so we can see that we are using a macro and not a mnemonic
that will be assembled as an x86 instruction.

The 'nasm' preprocessor gets very powerful when we start to use conditions,
because we can create a macro whose output depends on the type of its
arguments.

%macro mov 2        ; My 'mov' takes two arguments (operands)
    db 0xff         ; First part of 'mov' opcode is fixed

    ; Based on type of argument of macro (check if second argument is a NUMBER)
    %ifnum %2
        db 0xaa     ; Second part of 'mov' opcode is a number
        dw %2
    %else
        db 0xbb     ; Second part of 'mov' opcode is a register
        db %2
    %endif
%endmacro

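r1 equ 0x00         ; My register

; Arguments of a multi-line macro are split at commas only, so parentheses
; around the whole argument list would end up inside %1 and %2.
mov r1, 0x1111      ; OUTPUT:  ffaa 1111
mov r1, r1          ; OUTPUT:  ffbb 00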
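
To predict the bytes that the macro emits for a given call, its logic can be
mirrored in a few lines of Python. This is only a sketch: 'encode_mov' and the
'REGISTERS' table are hypothetical names, and the 16-bit value is packed
little-endian because that is how 'dw' stores it.

```python
import struct

REGISTERS = {"r1": 0x00}  # register numbers as defined with 'equ' above


def encode_mov(dst, src):
    """Mirror the 'mov' macro: the encoding depends on the type of 'src'.

    'dst' is accepted but not encoded, exactly as in the macro above,
    which never emits its first argument.
    """
    if isinstance(src, int):
        # Immediate form: 0xff 0xaa followed by a 16-bit little-endian value.
        return struct.pack("<BBH", 0xFF, 0xAA, src)
    # Register form: 0xff 0xbb followed by the one-byte register number.
    return struct.pack("<BBB", 0xFF, 0xBB, REGISTERS[src])


if __name__ == "__main__":
    print(encode_mov("r1", 0x1111).hex())
    print(encode_mov("r1", "r1").hex())
```

With this model, encode_mov("r1", 0x1234).hex() gives 'ffaa3412', which makes
the byte order of the immediate visible.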

Finally, after we define all the macros we need, we can save them in a
separate file and include that file wherever we please.

%include "macros.nasm"




===[ Limitations ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is important to say that macros are not instructions! The assembler does
not understand them the way it understands native instructions, so it cannot
report semantic errors in our code. We have to do all the reasoning about what
we write ourselves.

A bigger problem is that the native disassembler will not work, because it
does not understand our instruction set. If we are working with some bytecode,
we need a disassembler. In fact, a disassembler would be the first thing we
write when reverse engineering. A simple disassembler is not that hard to
write, and it can be put together quickly, e.g. in Python.
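
As a rough illustration of how small such a disassembler can be, here is a
Python sketch for the fixed-width instruction set from 'simple_ins_set.asm'.
The tables are copied from that listing; the function name, the output format
and the choice to print the second operand of 'mov', 'add' and 'cmp' as a
plain number are assumptions made for this example.

```python
# Opcode and operand tables copied from simple_ins_set.asm above.
MNEMONICS = {0x01: "add", 0x02: "mov", 0x04: "cmp", 0x20: "jmp"}
REGISTERS = {0x02: "r1"}
CONDITIONS = {0x01: "EQ", 0x02: "LT"}
INSTR_LEN = 3  # every instruction is an opcode plus two operand bytes


def disassemble(code):
    """Return one text line per 3-byte instruction (hypothetical helper)."""
    if len(code) % INSTR_LEN:
        raise ValueError("code length is not a multiple of the instruction length")
    lines = []
    for addr in range(0, len(code), INSTR_LEN):
        opcode, a1, a2 = code[addr:addr + INSTR_LEN]
        name = MNEMONICS.get(opcode)
        if name is None:
            # Unknown opcode: fall back to raw bytes instead of guessing.
            lines.append(f"{addr:04x}:   db 0x{opcode:02x}, 0x{a1:02x}, 0x{a2:02x}")
            continue
        if name == "jmp":
            first = CONDITIONS.get(a1, f"0x{a1:02x}")
            second = f"0x{a2:04x}"  # jump target as an address
        else:
            first = REGISTERS.get(a1, f"0x{a1:02x}")
            second = str(a2)
        lines.append(f"{addr:04x}:   {name} ({first}, {second})")
    return lines


if __name__ == "__main__":
    # Bytes from the annotated hexdump above.
    program = bytes.fromhex("020200" "010201" "04020a" "200203")
    print("\n".join(disassemble(program)))
```

Run on the bytes from the annotated hexdump, it prints 'mov (r1, 0)',
'add (r1, 1)', 'cmp (r1, 10)' and 'jmp (LT, 0x0003)', each prefixed with its
address.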

Notes about labels: addresses of labels are computed absolutely. Setting
'DEFAULT REL' does not help us, because we are not using native instructions!
We have to do the correct arithmetic on labels ourselves in order to jump to
the correct address in a VM! We can achieve this by:

  - setting the correct 'org' (if a VM is operating on fixed addresses), or

  - computing relative offset from some base label (for example from start of
    the section: '$$').

Example:

org 0x20                    ; start address = 32

label1:
    times 8 db 0            ; fill some space (8 bytes)

label2:
    db label1               ; 0x20 = 32    <-- absolute
    db label2               ; 0x28 = 40    <-- absolute
    db label2 - label1      ; 0x08 = 8     <-- relative




===[ Summary ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 'nasm' preprocessor is a very useful tool to have in our arsenal. With it,
we can quickly prototype a custom instruction set.

It is good to know that 'nasm' is far from being the only powerful assembler.
For example, 'fasm' has a similar feature set (and there are probably others
like it). You should take a look at it; maybe you will like its syntax more.




===[ References ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[ref1] https://www.nasm.us/doc/nasmdoc4.html#section-4.1.1 (Chapter 4: The NASM Preprocessor)