===[ XOR crash course ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(In most programming languages operator for XOR is represented by caret '^' and that symbol will be used in article below.) XOR (eXclusive OR) is boolean operation that takes 2 bits as input and outputs one bit. Operation is simple: if we XOR the same bit values together we get 0 and if we XOR the opposite bit values we get 1. Truth table: a ^ b = c --------- 0 0 0 0 1 1 1 0 1 1 1 0 There are two important things from the table above: 1. Number XORed by zero is always the same number: 'a ^ 0 = a' 2. Number XORed by the same number is always zero: 'a ^ a = 0' If we have equation 'a ^ b = c' we can play with it like that: a ^ c = a ^ (a ^ b) = a ^ a ^ b = 0 ^ b = b b ^ c = b ^ (a ^ b) = b ^ b ^ a = 0 ^ a = a Why would we want to do that? Well, let's say we have some plain text message 'msg', that we want encrypt by the key 'key', but message is longer than the key, what now? We will solve it by galaxy brain idea: let's divide the message into blocks which have the same size as the key and XOR that: ------------------------------[ block_cipher.py ]------------------------------ i = 0 while i < len (msg): cipher = msg ^ key i += len (key) ------------------------------------------------------------------------------- We are safe as long as nobody finds out our key, right? There is just one teensy little problem, if we XOR the key with same data multiple times, we get the same result: key = 0b1011 plain_text_block_01 = 0b0111 plain_text_block_02 = 0b0111 output_01 = key ^ plain_text_block_01 = 0b1100 output_02 = key ^ plain_text_block_02 = 0b1100 That means that repeating patterns in a message with combination with simple XOR encryption are ... bad. With little information about type of message, we can easily break it.

===[ Challenge example ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's find out repeating patterns. I this case file and the key are small enough that we can see the pattern just by looking at hexdump: -----------------------------------[ code ]------------------------------------ ;,73;0*?*710;&.;,73;0*?*710;&.;,73;0*?*710 ,73;0*?*710;&.;,73;0 ------------------------------------------------------------------------------- Based on how is the sequence of bytes repeated, we can assume a few things: -----------------------------------[ code ]------------------------------------ ;,73;0*?*710;&. ;,73;0*?*710;&. ;,73;0*?*710 ,73;0*?*710;&. ;,73;0 ------------------------------------------------------------------------------- 1. The key is encrypting the same bytes over and over or it is encrypting a repeating sequence that has the same length as the key. (The first one is easier to check, so we'll start with that.) 2. From that it looks like our key has length of 15 bytes. 3. We still don't know the correct order of the bytes. 4. From previous "challenges" we will suppose that the encrypted message makes some sense, i.e. it is an ASCII printable text. This knowledge can be enough for us to brute force it (if it fails we can do frequency analysis, but we can get lucky). Let's do it! At first we can easily determine the correct order of byte of the key just by counting from start of the file to the sequence. From a hex editor we know that the sequence ';,73;0*?*710;&.' starts at offset '0x4E' (= 78) [1] and we know from how it is repeating, that the key has length of 15 bytes. So we can divide 63 by 15 and get the reminder (for that we'll use modulo operator): 78 % 15 = 3 [1] We want the second occurrence, because the first one can be XORed with shifted key (that's how blocks work). The offset is '78 - 3 = 75' and we read 15 bytes from offset 75 and we get a new sequence: ;&.;,73;0*?*710 In the next step, we need to realize that all sequences in the message was created like this: plain_text_block ^ key = cipher_text_block So, of we take the first 15 bytes of the cipher text and XOR them with bytes of the sequence above, we get: cipher_text_block_0 ^ cipher_text_block_4 = = (plain_text_block_0 ^ key) ^ (plain_text_block_4 ^ key) = = plain_text_block_0 ^ plain_text_block_4 If we are correct in our assumption and 'plain_text_block_4' is composed of just one byte repeated 15 times, we can brute force it only in 256 interactions (that is the total number of bits in one byte => 8 bits), because when we use the equation above, key will get zero and the rest will be effectively XORed with the same one byte. When find that byte, we will automatically get the plain text. Here is annotated code, that gives us the solution: ---------------------------[ breaking_naive_xor.py ]--------------------------- # We know the key length. key_len = 15 # We assume that this sequence of bytes are the same bytes XORed by the key. sequence = b';&.;,73;0*?*710' # Read the binary into memory (file is small, so we can store all it into mem). with open ('xor.bin', 'rb') as f: cipher_text = f.read () # Iterate over all bytes. (By this we are searching for a byte in the # sequence.) for n in range (0, 256): # We'll write every possible "solution" to a separate file. with open (f'/dev/shm/xor.out.{n:02x}', 'wb') as f: # Write the byte to the beginning of a file, f.write (bytes (f'key = {n:c} (0x{n:02x})\n', 'u8')) # This is our XOR trick (cipher_text2 = sequence): # # cipher_text1 ^ cipher_text2 ^ n = # = (plain_text1 ^ key) ^ (plain_text2 ^ key) ^ n = # = plain_text1 ^ plain_text2 ^ n # # If 'n' is the same byte as 'plain_text2', XORing them together will # give us 0 and 0 XOR 'plain_text1', gives us 'plain_text1'. # Iterate over 'key_len' chunks and XOR corresponding bytes. for idx in range (0, len (cipher_text) - key_len, key_len): plain_text = [] for i in range (0, key_len): # XOR trick pt = cipher_text[idx+i] ^ sequence[i] ^ n plain_text.append (pt) # Write plain text to a file. f.write (bytes (plain_text)) ## For those who understand list comprehension all in this level ## of indentation can be replaces by: #f.write (bytes (cipher_text[idx+i] ^ sequence[i] ^ n for i in range (0, key_len))) ------------------------------------------------------------------------------- And that's it. Code will write 256 files into '/dev/shm/', we just need to find the correct one.

===[ Frequency analysis ]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We were (kind of) lucky by finding such nice repeating sequence of bytes encrypted by a short key. Actually it's not unusual to find e.g. firmware that uses the same type of "encryption" (especially in older update patches). Is can be even easier to get a key, because there are often lots of zeroes in a payload and what happens when you XOR message with zero? As you have probably guessed -- the key. But I digress. Let's say we are not that lucky and we have just regular English text or ZIPped files. If we know (or guess) something about an original message, we ca use statistical technique called "frequency analysis". I will not go into more details (look it up), but this generalize the method we had used in this article. Only we are not looking for a single byte sequence, but for the most common sequence of bytes (e.g. one of to most often bigram in a English text is 'th').