Here's the scenario: We want to craft two different messages with the same MD5 hash, and a specific CRC32 checksum, simultaneously.
In other words, we want an MD5 collision attack and a CRC32 preimage attack.
This might seem like a contrived scenario, but it's exactly the one I faced while producing my PNG hashquine (Yes OK maybe that's also a contrived scenario, cut me some slack).
On its own, a CRC32 preimage attack is trivial. You can craft a 4-byte suffix that gives any message a specific checksum, calculated using a closed-form expression (which I am too lazy to derive, not even with assistance from Z3). It's not an attack per-se, since CRC32 was never meant to be cryptograpically secure in the first place.
Similarly, an MD5 collision can be calculated near-instantly (about 1 second) using FastColl - which uses some very clever maths, but as an end-user it's point-and-shoot.
So, FastColl can give us two different messages with the same MD5 hash, but different CRCs. We can add a 4-byte suffix to each message, to correct the CRCs, right?
Wrong. Since each message started off with a different CRC, there is no single suffix that could be added to both, to give them both the same final CRC (The proof of this is left as an exercise to the reader!). If you used a different suffix for each message, you could make the CRCs match, but the MD5s would be different.
It seems like we can't have our cake and eat it.
To make things more concrete, here's an example message pair I made using UniColl (same idea as FastColl, but gives more control over the data, letting me write this funny message):
$ md5sum collision*
6aa7959465eff8eeade6d4ce15fdf1ee collision1.bin
6aa7959465eff8eeade6d4ce15fdf1ee collision2.bin
$ crc32sum collision*
505d9be4 collision1.bin
18dc7eba collision2.bin
$ head -n1 collision1.bin
Hello, it's me, Mr President. If this box is ticked, launch the nukes: ✔
$ head -n1 collision2.bin
Hello, it's me, Mr President. If this box is ticked, launch the nukes: ✕
We have two different files, both with the same MD5, but different CRCs. We want both CRCs to be the same, and we want them to be an arbitrary value - I'm picking 0xdeadbeef
.
Here's the plan. First, we're going to use FastColl to add even more (36) colliding block pairs to the messages. This will change their MD5s, but they'll still be the same as each other.
import os
prefix = bytearray(open("collision1.bin", "rb").read())
f2_a = b""
f2_b = b""
for _ in range(36):
with open("prefix_mitm.bin", "wb") as pf:
pf.write(prefix)
os.system("rm -f prefix_mitm_msg*.bin && ./fastcoll -p prefix_mitm.bin")
coll_a = open("prefix_mitm_msg1.bin", "rb").read()[-128:]
coll_b = open("prefix_mitm_msg2.bin", "rb").read()[-128:]
f2_a += coll_a
f2_b += coll_b
prefix += coll_a
with open("f2a.bin", "wb") as f:
f.write(f2_a)
with open("f2b.bin", "wb") as f:
f.write(f2_b)
Each collision we generate gives us a choice of two different blocks to pick from, at that position in the message.
Swapping the blocks will change the CRC32, but keep the MD5 the same. As a quick demonstration, swapping out all the blocks at once changes the CRCs like so (but not the MD5s):
$ cat collision1.bin f2a.bin | md5sum
d5ef003912d6877a6686d16a45054ce5 -
$ cat collision1.bin f2b.bin | md5sum
d5ef003912d6877a6686d16a45054ce5 -
$ cat collision2.bin f2a.bin | md5sum
d5ef003912d6877a6686d16a45054ce5 -
$ cat collision2.bin f2b.bin | md5sum
d5ef003912d6877a6686d16a45054ce5 -
$ cat collision1.bin f2a.bin | crc32sum
17f0391b
$ cat collision1.bin f2b.bin | crc32sum
adf1eb5b
$ cat collision2.bin f2a.bin | crc32sum
6080b475
$ cat collision2.bin f2b.bin | crc32sum
da816635
Since there are 36 pairs of blocks, there are 2^36 possible combinations, for each message. If we enumerate them all and check the resulting CRC, we'd have a reasonable chance of stumbling across the target CRC (0xdeadbeef
) by chance - since there are only 2^32 possible CRC32 results.
This is a feasible computation, but it would be a bit slow, probably on the order of minutes - infeasible as part of the inner-loop of a larger attack (such as a PNG hashquine!). Fortunately, there's a trick we can use to speed this up.
CRC32 is an invertible function. This means we can run it backwards as well as forwards, to "rewind" the state to an earlier point in time, so long as we know the data it was hashing.
To illustrate, here's a quick python implementation and demo:
def create_table():
a = []
for i in range(256):
k = i
for j in range(8):
if k & 1:
k ^= 0x1db710640
k >>= 1
a.append(k)
return a
crc_table = create_table()
inv_table = [None]*256
for i, x in enumerate(crc_table):
inv_table[x>>24] = (x << 8) ^ i
def crc32(buf, crc=0):
crc ^= 0xffffffff
for k in buf:
crc = (crc >> 8) ^ crc_table[(crc & 0xff) ^ k]
return crc ^ 0xffffffff
def inverse_crc32(buf, crc):
crc ^= 0xffffffff
for k in buf[::-1]:
crc = ((crc << 8) ^ inv_table[crc >> 24] ^ k) & 0xffffffff
return crc ^ 0xffffffff
# this prints 0x3610a686
print("crc32(hello) =", hex(crc32(b"hello")))
# this prints 0x0
print("inverse_crc32(hello, crc32(hello)) = ", hex(inverse_crc32(b"hello", 0x3610a686)))
Rather than iterate through all 2^36 combinations of our colliding blocks, we'll do this:
Take the first 18 pairs of collision blocks, and iterate through all 2^18 possible combinations. For each combination, calculate the CRC and note down the result and the combination that produced it, using a hashtable (a dict, if you speak python).
This gives us a table of approximately 2^18 CRC-to-input-block-combination mappings, which we can query efficiently.
Next, we take the last 18 pairs of collision blocks, and iterate through all 2^18 possible combinations of those - but this time running CRC32 in reverse, starting from the target value of 0xdeadbeef
.
For each result we get, we look it up in our hashtable - if it's present, than we've just successfully "met in the middle", and found a CRC32 preimage! This process must be repeated for each of our messages (twice).
Because all our blocks were MD5 collisions, the final MD5s are unchanged, and we have our target CRC! And, the overall attack complexity is only 2^19, which is very feasible.
Without further ado, here's the full implementation:
prefix1 = open("collision1.bin", "rb").read()
prefix2 = open("collision2.bin", "rb").read()
blocks = []
with open("f2a.bin", "rb") as fa:
with open("f2b.bin", "rb") as fb:
for _ in range(36):
blocks.append([
fa.read(128),
fb.read(128)
])
forward_blocks = blocks[:18]
backward_blocks = blocks[18:][::-1]
def search_forwards(table, sum, history=[], idx=0):
if idx >= len(forward_blocks):
table[sum] = history
return
for i, child in enumerate(forward_blocks[idx]):
search_forwards(table, crc32(child, sum), history + [i], idx+1)
def search_backwards(table, sum, history=[], idx=0):
if idx >= len(backward_blocks):
if sum in table:
return table[sum] + history[::-1]
return
for i, child in enumerate(backward_blocks[idx]):
res = search_backwards(table, inverse_crc32(child, sum), history + [i], idx+1)
if res is not None:
return res
def do_preimage(prefix, filename):
table = {}
search_forwards(table, crc32(prefix))
path = search_backwards(table, 0xdeadbeef)
reassembled = prefix + b"".join([b[i] for i, b in zip(path, blocks)])
with open(filename, "wb") as of:
of.write(reassembled)
do_preimage(prefix1, "out1.bin")
do_preimage(prefix2, "out2.bin")
When we run it, this gives us the final result:
$ md5sum out?.bin
f54e7363d9c72ebf81605e2b38fa2be5 out1.bin
f54e7363d9c72ebf81605e2b38fa2be5 out2.bin
$ crc32sum out?.bin
deadbeef out1.bin
deadbeef out2.bin
$ head -n1 out1.bin
Hello, it's me, Mr President. If this box is ticked, launch the nukes: ✔
$ head -n1 out2.bin
Hello, it's me, Mr President. If this box is ticked, launch the nukes: ✕
Note that I'm using head
to isolate the first line of the files, hiding all the garbage collision data. You could also use ANSI control squences embedded in the data to hide it - this is left as an exercise to the reader.
The output files are attached to this gist, base64'd, if you want to play with them.
http://geohot.com/lhcwriteup.html