You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Description -
A company I interview for gave me a test to solve. I'd decompile it and get the answer that way, but they seem to use some custom Python version... At least I know it's based on commit 64113a4ba801126028505c50a7383f3e9df29573.
There is a patch file attached to this challenge, we can see that opcodes are shuffled in some way in opcode.py, and some options are specified in ceval.c.
As we are dealing with the compilation, we can focus on applying the change in ceval.c first.
We can compile Python (which is 3.11) in different ways depending on the operating system and the environment. For example, if we are using Windows 11, we can use PCbuild/build.bat to compile.
The compilation result is located in PCbuild/amd64, from where we can see python.exe, pythonw.exe, etc.
Understanding the opcode shuffling
Refer to patch again, the shuffling procedure is done by the following lines:
This means opcodes in range [1, 90) and [90, 200) are shuffled independently to build the mapping table, then operations will get their opcodes according to the corresponding indices in the appropriate table.
To prevent the original opcodes from distracting our attention when reversing, we change all the opcodes in opcode.py to 200 (which is not used), and define 200 single-byte operations named !0!, !1!, ..., !199!:
+ for i in range(200):+ def_op('!{}!'.format(i), i)- def_op('CACHE', 0)+ def_op('CACHE', 200)- def_op('POP_TOP', 1)+ def_op('POP_TOP', 200)- def_op('PUSH_NULL', 2)+ def_op('PUSH_NULL', 200)
...
- jrel_op('POP_JUMP_BACKWARD_IF_TRUE', 176)+ jrel_op('POP_JUMP_BACKWARD_IF_TRUE', 200)
Viewing the assembly
Now we can try to parse the contents of x.pyc.
Same as recent Python Versions, Python 3.11 pyc files use 16 bytes as the header, followed by the serialized code object.
Here is the code to recursive print out the information about x.pyc:
and basically !0! opcodes appear at weird places. This is because if the opcodes are not used in the original Python, they are converted to 0 when deserializing. This also affects the offsets of the opcodes and leads to parsing failures.
(In the above case, the opcode after 171 is actually 150, which is not used in Python 3.11.)
How to make sure that this assumption is correct?
We can notice that dis is printing the serialized data based on co.co_code. We can compare the co_code with the actual x.pyc in Hex editor to find out that some bytes are different.
Luckily, the Code object still stores the untampered raw bytes in co._co_code_adaptive, and we can also ask dis to use that by setting adaptive=True.
After adding adaptive=True, it returns something like the following:
Name: <module>
Filename: /usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py
Argument count: 0
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals: 0
Stack size: 2
Flags: 0x0
Constants:
0: 0
1: None
2: <code object ks at 0x0000015AEF049E90, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 5>
3: <code object cry at 0x0000015AEF098DF0, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 11>
4: <code object fail at 0x0000015AEF056560, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 18>
5: <code object game1 at 0x0000015AEEE6BDC0, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 23>
6: <code object game2 at 0x0000015AEF29F040, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 68>
7: <code object game3 at 0x0000015AEF2B42E0, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 96>
8: <code object main at 0x0000015AEF2B4640, file "/usr/local/google/home/xxxxxxxxx/yyyy/zzzzzzzzzzzzzzzz/xxxxxxxxxxx.py", line 123>
Names:
0: sys
1: random
2: time
3: ks
4: cry
5: fail
6: game1
7: game2
8: game3
9: main
0 !97! 0
1 2 !96! 0
4 !96! 1
6 !171! 0
8 !150! 0
2 10 !96! 0
12 !96! 1
14 !171! 1
16 !150! 1
3 18 !96! 0
20 !96! 1
22 !171! 2
24 !150! 2
5 26 !96! 2
28 !99! 0
30 !150! 3
...
However, we may sometimes see that some lines are incomplete if opcodes are invalid. Therefore, we can use a try-except block to wrap around the _get_instructions_bytes function in dis.py to make sure that every Instruction is printed:
cache_counter = _inline_cache_entries[deop]
+ try:
if arg is not None:
...
_, argrepr = _nb_ops[arg]
+ except:+ pass
yield Instruction(_all_opname[op], op,
How to test
We keep another unmodified version of 3.11 to observe the result of different codes after serialization, using the same recursive code stated above.
For example, we can create a file test.py like this:
importrandomdefa(x=4):
print(x)
a(5)
Launch python from the same directly, and import test as a module:
importtest
Then we should be able to find the corresponding pyc file in __pycache__/test.cpython-311.pyc.
We can modify the path x.pyc in the testing source code to __pycache__/test.cpython-311.pyc, and run the script in the unmodified Python 3.11 to check the corresponding opcodes.
Recovering the opcodes
We can deduce some opcodes from very basic observations:
Most functions start with RESUME opcode, except generator functions (with GENERATOR in Flags), which starts by RETURN_GENERATOR, POP_TOP, then RESUME.
The last opcode is most likely to be RETURN_VALUE.
By comparing back to our outputs to x.pyc, we can confirm that RESUME is 97, RETURN_GENERATOR is 3, POP_TOP is 1, and RETURN_VALUE is 49.
For the return value, if there are only very few opcodes in the same line, we can assume that it is either returning None (no explicit return), a constant, or the value of a variable.
This corresponds to LOAD_CONST, and LOAD_FAST. Also, we can follow the consti / var_num to deduce the correct mapping as it should not be referencing an index exceeding the length of such values.
Another observation is that import statements go like LOAD_CONST → LOAD_CONST → IMPORT_NAME → STORE_FAST.
We now know that (96, 96, 171, 150) is very likely to be (LOAD_CONST, LOAD_CONST, IMPORT_NAME, STORE_NAME), while 98 should be LOAD_FAST instead.
def_op('LOAD_CONST', 96) # Index in const list
hasconst.append(96)
name_op('IMPORT_NAME', 171) # Index in name list
name_op('STORE_NAME', 150) # Index in name list
def_op('LOAD_FAST', 98) # Local variable number
haslocal.append(98)
Since we have recovered LOAD_CONST and LOAD_NAME, all variables can be immediately shown in dis output. This helps a lot in later reversing.
Notice that lines 81 to 85 are almost identical other than the string constants it loads (e.g. sum, difference, ...) and the argument to !198!.
As it's asking about the result of some mathematical operations, we can deduce that 198 is actually BINARY_OP.
We also cross-checked this with some testing codes using +, -, *, // and %, and confirmed that their ops are 0, 10, 5, 2, 6 respectively, which perfectly matches the above values.
def_op('BINARY_OP', 198, 1)
Since BINARY_OP is a major part of the code, the logic becomes far clearer after recovering this opcode.
Also, we can completely recover function w (a nested function in game1) now:
123defmain():
124print('Pass 3 tests to prove your worth!')
125seed='seed:'126127seed+=game1() +":"128print(seed)
129130seed+=game2() +":"131print(seed)
132133seed+=game3()
134print(seed)
135print()
136print("You can drive to work, know some maths and can type fast. You're hired!")
137print('Your sign-on bonus:', cry(b'\xa0?n\xa5\x7f)\x1f6Jvh\x95\xcc!\x1e\x95\x996a\x11\xf6OV\x88\xc1\x9f\xde\xb50\x9d\xae\x14\xde\x18YHI\xd8\xd5\x90\x8a\x181l\xb0\x16^O;]', seed).decode())
, where 27 corresponds to NOP which is the placeholder for the while-loop. 123 at the end is then JUMP_BACKWARD that jumps back to the place just after line 7.
As it is a generator function, 69 should correspond to the YIELD_VALUE operation.
def_op('NOP', 27)
def_op('YIELD_VALUE', 69)
jrel_op('JUMP_BACKWARD', 200) # Number of words to skip (backwards)
The cry function which is used to decrypt the flag at the end is also quite trivial to reverse now:
81ifx=='sum': r=a+b82elifx=='difference': r=a-b83elifx=='product': r=a*b84elifx=='ratio': r=a//b85elifx=='remainder from division': r=a%b
def_op('COMPARE_OP', 104, 2) # Comparison operator
hascompare.append(104)
jrel_op('POP_JUMP_FORWARD_IF_FALSE', 186)
jrel_op('JUMP_FORWARD', 94) # Number of words to skip
From here, all opcodes that appear in the source code have already been recovered. Therefore, it is now straightforward to rewrite everything back to readable Python codes (See rev.py below)
Unless you recovered the random seed used, it is not likely that you can recover the remaining opcodes (in /tmp/opcode_map).
As it is actually a game that generates a key by the actions during the game to recover the flag, we can actually calculate the flag directly instead of playing the game.
The modified file opcode.py with known opcodes is attached below.
Also, we have modified dis.py to show more bytes to facilitate the debugging, especially to recover those opcodes like LOAD_GLOBAL occupying many bytes, which prints out like this:
Math quiz time!
What is the sum of 12 and 5?
17
Correct!
What is the difference of 45 and 14?
31
Correct!
What is the product of 8 and 9?
72
Correct!
What is the ratio of 18 and 6?
3
Correct!
What is the remainder from division of 23 and 7?
2
Correct!
This is quite straightforward, and the appended seed is _17_31_72_3_2_.
Game 3: "can type fast"
Speed typing game.
20.00 seconds left.
Text: Because
Because
18.45 seconds left.
Text: Because of
of
17.32 seconds left.
Text: Because of its
its
16.23 seconds left.
Text: Because of its performance
performance
13.95 seconds left.
Text: Because of its performance advantage,
The typing speed requirement is too strict that you can just calculate the seed, or modify the time limit to solve this game.
The seed here is _BECAUSE_OF_ITS_PERFORMANCE_ADVANTAGE,_TODAY_MANY_LANGUAGE_IMPLEMENTATIONS_EXECUTE_A_PROGRAM_IN_TWO_PHASES,_FIRST_COMPILING_THE_SOURCE_CODE_INTO_BYTECODE,_AND_THEN_PASSING_THE_BYTECODE_TO_THE_VIRTUAL_MACHINE._
Getting the flag
This means the final seed is seed:sssddwwddwddsssdssaaawwssaaaassddddddd:_17_31_72_3_2_:_BECAUSE_OF_ITS_PERFORMANCE_ADVANTAGE,_TODAY_MANY_LANGUAGE_IMPLEMENTATIONS_EXECUTE_A_PROGRAM_IN_TWO_PHASES,_FIRST_COMPILING_THE_SOURCE_CODE_INTO_BYTECODE,_AND_THEN_PASSING_THE_BYTECODE_TO_THE_VIRTUAL_MACHINE._
Use this seed to decrypt the flag, which is CTF{4t_l3ast_1t_w4s_n0t_4n_x86_opc0d3_p3rmut4tion}.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters