iCTF 2023 — escape_from_markov

We made a fake flag generator just for this CTF! It can generate flags that look like the real one! Can you find the real flag?

nc 0.cloud.chals.io 34879

We're given a Python server that looks like this:

#!/usr/bin/env python3

import io
import random
import string
from collections import Counter, defaultdict

with open('flag.txt', 'r') as f:
    flag = f.read()

def generate_fake_flag():
    return 'ictf{' + ''.join([
        random.choice(string.ascii_lowercase + string.digits + '_-') for _ in range(20)
    ]) + '}'

def derive_markov_model(texts):
    probabilities = defaultdict(Counter)
    for text in texts:
        for a, b in zip(text[:-1], text[1:]):
            probabilities[a][b] += 1

    return probabilities

def predict_next_char(model, prefix):
    if not prefix:
        prefix = 'ictf{'

    last_char = prefix[-1]
    if last_char not in model:
        return random.choice(string.ascii_lowercase + '_')
    else:
        options = model[last_char]
        options_str = ''.join(c * cnt for c, cnt in options.items())
        return random.choice(options_str)

def finish_flag(model, prefix):
    flag = prefix
    while flag[-1] != '}' and len(flag) < 30:
        flag += predict_next_char(model, flag)
    if flag[-1] != '}':
        flag += '}'
    return flag

def main():
    num_datapoints = int(input("How many training samples would you like?\n"))
    percent_real = int(input("What percentage of training flags would you like to be included to make the flags look real? (max 20%)\n"))
    assert 0 <= percent_real <= 20

    num_times_real = int(num_datapoints * (percent_real / 100))
    num_times_fake = num_datapoints - num_times_real

    dataset = [flag] * num_times_real + [generate_fake_flag() for _ in range(num_times_fake)]

    print("Understood, training the model...")
    # import ipdb; ipdb.set_trace()
    model = derive_markov_model(dataset)

    print("Done! Now, how many flags would you like to generate?")
    num_flags = int(input())
    if num_flags > 10000:
        print("Sorry, that's too many flags.")
        return
    print("Here you go:")
    for _ in range(num_flags):
        print(finish_flag(model, 'ictf{'))

    print("Thanks for using our service! Now, if you were by some incredible chance able to find the flag, you have one chance to confirm that it's correct.")
    if input("Flag: ") == flag:
        print("Correct!")
    else:
        print("Incorrect!")

if __name__ == '__main__':
    main()

Based on the challenge title and the provided source, the real flag is mixed into the training data for a character-level Markov model, and we get to sample flags generated from that model.

We can connect to the server, request the maximum 20% of real flags in the training data, sample the maximum of 10,000 generated flags, and then tally the transition counts for each character:

import collections

markov = collections.defaultdict(collections.Counter)

with open("out.txt") as f:
    lines = [x.strip() for x in f.readlines()]

# Tally bigram counts over the full flags; keeping the ictf{ prefix means the
# rebuilt model also contains the i -> c -> t -> f -> { transitions.
for line in lines:
    for i in range(len(line) - 1):
        markov[line[i]][line[i + 1]] += 1

for x, v in markov.items():
    print(f"{x}: {v.most_common(5)}")

(assuming generated flags are saved in out.txt)
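For reference, here's a minimal sketch of how out.txt could be collected with pwntools (this step isn't shown in the original session; the prompt substrings are taken from the server source above):

from pwn import remote

r = remote('0.cloud.chals.io', 34879)
r.sendlineafter(b'like?', b'10000')      # as many training samples as we like
r.sendlineafter(b'(max 20%)', b'20')     # maximize the real flag's weight in training
r.sendlineafter(b'generate?', b'10000')  # the server rejects anything above 10,000
r.recvuntil(b'Here you go:\n')

with open('out.txt', 'w') as f:
    for _ in range(10000):
        f.write(r.recvline().decode())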

Starting from i (since every flag begins with ictf{) and repeatedly following the most probable next character, we get

ictf{ma
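This greedy walk can be scripted with a small helper on top of the markov counts above (an illustrative addition, not part of the original script):

def greedy_walk(markov, start, steps):
    # Repeatedly append the most common successor of the last character.
    out = start
    for _ in range(steps):
        out += markov[out[-1]].most_common(1)[0][0]
    return out

print(greedy_walk(markov, 'i', 6))  # ictf{ma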

Once we reach a, however, there seem to be 3 options with roughly similar probability (they turn out to be r, n, and R). Before resolving this ambiguity, we can work backwards from } to get

ictf{ma ... aR3s}
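The backwards walk works the same way on a reversed model, i.e. the same counting loop with the pair order swapped (again a sketch, reusing lines from the script above):

reverse = collections.defaultdict(collections.Counter)
for line in lines:
    for i in range(len(line) - 1):
        reverse[line[i + 1]][line[i]] += 1

# Prepend the most common predecessor, starting from the closing brace.
suffix = '}'
for _ in range(4):
    suffix = reverse[suffix[0]].most_common(1)[0][0] + suffix
print(suffix)  # aR3s}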

Starting from n, the sequence is

n_N1

with g or a potentially following. Starting from g, the sequence is

gh7ma
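Both of these fall out of the same greedy_walk helper sketched earlier:

print(greedy_walk(markov, 'n', 3))  # n_N1
print(greedy_walk(markov, 'g', 4))  # gh7ma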

Putting the 3 sequences together so that they resemble a rough English word (N1gh7maR3s reads as "nightmares" in leetspeak), we can infer the end of the flag looks like

ictf{ma ... n_N1gh7maR3s}

Then, because R cannot follow ma (that would make the flag ictf{maR3s}, ending it early and rendering it nonsensical) and because n likely does not immediately follow ma (that would make the flag ictf{man_N1gh7maR3s}, which is a bit suspicious), the next character must be r. Greedily following the chain from r (r, then k, 0, v, 1) extends the sequence to

ictf{mark0v1 ... n_N1gh7maR3s}

Slotting an a into the remaining gap (recall that either g or a can follow 1, and only a leads into n) completes the flag as

ictf{mark0v1an_N1gh7maR3s}
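Since the server gives exactly one chance to confirm, we can send the reconstructed flag over the same connection as the pwntools sketch above:

r.sendlineafter(b'Flag: ', b'ictf{mark0v1an_N1gh7maR3s}')
print(r.recvline().decode())  # hopefully "Correct!"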