We made a fake flag generator just for this CTF! It can generate flags that look like the real one! Can you find the real flag?
nc 0.cloud.chals.io 34879
We're given a Python server that looks like this:
```python
#!/usr/bin/env python3
import io
import random
import string
from collections import Counter, defaultdict

with open('flag.txt', 'r') as f:
    flag = f.read()

def generate_fake_flag():
    return 'ictf{' + ''.join([
        random.choice(string.ascii_lowercase + string.digits + '_-') for _ in range(20)
    ]) + '}'

def derive_markov_model(texts):
    probabilities = defaultdict(Counter)
    for text in texts:
        for a, b in zip(text[:-1], text[1:]):
            probabilities[a][b] += 1
    return probabilities

def predict_next_char(model, prefix):
    if not prefix:
        prefix = 'ictf{'
    last_char = prefix[-1]
    if last_char not in model:
        return random.choice(string.ascii_lowercase + '_')
    else:
        options = model[last_char]
        options_str = ''.join(c * cnt for c, cnt in options.items())
        return random.choice(options_str)

def finish_flag(model, prefix):
    flag = prefix
    while flag[-1] != '}' and len(flag) < 30:
        flag += predict_next_char(model, flag)
    if flag[-1] != '}':
        flag += '}'
    return flag

def main():
    num_datapoints = int(input("How many training samples would you like?\n"))
    percent_real = int(input("What percentage of training flags would you like to be included to make the flags look real? (max 20%)\n"))
    assert 0 <= percent_real <= 20
    num_times_real = int(num_datapoints * (percent_real / 100))
    num_times_fake = num_datapoints - num_times_real
    dataset = [flag] * num_times_real + [generate_fake_flag() for _ in range(num_times_fake)]
    print("Understood, training the model...")
    # import ipdb; ipdb.set_trace()
    model = derive_markov_model(dataset)
    print("Done! Now, how many flags would you like to generate?")
    num_flags = int(input())
    if num_flags > 10000:
        print("Sorry, that's too many flags.")
        return
    print("Here you go:")
    for _ in range(num_flags):
        print(finish_flag(model, 'ictf{'))
    print("Thanks for using our service! Now, if you were by some incredible chance able to find the flag, you have one chance to confirm that it's correct.")
    if input("Flag: ") == flag:
        print("Correct!")
    else:
        print("Incorrect!")

if __name__ == '__main__':
    main()
```
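Before touching the remote service, it helps to see locally why this leaks: at the maximum 20% real ratio, every bigram of the real flag is repeated in each real copy, while the fake flags spread their transitions roughly uniformly over ~38 symbols. A minimal sketch reproducing the server's training with a made-up stand-in flag (the flag string here is an assumption purely for illustration):

```python
import random
import string
from collections import Counter, defaultdict

# Same logic as the server's generate_fake_flag / derive_markov_model
def generate_fake_flag():
    return 'ictf{' + ''.join(
        random.choice(string.ascii_lowercase + string.digits + '_-') for _ in range(20)
    ) + '}'

def derive_markov_model(texts):
    probabilities = defaultdict(Counter)
    for text in texts:
        for a, b in zip(text[:-1], text[1:]):
            probabilities[a][b] += 1
    return probabilities

random.seed(0)
flag = 'ictf{example_flag_here}'  # stand-in for the unknown real flag
num_datapoints, percent_real = 10000, 20
num_real = num_datapoints * percent_real // 100
dataset = [flag] * num_real + [generate_fake_flag() for _ in range(num_datapoints - num_real)]
model = derive_markov_model(dataset)

# The real flag's bigram 'm' -> 'p' is counted once per real copy (2000 times),
# while random transitions out of 'm' each occur only ~100 times:
print(model['m'].most_common(3))
```

With these parameters the real-flag transition outnumbers any single random transition by roughly a factor of twenty, which is why sampling generated flags recovers the chain.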
Based on the challenge title and the provided source code, the flag is mixed into the training data of a Markov chain that we can query for generated flags. We can connect to the server, request a large training set with the maximum 20% real flags, sample 10,000 generated flags, and then analyze the transition counts for each character:
```python
import collections

markov = {}
with open("out.txt") as f:
    # Drop the 'ictf{' prefix from each sampled flag
    lines = [x.strip()[5:] for x in f.readlines()]

for line in lines:
    for i in range(len(line) - 1):
        if line[i] not in markov:
            markov[line[i]] = collections.Counter()
        markov[line[i]][line[i + 1]] += 1

for x, v in markov.items():
    print(f"{x}: {v.most_common(5)}")
```
(assuming the generated flags are saved in `out.txt`)
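Once the counters are built, walking the chain forward is just a matter of repeatedly taking the most common successor. A sketch of that greedy walk, using toy counters in place of the ones built from `out.txt` (the counts below are made up for illustration):

```python
from collections import Counter

def greedy_walk(markov, start, max_len=30):
    """Repeatedly append the single most probable successor of the last char."""
    out = start
    while out[-1] in markov and len(out) < max_len:
        out += markov[out[-1]].most_common(1)[0][0]
    return out

# Toy counters mimicking the shape of the sampled data:
markov = {
    'i': Counter({'c': 9500, 'a': 20}),
    'c': Counter({'t': 9500, '3': 15}),
    't': Counter({'f': 9500}),
    'f': Counter({'{': 9500}),
    '{': Counter({'m': 4000, 'q': 300}),
    'm': Counter({'a': 3800, 'z': 250}),
}
print(greedy_walk(markov, 'i'))  # -> 'ictf{ma' (stops once 'a' has no entry)
```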

Starting from `i` (because of the `ictf` prefix) and following the most probable next letter each time, we get `ictf{ma`. Once we reach `a`, however, there seem to be three options with roughly similar probability. Before resolving this ambiguity, we can work backwards from `}` to get `ictf{ma ... aR3s}`.
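The backward walk works the same way on reversed bigrams: count predecessors instead of successors, then repeatedly prepend the most common one. A sketch, again with toy samples standing in for the real output:

```python
from collections import Counter

def derive_reverse_model(lines):
    """Count predecessor frequencies: rev[b][a] = how often a precedes b."""
    rev = {}
    for line in lines:
        for a, b in zip(line[:-1], line[1:]):
            rev.setdefault(b, Counter())[a] += 1
    return rev

def greedy_walk_back(rev, end, max_len=30):
    out = end
    while out[0] in rev and len(out) < max_len:
        out = rev[out[0]].most_common(1)[0][0] + out
    return out

# Toy samples whose suffixes mimic the sampled flags:
samples = ['aR3s}'] * 4 + ['mR3s}']
rev = derive_reverse_model(samples)
print(greedy_walk_back(rev, '}'))  # -> 'aR3s}' (stops once 'a' has no predecessor)
```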
Starting from `n`, the sequence is `n_N1`, with `g` or `a` potentially following. Starting from `g`, the sequence is `gh7ma`. Putting the three sequences together so that they resemble a rough English word, we can infer that the end of the flag looks like `ictf{ma ... n_N1gh7maR3s}`.
Then, because `R` cannot follow `ma` (that would make the flag `ictf{maR3s}`, ending it early and rendering it nonsensical), and because `n` likely does not immediately follow `ma` (that would give `ictf{man_N1gh7maR3s}`, which is a bit suspicious), the next character must be `r`, making the sequence `ictf{mark0v1 ... n_N1gh7maR3s}`.
Slotting an `a` into the final gap completes the flag:

`ictf{mark0v1an_N1gh7maR3s}`