You have a cookie that stores a client's session id. To validate the session you probably need to check it against a server. An attacker might exploit this. E.g. with the big list of naughty strings [^0], or with a big string crafted for the occasion.
We're going to look at one such example and try to find a way to prevent these attacks from becoming a problem.
A sample client session cookie might look like this:
s:98840c37-438d-4696-81aa-cc98857935e9.fitd2ove0zc5izs61r618p9xrs2se6bmi
- a fixed string,
- a
:
, - a uuidv4,
- a
.
and - some junk.
For #reasons you only need the uuidv4 from this cookie to validate the customer's session on the server.
As the uuidv4 is neatly between a :
and a .
a regular expression that susses out anything
sandwiched between those looks reasonable at first glance: /:(.*)\./
. You test it and presto:
const cookieValue = `secret:98840c37-438d-4696-81aa-cc98857935e9.fitd2ove0zc5izs61r618p9xrs2se6bmi`;
console.dir(cookieValue.match(/:(.*)\./));
// [
// ':98840c37-438d-4696-81aa-cc98857935e9.',
// '98840c37-438d-4696-81aa-cc98857935e9', <== whoop whoop, we extracted a uuid !!
// index: 6,
// input: 's:98840c37-438d-4696-81aa-cc98857935e9.fitd2ove0zc5izs61r618p9xrs2se6bmi',
// groups: undefined
// ]
Is it a safe regular expression though? Security conscious as you are you run it through safe-regex a regular expression safety checker used by many eslint plugins, a.o. eslint-plugin-security and eslint-plugin-unicorn.
import safeRegex from 'safe-regex'
safeRegex(/:(.*)\./);
// true <= it's safe according to safe-regex's checks
There's various things wrong with the expression, though:
- it'll match the empty string (
:.
=>""
), which might trigger unhandled edge cases - it'll include stuffin the junk part of the cookie we have no control over
(
s:98840c37-438d-4696-81aa-cc98857935e9.fitd2.ove0.zc5.izs61r618p9xrs2se6bmi
=>98840c37-438d-4696-81aa-cc98857935e9.fitd2.ove0.zc5
), which might make that a valid session is flagged as an erroneous one
That's not the worst, though. Even though this expression doesn't do evidently unsecure things by itself, it still can hang up your server.
Your static code analyzer (if it's a good one) will flag the regular expression, and trace the string
back to where it came from. It might even explain that a string with many repetitions of :a
will take a
long time to run - it might even use words like polynomial.
Because of the way the regular expression engine works, with this particular
expression, it will evaluate every instance of :a
.
We can validate this by running the following code and looking at the timing numbers it emits, (on a 2017 macbook pro: 7.5 seconds)
// node --test --test-reporter spec the.test.js
import { test } from "node:test";
import { equal } from "node:assert";
const STRING_FROM_HELL = `${":a".repeat(50000)}`;
test("baseline - /:(.*)\\./i", () => {
const lMatch = STRING_FROM_HELL.match(/:(.*)\./i);
equal(Boolean(lMatch), false);
});
// ✔ baseline - /:(.*)\./i (7734.777159ms)
A first thought might be to attack the problem by making the pattern more specific, e.g.
by replacing .*
(match any character zero ore more times) with something that only matches
uuid's1:
const moreSpecificRE = /:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i
Against the initial attack with many repetitions of :a
, this indeed works wonders. With this
new pattern the test above takes about 1 millisecond - ~7000x faster. However Evil Alice never sleeps
and injecting 50k (or 500k) uuid's is just as easy as injecting 50k :a
's. It turns out that 50k uuid's
make the regex run for 18ms, and when you crank up the repetitions even higher, that number will rise
as well, burning your precious AWS budget.
Being more specific evidently helps a bit, but doesn't attack the root cause, which is the regex engine repeatedly trying to match. By requiring it only matches from the start of the string, the problem should be gone:
const moreSpecificRE = /^s:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i
Against 50k :a
's this runs in ~0.2 milliseconds, against 50k uuid's ~0.3 milliseconds, which
is much more reasonable.
This gist includes a JavaScript test file you can run with node's native test runner (node 18 or up):
node --test --test-reporter spec polynomial.test.mjs
Which will yield something like this for output:
✔ alternative 1 - /^s:(.*)\./i (1.881229ms)
✔ alternative 2 - /^s:(.+)\./i (0.189199ms)
✔ alternative 3 - /^s:([^.]+)\./i (0.175063ms)
✔ alternative 4 - /^s:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i (0.198885ms)
✔ alternative 5 - /:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i (1.190747ms)
✔ alternative 5 against a 50k UUID's - /:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i (18.42299ms)
✔ alternative 4 against 50k UUID's - /^s:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i (0.35321ms)
✔ baseline - /:(.*)\./i (7818.769965ms)
ℹ tests 8
ℹ suites 0
ℹ pass 8
ℹ fail 0
ℹ cancelled 0
ℹ skipped 0
ℹ todo 0
ℹ duration_ms 7942.484351
Footnotes
-
This expression could be even more specific as not all hex characters can occur on any position in a uuidv4. E.g. the second group of four will always start with a
4
- if you search for uuidv4 regex with your favourite search engine more elaborate ones will spring up like daisies in the summer. ↩