Last active
August 29, 2015 13:57
-
-
Save defuse/9907281 to your computer and use it in GitHub Desktop.
Random Characters to Random Bits
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Goal: | |
You're given a sequence of random alphanumeric characters (0-9a-zA-Z, 62 | |
possible characters), for example from a password generator. Convert it into | |
a sequence of random *bits*. | |
The output should have the property: | |
The alphanumeric character RNG can be distinguished from random if and | |
only if the alphanumeric character RNG, with the conversion algorithm | |
attached, can be distinguished from random. | |
Attempt #1: | |
output_lsb(n, b) { /* outputs the b least-significant bits of n */ } | |
while(1) { | |
// Get the next character from the alphanumeric generator. | |
c = get_next_alphanumeric_character(); | |
// Get the 'index of the character in the character set' | |
x = get_character_number(c); // a=0, b=1, ... A=26 ... 0=52 ... 9=62 | |
if (x < 32) { | |
// x is random in [0,31] | |
output_lsb(x, 5); | |
} else { | |
x = x - 32; // x is random in [0,29] | |
if (x < 16) { | |
// x is random in [0,15] | |
output_lsb(x, 4); | |
} else { | |
x = x - 16; // x is random in [0,13] | |
if (x < 8) { | |
// x is random in [0,7] | |
output_lsb(x, 3); | |
} else { | |
x = x - 8; // x is random in [0,5] | |
if (x < 4) { | |
// x is random in [0,3] | |
output_lsb(x, 2); | |
} else { | |
x = x - 4; // x is random in [0,1] | |
output_lsb(x, 1) | |
} | |
} | |
} | |
} | |
} | |
// Obviously you'd implement the inner part as a loop, but I unrolled it to | |
// make it easier to understand. | |
Analysis: | |
It is NOT true for this algorithm that if the output is secure then the | |
underlying generator is secure. | |
Proof: | |
Counterexample: If the underlying generator always returned a random | |
character between between 0 and 31 (a and F), the output of this | |
algorithm would be random, but the underlying generator clearly isn't. | |
(Note: This is only true if you see the output stream as a whole, and | |
can't tell which branches were taken by the algorithm. Otherwise you | |
could tell if you always saw 5 bits getting output at a time). | |
Hint: Maybe we can make this true by explicitly including which | |
branch was taken in the output? | |
A trivial way to guarantee it is to make it easy to make it possible | |
to invert the output stream back into the input characters, except | |
I doubt this is possible because it would imply a way to generate | |
random alphanumeric characters more efficiently than | |
try-and-throw-away. | |
(Question: Does the property in general imply that too?) | |
But maybe it is... | |
There are 5 calls to output_lsb(), with probabilities: | |
[ 0,31] | 32/62 | 0.51613 | |
[32,47] | 16/62 | 0.25806 | |
[48,55] | 8/62 | 0.12903 | |
[56,59] | 4/62 | 0.06452 | |
[60,61] | 2/62 | 0.03226 | |
This seems well suited for a Huffman coding.... | |
Wait... if you collect 62 samples, counting the number for | |
each branch, the expected values are 32, 16, 8, 4, 2. Now, | |
by some kind of symmetry argument, it should be 50/50 that | |
the actual value is above/below the expected. So, after 62 | |
samples, you could output 5 bits, one for each branch, | |
that's 0 if actual < expected, 1 if actual > expected. | |
But then what to do about actual == expected? | |
Ugh... seems like this is going to be an infinite | |
regress of things-not-being-powers-of-two. | |
Conjecture: It IS true that if the underlying generator is secure then the | |
output of this algorithm is secure. | |
Proof: | |
Assumption: If 0 < C < N is some constant and X is a random number | |
between 0 and N (inclusive), then: | |
if X < C | |
X is a random number between 0 and C-1 | |
otherwise | |
X-C is a random number between 0 and N-C | |
... then just follow the logic in the comments above, and you'll see | |
that all the output_lsb() are outputing random bits. | |
Notes | |
It's impossible for any *finite* sequence of bits, for the following reason: | |
The algorithm outputs a discrete number of bits, and gets a discrete number | |
of characters from the underlying generator. Say it has output X bits from | |
K characters. Then one of the following is true: | |
1. K * lg(62) > X | |
This says that the output is ambiguous. There are at least two | |
things the underlying RNG could have spat out to give the same | |
output, and the bias in the underlying RNG can lie there, without | |
there ever being a bias in the bit output. | |
So algorithm security does not imply underling RNG security. | |
(Note: Did I just unintentionally switch from computationally | |
secure to information-theoretically secure?) | |
2. K * lg(62) < X | |
This says that X is non-random because it only contains K*lg(62) | |
bits of entropy yet is actually more than that many bits. | |
So underling RNG security does not imply algorithm security. | |
(Note: It's not really an algorithm, since it never halts). | |
Proof: | |
(By contradiction) | |
Suppose K * lg(62) = X for some integers K and X. | |
Then lg(62) = X/K. | |
Then 62 = 2^(X/K). | |
Then 62^K = 2^(X/K)^K | |
Then 62^K = 2^X | |
This is impossible, since 2^X is not divisible by 31 (since | |
there is no exponent on the 31 in the prime factorization), but | |
62^K is, since 31 divides 62. | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment