-
-
Save emboss/1597215 to your computer and use it in GitHub Desktop.
I looked into http://grothoff.org/christian/esed.pdf when I found that | |
they produce a RIPEMD-160 hash to generate a key from 128 bits and take | |
the rest for the IV. | |
You could use a similar approach to generate key and IV where the IV is | |
independent (somewhat) of the key by using a non-salted key derivation | |
function that is normally used in Diffie-Hellman-like Key Exchange | |
protocols. They are used to generate arbitrary-length output from an | |
initial fixed-size output. (see the KDFs in http://www.di-mgt.com.au/cryptoKDFs.html) | |
The salt is not needed in our case, since the underlying data (the | |
file hash) probably provides enough entropy/randomness. | |
So to keep the key out of IV generation you could generate for example | |
48 bytes, 32 to be used for the key and the rest would serve as the IV. | |
The "shared secret" input for the KDF would simply be the SHA-256 hash | |
of the file, computed like before. | |
Although I think your scheme is fine, I could think of one possibility | |
how somebody could exploit the fact that the key is involved in | |
IV generation: They could use an attack as described in | |
http://www.cs.berkeley.edu/~daw/my-posts/key-as-iv-broken - don't know | |
if it's actually feasible in this scenario, but let's assume it were - | |
then the attacker could obtain at least the plain IV, which right now is | |
the hashed key. | |
Now since the hash is not salted, it would probably be easier to brute-force | |
the SHA-256 hash than to brute-force the AES key itself, because verifying | |
the hash is the faster operation. That's why I think from a purist view the | |
"KDF thing" might make sense, but then again, brute-forcing SHA-256 is not | |
a trivial task, either :) | |
But enough nit-picking, congrats on this cool project!! | |
Yes, it's better because it puts more load on the attacker, but the salt is still a fixed value :)
I like the KDF thing because it reminds me of what you do in public key crypto key exchanges, where two parties derive a key from a shared, fixed-length secret, so I felt this was pretty similar to that situation. But no guarantees unless thousands of cryptographers look at it and approve of it :P
If you want to look at KDFs further, I can recommend NIST's introduction in http://csrc.nist.gov/publications/nistpubs/800-56A/SP800-56A_Revision1_Mar08-2007.pdf. The RSA PKCS#5 paper is also pretty good, although PBKDF2 introduced there requires a salt, but they have a lot of good background info.
I'm confused how KDF helps in a way that the method I outlined will not. Both of these approaches depend entirely on the entropy in the file in question. Shouldn't that make them equally susceptible to a rainbow-table style attack?
The more I think about this, the less I think using the system to exchange small files containing highly sensitive information is even interesting. Small, password/key-sized files can be directly exchanged using any number of secure mechanisms (e.g. pgp/gpg, OTR). The Cryptosphere is going to be doing this all the time by exchanging symmetric keys via asymmetric key crypto, and I'd like to provide built-in social mechanisms that allow users to securely exchange keys to files/directory structures with each other.
That said, even if this isn't a practical use case, we can still try to envision a potential attack on low-entropy files and try to design a system which would defend against it:
- Bob wants to send Alice a password stored as a file in the Cryptosphere
- Bob computes the key/iv for the file using whatever key derivation algorithm
- Bob encrypts the file, computes the public hash, and makes the file available to the system
- Bob, as part of the Cryptosphere's normal operation, decides to share this file with another peer, let's call him Steve
- Steve obtains the public ID of the file, which is hash(block_cipher(key_derivation(plaintext), iv_derivation(plaintext), plaintext))
Steve happens to have access to a large GPU farm and has already built a rainbow table by iterating through all short byte sequences, calculating the public ID using the algorithm above, and creating a large associative map of all public IDs to their corresponding plaintext.
Steve is able to take the public ID for the file he obtained from Bob, look it up in the rainbow table, and discover the corresponding plaintext.
To me, this is the real threat. However, if we were to incorporate a proof-of-work function into the process of deriving the original key and IV, this would severely hamper the ability for Steve to be able to generate rainbow tables. The amount of work done by the proof of work function could be incredibly significant... imagine something like BCrypt but with the work factor turned up much, much higher than a typical password.
Let's say we use BCrypt with a work factor that ensures the amount of computation required to successfully encrypt a file takes approximately 1 second on a modern computer (or 0.5 seconds * 2 as we'd be bcrypting both the key and the iv), and that Steve wants to build a rainbow table of all plaintext files which are 8 characters or smaller. It would take Steve...
http://www.wolframalpha.com/input/?i=2%5E64+seconds
584.9 billion years JUST to do the BCrypt calculations in order to derive the public IDs for all possible 8-character plaintexts (in the rather unlikely scenario he continues to use the same computer for 584.9 billion years)
There's still the issue of deriving the BCrypt salt, and perhaps I could use KDF for that, but to me the real solution here is to incorporate a proof-of-work problem when generating the key and iv used to encrypt the file.
What if an additional proof-of-effort component were added to prevent rainbow table-style attacks on small files?
key = bcrypt(hash(fixed_salt1 + plaintext))
iv = bcrypt(hash(fixed_salt2 + plaintext))
I suppose there's the issue of where to derive the bcrypt salt then :(
I will give KDF a more thorough look