Storing passwords securely

The best advice about passwords is: don't deal with passwords at all. Authenticate people through Google or Facebook or whatever. Not only is it one less password for people to remember, but Google also has better infrastructure for authentication than you will have:

- they detect suspicious logins
- they support 2FA
- and their account recovery process is more nuanced than “we'll send a link to your email, unless you don't have access to that email anymore, in which case you're screwed”

However, if you can't, don't want to, or just disagree with this opinion for some reason, the next best advice about passwords is just “always use scrypt”. Unfortunately, without knowing exactly why scrypt exists, it's easy to think “oh, I'll be fine without scrypt for this small site” or “oh, they use SHA256 in this codebase I inherited, well, it's okay I guess”. Such thoughts are almost always wrong, so a more detailed explanation would probably do some good.

Let's go through the levels of security below “use scrypt” and show why they are all inadequate.

Storing passwords in plaintext

Almost everybody reuses passwords. Thus, if somebody steals your database (which happens all the time: there are people who scan all existing sites for known vulnerabilities and will exploit them before you even hear about them), users' Facebook accounts, email accounts and so on will be stolen too. This is a Pretty Bad Thing™.

Storing hashes of passwords

MD5, SHA-2, SHA-3, BLAKE2: it doesn't matter. With rainbow tables (precomputed tables that map hashes back to passwords) it is possible to break most hashed passwords in seconds, so it's not any different from storing passwords in plain text. People who have long enough passwords to be unbreakable by rainbow tables likely don't reuse their passwords anyway, so their losses won't be as big.

Note that the objection “but if a user has a weak enough password to be cracked easily, and they reuse it, wouldn't their other accounts be stolen already?” is incorrect, because Google and friends don't simply let you make thousands of login attempts per second. What you should be worried about isn't somebody learning that someone somewhere has the password p@ssw0rd; the real problem is somebody learning that the person with a particular email address has the password p@ssw0rd. (Even if you're not storing your users' emails, you could still be storing enough information for an attacker to figure them out.)
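
To make the weakness of unsalted hashes concrete, here is a minimal sketch of a precomputed lookup. It is a plain hash-to-password table, not a real rainbow table (which uses hash chains to save space), and `toyHash` is a made-up, non-cryptographic stand-in for a fast hash like SHA-256. The names `toyHash`, `buildTable` and `crack` are all hypothetical.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- Toy FNV-1a-style hash, standing in for a fast hash such as SHA-256.
-- NOT cryptographic; used only so the sketch needs no extra packages.
toyHash :: String -> Word64
toyHash = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Hash every dictionary entry once, up front.
buildTable :: [String] -> Map.Map Word64 String
buildTable dict = Map.fromList [(toyHash pw, pw) | pw <- dict]

-- Reversing any leaked unsalted hash is then a single lookup.
crack :: Map.Map Word64 String -> Word64 -> Maybe String
crack table h = Map.lookup h table
```

In GHCi, `crack (buildTable ["123456", "p@ssw0rd"]) (toyHash "p@ssw0rd")` evaluates to `Just "p@ssw0rd"`. The table is built once and can be reused against every database that stores unsalted hashes of the same kind.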

Storing hashes with salt

Salting a hash is done as follows: when you hash a password, you generate a long random string S and then hash password ++ S, storing S in the database alongside the hash. When you verify an entered password P, you take S from the database and check that hash (P ++ S) matches the stored hash.
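
The scheme just described can be sketched as follows. This is only an illustration of the data flow: `toyHash` is again a made-up, non-cryptographic stand-in for a real hash, the salt is passed in as an argument (in practice it would come from a cryptographically secure random generator), and the names `Stored`, `store` and `verify` are hypothetical.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)

-- Toy FNV-1a-style hash, standing in for a real hash. NOT cryptographic.
toyHash :: String -> Word64
toyHash = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- What goes into the database: the salt S and hash (password ++ S).
data Stored = Stored { storedSalt :: String, storedHash :: Word64 }

-- Assumption: the caller supplies a fresh random salt for every user.
store :: String -> String -> Stored
store salt password = Stored salt (toyHash (password ++ salt))

-- Re-hash the entered password with the stored salt and compare.
verify :: Stored -> String -> Bool
verify (Stored salt h) entered = toyHash (entered ++ salt) == h
```

Because every user gets a different S, the same password produces different stored hashes, so a table precomputed for unsalted hashes no longer helps.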

This is inadequate as well. For almost all popular hashing algorithms there exist ASICs that can bruteforce passwords extremely quickly even without rainbow tables; it's a more expensive attack but still not expensive enough to prevent people who do large-scale vulnerability scans from using it.

Even complex passwords usually end up being broken by bruteforce because there exist large rulesets that collect all the “creative” rules that people usually use for creating passwords. (In ten years they'll likely get good enough that creating a non-bruteforceable password by hand will become nearly impossible, thanks to machine learning advances.)

What's worse, even if the attacker doesn't have access to an ASIC, it is still very cheap to check a small bunch of passwords (say, a billion) against your database. So when LinkedIn or Last.fm or whatever other big site suffers a database leak and its passwords become public, all your users who reuse passwords are now vulnerable too.

:(

To summarize the previous sections: nowadays we have fast enough computers and big enough datasets of common passwords, book texts in all languages, etc., that unless a password is a 40-char random string, it will be stolen.

(An amusing sidenote: most people don't realize that if you do something xkcd-like to choose a password, but instead of picking four random words you pick something even remotely meaningful, your password becomes worthless. A Wikipedia dump is 50 GB of plaintext, i.e. only a billion strings of consecutive words. Checking a billion passwords hashed with SHA256 takes about 20 milliseconds when using several GPUs. So, if your password occurs anywhere in Wikipedia text, or in any book/song/film/etc that has its text publicly available, it will be broken.)
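
The arithmetic in the sidenote can be made explicit. This small sketch takes the figure quoted above (a billion SHA256 hashes in about 20 milliseconds) at face value and derives a guess rate from it; the names `fastRate` and `secondsToTry` are hypothetical, and the real rate depends entirely on the attacker's hardware.

```haskell
-- Guess rate implied by the figure above: 1e9 hashes in 20 ms.
fastRate :: Double
fastRate = 1e9 / 0.020 -- hashes per second, i.e. about 5e10

-- Time in seconds to try a given number of candidate passwords.
secondsToTry :: Double -> Double -> Double
secondsToTry rate candidates = candidates / rate
```

As an illustration that is not from the figures above: all 26^8 (about 2 × 10^11) eight-letter lowercase passwords could be tried in roughly 4 seconds at this rate, since `secondsToTry fastRate (26 ^ 8)` is a little over 4.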

Scrypt

Memory-hard functions like scrypt are the only way to prevent this.

Scrypt works like a hash with salt, except that it's been designed to be extremely slow (say, one second to produce a hash) and to use lots of memory (say, 32 MB). RAM is much more expensive than CPU power, so even with ASICs you can't cheaply parallelize large-scale bruteforcing, and with a hashing time of 1 s even simple passwords take hours to break (instead of milliseconds). Most attackers won't bother.
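
To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes the usual scrypt cost parameters N (written here as 2^logN) and r, and the common approximation that one hash needs about 128 × N × r bytes of memory (this ignores the parallelization parameter p and small overheads). The function names are hypothetical, and the one-second hashing time is the illustrative figure from the paragraph above, not a measurement.

```haskell
-- Approximate memory needed for one scrypt hash: 128 * N * r bytes,
-- where N = 2^logN. Ignores the p parameter and small overheads.
scryptMemoryBytes :: Int -> Int -> Int
scryptMemoryBytes logN r = 128 * r * 2 ^ logN

-- Hours needed to try a list of candidate passwords, given the cost
-- of one hash in seconds and the number of hashes run in parallel.
hoursToTry :: Double -> Int -> Int -> Double
hoursToTry secondsPerHash workers candidates =
  fromIntegral candidates * secondsPerHash / fromIntegral workers / 3600
```

With logN = 15 and r = 8, `scryptMemoryBytes 15 8` is 33554432 bytes, i.e. 32 MB per hash in flight. At one second per hash on a single worker, `hoursToTry 1 1 10000` is about 2.8, so even a list of only ten thousand common passwords takes hours per user, and every extra parallel worker costs another 32 MB of RAM.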

I recommend using the scrypt library in Haskell.

Here's how you can hash a password (the salt will be generated automatically and stored along with the hash):

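import Crypto.Scrypt
import Data.Text (Text)
import qualified Data.Text.Encoding as T

-- encryptPassIO' generates a random salt, so hashing runs in IO
hashPassword :: Text -> IO EncryptedPass
hashPassword = encryptPassIO' . Pass . T.encodeUtf8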

Here's how you can verify a password:

verifyPassword :: Text -> EncryptedPass -> Bool
verifyPassword t e = verifyPass' (Pass (T.encodeUtf8 t)) e

(If you want to store bytestrings in your database, use getEncryptedPass.)

(Also, here's a real-life example of a /login handler using scrypt and Spock from a tiny game site I wrote: https://github.com/neongreen/hat/blob/master/src/Main.hs#L190-L201.)
