# How To Load The HIBP Pwned Passwords Database Into MongoDB

[NIST](https://pages.nist.gov/800-63-3/sp800-63b.html) recommends that when users set a password you reject candidates that are commonly used or compromised:

> When processing requests to establish and change memorized secrets, verifiers SHALL compare the prospective secrets against a list that contains values known to be commonly-used, expected, or compromised.

But how do you know which passwords are compromised? Luckily [Troy Hunt](https://www.troyhunt.com/ive-just-launched-pwned-passwords-version-2/) put a lot of effort into building the "Have I Been Pwned (HIBP)" database with the SHA1 hashes of 501,636,842 passwords that have been compromised on the internet. Sweet.

This means that to prevent a user setting a compromised password like `P@ssword` you can look it up on a public HIBP service such as [this one](https://haveibeenpwned.com/Passwords) and reject it.

If you are running a security sensitive service it is probably a bad idea to send password hashes to a public lookup service. To get around that, the public Pwned Passwords API at https://haveibeenpwned.com/API/v2#PwnedPasswords has you send only the first 5 characters of the hash and responds with every hash that matches that prefix. That might be slow, return a lot of data, or be offline.

So you might want to load the HIBP database into a private store such as MongoDB and check the SHA1 hash against that authoritative store. You can then put a private, secure API in front of your own MongoDB and do an exact match SHA1 check. That is fast, and since it runs on your infrastructure you can make it highly available.

There is another gist on this site for loading into Redis. Redis needs to fit the data in memory so it would be expensive to run, but that gist suggests holding the most used passwords in Redis for fast checks before doing a slower check against all the hashes in MongoDB.
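To make the prefix lookup mentioned above concrete, here is a minimal Python sketch of the client side of it: hash the password, split the hash into the 5 character prefix you would send and the 35 character remainder you keep private, then scan a response for that remainder. The function names are made up for illustration, and the `SUFFIX:COUNT` one-per-line response format is my reading of the public API docs, so treat it as an assumption rather than a definitive client. No network call is made here.

```python
import hashlib


def split_sha1(password: str) -> tuple[str, str]:
    """Return (prefix, suffix) of the uppercase hex SHA1 of the password.

    The 5 character prefix is what a range lookup would send; the
    35 character suffix never leaves your machine.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(body: str, suffix: str) -> int:
    """Scan a range response for our suffix and return its count, or 0.

    Assumes each line looks like SUFFIX:COUNT (an assumption about the
    response format, not something taken from this gist).
    """
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0


if __name__ == "__main__":
    prefix, suffix = split_sha1("password")
    print("send:", prefix)
    print("keep:", suffix)
```

Only the prefix is ever sent, which is why the service has to return many candidate hashes for every lookup.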
## Prerequisites

These instructions assume that you are on a Mac but should be just as straightforward on Linux.

* Over 50GiB of disk (uncompressed the database is 33GiB, then add to that the compressed 8GiB)
* Homebrew to install command line tools
* `brew install aria2` for the `aria2c` BitTorrent download client
* `brew install p7zip` for the `7za` tool to uncompress the `.txt.7z` file
* A MongoDB database with sufficient disk space.

## Steps

Note that it took an hour to download the 8GiB torrent on my broadband. The mongoimport command assumes that your mongod server is listening locally on the default port. If not you can pass command line args to [mongoimport](https://docs.mongodb.com/manual/reference/program/mongoimport/) below to connect to a remote server.

1. `aria2c https://downloads.pwnedpasswords.com/passwords/pwned-passwords-2.0.txt.7z.torrent`
1. `7za x -so pwned-passwords-2.0.txt.7z | sed 's/:/,/g' | mongoimport --fields "_id.binary(base64),c.int32()" --columnsHaveTypes --db hibp --collection pwndpsswds --type csv`

If you log in and query the collection it looks something like:

```
> db.pwndpsswds.find()
{ "_id" : BinData(0,"5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8"), "c" : 3303003 }
{ "_id" : BinData(0,"3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D"), "c" : 2900049 }
{ "_id" : BinData(0,"7C222FB2927D828AF22F592134E8932480637C0D"), "c" : 2680521 }
{ "_id" : BinData(0,"6367C48DD193D56EA7B0BAAD25B19455E529F5EE"), "c" : 2670319 }
{ "_id" : BinData(0,"E38AD214943DAAD1D64C102FAEC29DE4AFE9DA3D"), "c" : 2310111 }
```

The primary key `_id` is stored in a binary format to reduce the storage size compared to storing a string, and `c` is the number of times that password was seen. Because of the `binary(base64)` column type, mongoimport reads the 40 character hash text from the file as base64, which is why the shell shows those same characters inside `BinData`. That means that to query by the primary key you need to do a little bit of work to convert the uppercase SHA1 string into a BinData value in the same way. You should test your query solution against known passwords such as `P@ssword` so that you don't get false negatives.
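As a rough illustration of that conversion, the Python sketch below builds the `_id` bytes for a password by hashing it and then base64 decoding the uppercase hex text, mirroring what the import command above appears to do. The function names are hypothetical, and the commented usage assumes the `pymongo` driver (which, as far as I know, encodes Python `bytes` as BinData subtype 0); neither is part of the original steps.

```python
import base64
import hashlib


def pwned_key(password: str) -> bytes:
    """Return the bytes expected in `_id` for this password.

    Assumption: the import decoded the 40 uppercase hex characters as
    base64, so we do the same rather than decoding them as hex.
    """
    hex_digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return base64.b64decode(hex_digest)


def pwned_count(collection, password: str) -> int:
    """Exact match lookup by primary key; returns 0 when not found.

    `collection` is anything with a pymongo-style find_one(filter).
    """
    doc = collection.find_one({"_id": pwned_key(password)})
    return doc["c"] if doc else 0


# Hypothetical usage with pymongo against the local import:
#   from pymongo import MongoClient
#   coll = MongoClient()["hibp"]["pwndpsswds"]
#   if pwned_count(coll, "P@ssword") > 0:
#       print("reject this password")
```

If a known compromised password such as `P@ssword` comes back with a count of 0, the key conversion does not match how the data was imported and needs revisiting before you rely on it.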
## References

* [mongoimport](https://docs.mongodb.com/manual/reference/program/mongoimport/)
* [I've Just Launched "Pwned Passwords" V2](https://www.troyhunt.com/ive-just-launched-pwned-passwords-version-2/)
* [Pwned Passwords](https://haveibeenpwned.com/Passwords)