{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are working with comma-separated values (CSV) files, so we'll need Python's built-in `csv` module."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's iterate over the first 10 rows of the file (after skipping the first row) to see what data we find."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Password Username \n",
"c4$h money \n",
"this that \n",
"$ynbe$21 vhang_15 \n",
"$YBWVdau potato \n",
"car$ car \n",
"b0bb0b bob \n",
"Gaiu$ Baltar \n",
"$tarbuck BillAdama \n",
"Caprica LauraRoslin \n",
"five number6 \n"
]
}
],
"source": [
"with open(\"10M.csv\", \"r\") as inf:\n",
"    reader = csv.reader(inf)\n",
"    next(reader)  # skip the first row\n",
"    print('{:<20} {:<20}'.format('Password', 'Username'))\n",
"    for i in range(10):\n",
"        data = next(reader)\n",
"        print('{:<20} {:<20}'.format(data[0], data[1]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great, but what about augmenting this very simple data with something else? How about the lengths of the password and username?"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Pass Len Password User Len Username \n",
"4 c4$h 5 money \n",
"4 this 4 that \n",
"8 $ynbe$21 8 vhang_15 \n",
"8 $YBWVdau 6 potato \n",
"4 car$ 3 car \n",
"6 b0bb0b 3 bob \n",
"5 Gaiu$ 6 Baltar \n",
"8 $tarbuck 9 BillAdama \n",
"7 Caprica 11 LauraRoslin \n",
"4 five 7 number6 \n"
]
}
],
"source": [
"with open(\"10M.csv\", \"r\") as inf:\n",
"    reader = csv.reader(inf)\n",
"    next(reader)  # skip the first row\n",
"    print('{:<10} {:<20} {:<10} {:<20}'.format('Pass Len', 'Password', 'User Len', 'Username'))\n",
"    for i in range(10):\n",
"        data = next(reader)\n",
"        pwd = data[0]\n",
"        usr = data[1]\n",
"        pwd_len = len(pwd)\n",
"        usr_len = len(usr)\n",
"        print('{:<10} {:<20} {:<10} {:<20}'.format(pwd_len, pwd, usr_len, usr))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Perfect, but what else can we do? Unfortunately, it looks like some people pick really bad passwords. Let's see if we can tag the passwords that are simply English-language words. We'll use a library called \"enchant\" to do that."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import enchant"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's check each password and username to see if it is in the English-language dictionary."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Pass Len Word? Password User Len Word? Username \n",
"4 0 c4$h 5 1 money \n",
"4 1 this 4 1 that \n",
"8 0 $ynbe$21 8 0 vhang_15 \n",
"8 0 $YBWVdau 6 1 potato \n",
"4 0 car$ 3 1 car \n",
"6 0 b0bb0b 3 1 bob \n",
"5 0 Gaiu$ 6 0 Baltar \n",
"8 0 $tarbuck 9 0 BillAdama \n",
"7 0 Caprica 11 0 LauraRoslin \n",
"4 1 five 7 0 number6 \n"
]
}
],
"source": [
"d = enchant.Dict(\"en_US\")\n",
"\n",
"with open(\"10M.csv\", \"r\") as inf:\n",
"    reader = csv.reader(inf)\n",
"    next(reader)  # skip the first row\n",
"    print('{:<10} {:<5} {:<15} {:<10} {:<5} {:<20}'.format('Pass Len', 'Word?', 'Password', 'User Len', 'Word?', 'Username'))\n",
"    for i in range(10):\n",
"        data = next(reader)\n",
"        pwd = data[0]\n",
"        usr = data[1]\n",
"        pwd_len = len(pwd)\n",
"        usr_len = len(usr)\n",
"        pwd_word = d.check(pwd)  # True if the string is in the en_US dictionary\n",
"        usr_word = d.check(usr)\n",
"        print('{:<10} {:<5} {:<15} {:<10} {:<5} {:<20}'.format(pwd_len, pwd_word, pwd, usr_len, usr_word, usr))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, some passwords are terrible, but some look pretty strong. Let's calculate the strength of each password using a library called \"passwordmeter\", which returns a score between 0 (terrible password) and 1 (incredibly strong password).\n",
"\n",
"Let's also check whether the username string appears in the password string (reusing your username in your password is not a great idea)."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Pass Len Word? Password Strength user in pwd? User Len Word? Username \n",
"4 0 c4$h 0.36 0 5 1 money \n",
"4 1 this 0.1 0 4 1 that \n",
"8 0 $ynbe$21 0.4 0 8 0 vhang_15 \n",
"8 0 $YBWVdau 0.48 0 6 1 potato \n",
"4 0 car$ 0.24 1 3 1 car \n",
"6 0 b0bb0b 0.24 0 3 1 bob \n",
"5 0 Gaiu$ 0.44 0 6 0 Baltar \n",
"8 0 $tarbuck 0.26 0 9 0 BillAdama \n",
"7 0 Caprica 0.35 0 11 0 LauraRoslin \n",
"4 1 five 0.1 0 7 0 number6 \n"
]
}
],
"source": [
"import passwordmeter\n",
"\n",
"d = enchant.Dict(\"en_US\")\n",
"\n",
"# One format string for the header and the rows keeps the columns aligned.\n",
"row_fmt = '{:<10} {:<5} {:<15} {:<10} {:<15} {:<10} {:<5} {:<20}'\n",
"\n",
"with open(\"10M.csv\", \"r\") as inf:\n",
"    reader = csv.reader(inf)\n",
"    next(reader)  # skip the first row\n",
"    print(row_fmt.format(\n",
"        'Pass Len', 'Word?', 'Password', 'Strength', 'user in pwd?', 'User Len', 'Word?', 'Username'))\n",
"    for i in range(10):\n",
"        data = next(reader)\n",
"        pwd = data[0]\n",
"        usr = data[1]\n",
"        pwd_len = len(pwd)\n",
"        usr_len = len(usr)\n",
"        pwd_word = d.check(pwd)\n",
"        usr_word = d.check(usr)\n",
"        usr_in_pwd = usr in pwd  # True if the username appears inside the password\n",
"        pwd_strength = round(passwordmeter.test(pwd)[0], 2)  # first element is the strength score\n",
"        print(row_fmt.format(\n",
"            pwd_len, pwd_word, pwd, pwd_strength, usr_in_pwd, usr_len, usr_word, usr))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now... to run this on 10M rows of data."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
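
The notebook stops at the idea of running this on all 10M rows. As a minimal sketch of how that might look, the Python 3 snippet below streams the file row by row and writes the augmented columns to a new CSV instead of printing them, so memory use stays flat regardless of file size. The helper names `augment_rows` and `augment_file`, the `is_word` and `strength` hooks, the output column names, and the in-memory sample (including its header line) are assumptions made for illustration; none of them come from the original notebook.

```python
import csv
import io


def augment_rows(rows, is_word=None, strength=None):
    """Yield one augmented record per (password, username) row.

    `is_word` and `strength` are optional, hypothetical hooks: callables
    that take a string and return a truthy value / a float score.
    """
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        pwd, usr = row[0], row[1]
        record = {
            "pwd_len": len(pwd),
            "password": pwd,
            "usr_len": len(usr),
            "username": usr,
            # An empty username is not counted as "contained" in the password.
            "usr_in_pwd": int(bool(usr) and usr in pwd),
        }
        if is_word is not None:
            record["pwd_word"] = int(bool(is_word(pwd)))
            record["usr_word"] = int(bool(is_word(usr)))
        if strength is not None:
            record["strength"] = round(strength(pwd), 2)
        yield record


def augment_file(inf, outf, **hooks):
    """Stream rows from `inf` to `outf`; return the number of rows written."""
    reader = csv.reader(inf)
    next(reader, None)  # skip the first row, as the notebook cells do
    writer = None
    count = 0
    for record in augment_rows(reader, **hooks):
        if writer is None:
            # Column order follows the keys of the first record.
            writer = csv.DictWriter(outf, fieldnames=list(record))
            writer.writeheader()
        writer.writerow(record)
        count += 1
    return count


# Small in-memory stand-in for 10M.csv, reusing rows from the notebook output.
sample = io.StringIO("password,username\nc4$h,money\ncar$,car\nfive,number6\n")
result = io.StringIO()
written = augment_file(sample, result)
print(written)
print(result.getvalue())
```

To add the dictionary and strength columns, one could pass `is_word=enchant.Dict("en_US").check` and `strength=lambda p: passwordmeter.test(p)[0]`, assuming both libraries are installed, and replace the `io.StringIO` objects with files opened via `open("10M.csv", newline="")` and an output path of your choice.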