Lesson 1: Building Foundational Python Skills for Data Analytics

Getting Set Up

Set up a virtual environment:

$ python3.6 -m venv modernpython
$ source modernpython/bin/activate

Install the packages used in the course:

(modernpython) $ pip install pyflakes
(modernpython) $ pip install bottle
(modernpython) $ pip install pytest
(modernpython) $ pip install hypothesis
(modernpython) $ pip install mypy

Resampling

Big Idea:

Statistics modeled in a program are easier to get right and to understand than a formulaic approach. The approach also extends to more complicated situations than classic formulas.

Topics to Prepare for Resampling

  • F-strings
  • Counter(), most_common, elements
  • Statistics
  • Random: seed gauss triangular expovariate choice choices sample shuffle
  • Review list concatenation, slicing, count/index, sorted()
  • Review lambda expressions and chained comparisons

F-strings

In [1]: #    %-formatting    .format()    f''

In [2]: x = 10

In [3]: print('The answer is %d today' % x)
The answer is 10 today

In [4]: print('The answer is {0} today'.format(x))
The answer is 10 today

In [5]: print('The answer is {x} today'.format(x=x))
The answer is 10 today

In [6]: print(f'The answer is {x} today')
The answer is 10 today

In [7]: print(f'The answer is {x :08d} today')
The answer is 00000010 today

In [8]: print(f'The answer is {x ** 2 :08d} today')
The answer is 00000100 today

In [9]: type(x)
Out[9]: int

In [10]: type(x).__name__
Out[10]: 'int'

In [11]: raise ValueError(f"Expected {x!r} to a float not a {type(x).__name__}")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-b9152af534d8> in <module>
----> 1 raise ValueError(f"Expected {x!r} to a float not a {type(x).__name__}")

ValueError: Expected 10 to a float not a int

Counter()

In [12]: from collections import Counter

In [13]: d = {}

In [14]: d['dragons']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-14-c67452f26063> in <module>
----> 1 d['dragons']

KeyError: 'dragons'

In [15]: d = Counter()

In [16]: d['dragons']
Out[16]: 0

In [17]: d['dragons'] += 1

In [18]: d
Out[18]: Counter({'dragons': 1})

In [19]: 'red green red blue red blue green'.split()
Out[19]: ['red', 'green', 'red', 'blue', 'red', 'blue', 'green']

In [20]: Counter('red green red blue red blue green'.split())
Out[20]: Counter({'red': 3, 'green': 2, 'blue': 2})

In [21]: c = Counter('red green red blue red blue green'.split())

In [22]: c.most_common(1)
Out[22]: [('red', 3)]

In [23]: c.most_common(2)
Out[23]: [('red', 3), ('green', 2)]

In [24]: c.elements()
Out[24]: <itertools.chain at 0x7f5db3429e80>

In [25]: list(c.elements())
Out[25]: ['red', 'red', 'red', 'green', 'green', 'blue', 'blue']

In [26]: list(c)
Out[26]: ['red', 'green', 'blue']

In [27]: list(c.values())
Out[27]: [3, 2, 2]

In [28]: list(c.items())
Out[28]: [('red', 3), ('green', 2), ('blue', 2)]

In [29]: list(c.elements())
Out[29]: ['red', 'red', 'red', 'green', 'green', 'blue', 'blue']

Statistics

In [30]: from statistics import mean, median, mode, stdev, pstdev

In [31]: mean([50, 52, 53])
Out[31]: 51.666666666666664

In [33]: median([51, 50, 52, 53])
Out[33]: 51.5

In [34]: mode([51, 50, 52, 53, 51, 51])
Out[34]: 51

In [35]: stdev([51, 50, 52, 53, 51, 51])
Out[35]: 1.0327955589886444

In [36]: pstdev([51, 50, 52, 53, 51, 51])
Out[36]: 0.9428090415820634

Review list

In [37]: s = [10, 20, 30]

In [38]: t = [40, 50, 60]

In [39]: u = s + t

In [40]: u
Out[40]: [10, 20, 30, 40, 50, 60]

In [41]: u[:2]
Out[41]: [10, 20]

In [42]: u[-2:]
Out[42]: [50, 60]

In [43]: u[:2] + u[-2:]
Out[43]: [10, 20, 50, 60]

In [45]: s = 'abracadabra'

In [46]: s.index('c')
Out[46]: 4

In [47]: s.count('c')
Out[47]: 1

In [48]: s.count('a')
Out[48]: 5

sorted()

In [49]: s = [10, 5, 70, 2]

In [50]: s.sort()

In [51]: s
Out[51]: [2, 5, 10, 70]

In [52]: s = [10, 5, 70, 2]

In [53]: t = sorted(s)

In [54]: s
Out[54]: [10, 5, 70, 2]

In [55]: t
Out[55]: [2, 5, 10, 70]

In [56]: sorted('cat')
Out[56]: ['a', 'c', 't']

lambda

In [57]: #    lambda -> partial, itemgetter, attrgetter, ..

In [58]: #      ^--- make function()

In [59]: #      ^--- make a computation in the future

In [60]: lambda x: x**2
Out[60]: <function __main__.<lambda>(x)>

In [61]: (lambda x: x**2)(5)
Out[61]: 25

In [62]: 100 + (lambda x: x**2)(5) + 50
Out[62]: 175

In [63]: f = lambda x, y: 3 * x + y

In [64]: f(3, 8)
Out[64]: 17

In [65]: x = 10

In [66]: y = 20

In [67]: f = lambda : x ** y

In [68]: f()
Out[68]: 100000000000000000000

Chained Comparisons

In [69]: x = 15

In [70]: x > 6
Out[70]: True

In [71]: x < 10
Out[71]: False

In [72]: x > 6 and x < 20
Out[72]: True

In [73]: 6 < x < 20
Out[73]: True

In [74]: # Chained comparisons

Random

In [1]: from random import *

In [2]: random()
Out[2]: 0.4367594986905675

In [3]: seed(8675309)

In [4]: random()
Out[4]: 0.40224696110279223

In [5]: random()
Out[5]: 0.5102471779215914

In [8]: seed(8675309)

In [9]: random()
Out[9]: 0.40224696110279223

In [10]: random()
Out[10]: 0.5102471779215914

In [11]: from random import choice, choices, sample, shuffle

In [12]: outcomes = ['win', 'lose', 'draw', 'play again', 'double win']

In [13]: choice(outcomes)
Out[13]: 'double win'

In [14]: choice(outcomes)
Out[14]: 'draw'

In [15]: choice(outcomes)
Out[15]: 'lose'

In [16]: choices(outcomes, k=5)
Out[16]: ['play again', 'lose', 'double win', 'double win', 'draw']

In [18]: from collections import Counter

In [20]: Counter(choices(outcomes, k=10))
Out[20]: Counter({'win': 1, 'play again': 2, 'lose': 3, 'draw': 2, 'double win': 2})

In [21]: Counter(choices(outcomes, k=10_000))
Out[21]: 
Counter({'play again': 2014,
         'double win': 1995,
         'win': 2048,
         'lose': 1970,
         'draw': 1973})

In [22]: Counter(choices(outcomes, [5, 4, 3, 2, 1], k=10_000))
Out[22]: 
Counter({'double win': 641,
         'play again': 1339,
         'lose': 2706,
         'draw': 2001,
         'win': 3313})

In [23]: outcomes
Out[23]: ['win', 'lose', 'draw', 'play again', 'double win']

In [24]: shuffle(outcomes)

In [25]: outcomes
Out[25]: ['win', 'lose', 'double win', 'draw', 'play again']

In [26]: choices(outcomes, k=5)
Out[26]: ['win', 'win', 'double win', 'draw', 'play again']

In [28]: sample(outcomes, k=4)
Out[28]: ['win', 'play again', 'draw', 'lose']

In [29]: sample(outcomes, k=4)
Out[29]: ['double win', 'lose', 'draw', 'win']

In [30]: # generate lottery number

In [31]: sorted(sample(range(1, 57), k=6))
Out[31]: [9, 11, 16, 23, 42, 49]

Lesson 2: Analyzing Data Using Simulations and Resampling

Examples

  • Six roulette wheel spins -> choices with weighting

  • Deal 20 playing cards without replacements (16 tens, 36 low) -> Counter, elements, sample, list.count

  • 5 or more heads from 7 spins of a biased coin -> lambda, choices, list.count

  • Probability that the median of 5 samples falls in a middle quartile -> chained comparison, choices from a range

  • Bootstrapping to estimate the confidence interval on a sample of data -> sorted, mean, choices

  • Statistical significance of the difference of two means -> shuffle slicing mean

  • Single server queue simulation -> expovariate gauss mean median stdev conditional-expressions

Modules: random, statistics, collections
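
The bootstrapping example in the list above is not worked through in these notes. A minimal sketch of the idea, assuming a made-up sample called timings and a 90% interval (the data, the function names, and the interval level are all illustrative, not taken from the course):

```python
from random import choices, seed
from statistics import mean

# Hypothetical sample data, invented for illustration only.
timings = [7.18, 8.59, 12.24, 7.39, 8.16, 8.68, 6.98,
           8.31, 9.06, 7.06, 7.67, 10.02, 6.87, 9.07]

def bootstrap(data):
    """Resample the data with replacement, keeping the same size."""
    return choices(data, k=len(data))

def confidence_interval(data, n=10_000, level=0.90):
    """Estimate a confidence interval for the mean by resampling."""
    means = sorted(mean(bootstrap(data)) for _ in range(n))
    tail = (1.0 - level) / 2
    low = means[round(n * tail)]
    high = means[round(n * (1 - tail)) - 1]
    return low, high

seed(8675309)  # fixed seed so repeated runs give the same interval
low, high = confidence_interval(timings, n=2_000)
print(f'Observed mean: {mean(timings):.2f}')
print(f'90% interval: {low:.2f} to {high:.2f}')
```

The three tools named in the list do all the work: choices draws the resample, mean summarizes it, and sorted lines the results up so the interval can be read off by position.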

Six roulette wheel spins

In [32]: from random import *

In [33]: from statistics import *

In [34]: from collections import *

In [35]: # Six roulette wheel spins -- 18 red, 18 black, 2 green

In [36]: choice(['red', 'red', 'red', 'black', 'black', 'black', 'green'])
Out[36]: 'red'

In [37]: choice(['red', 'red', 'red', 'black', 'black', 'black', 'green'])
Out[37]: 'black'

In [38]: ['red'] * 18
Out[38]: 
['red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'red',
 'red']

In [39]: choice(['red'] * 18 + ['black'] * 18 + ['green'] * 2)
Out[39]: 'black'

In [40]: population = ['red'] * 18 + ['black'] * 18 + ['green'] * 2

In [41]: choice(population)
Out[41]: 'red'

In [42]: [choice(population) for i in range(6)]
Out[42]: ['red', 'black', 'black', 'black', 'red', 'red']

In [43]: Counter([choice(population) for i in range(6)])
Out[43]: Counter({'red': 5, 'black': 1})

In [44]: choices(population, k=6)
Out[44]: ['black', 'red', 'black', 'black', 'red', 'black']

In [45]: Counter(choices(population, k=6))
Out[45]: Counter({'red': 1, 'black': 5})

In [46]: 

In [46]: Counter(choices(['red', 'black', 'green'], [18, 18, 2], k=6))
Out[46]: Counter({'red': 2, 'black': 4})

In [47]: Counter(choices(['red', 'black', 'green'], [18, 18, 2], k=6))
Out[47]: Counter({'red': 5, 'green': 1})

In [48]: # ^---- This is the big idea with little code

Deal 20 playing cards without replacements (16 tens, 36 low)

In [49]: deck = Counter(tens=16, low=36)

In [50]: deck = list(deck.elements())

In [51]: deal = sample(deck, 20)

In [52]: Counter(deal)
Out[52]: Counter({'low': 14, 'tens': 6})

In [53]: deal = sample(deck, 52)

In [54]: remainder = deal[20:]

In [55]: Counter(remainder)
Out[55]: Counter({'low': 23, 'tens': 9})

In [56]: 

In [56]: deck = Counter(tens=16, low=36)

In [57]: deck = list(deck.elements())

In [58]: deal = sample(deck, 52)

In [59]: remainder = deal[20:]

In [60]: Counter(remainder)
Out[60]: Counter({'tens': 11, 'low': 21})

In [61]: # ^---- This is the big idea with little code

5 or more heads from 7 spins of a biased coin

In [62]: pop = ['heads', 'tails']

In [63]: wgt = [6, 4]

In [64]: cumwgt = [0.60, 1.00]

In [66]: choices(['heads', 'tails'], cum_weights=[0.60, 1.00])
Out[66]: ['heads']

In [67]: choices(['heads', 'tails'], cum_weights=[0.60, 1.00], k=7)
Out[67]: ['heads', 'tails', 'heads', 'tails', 'heads', 'tails', 'heads']

In [68]: choices(['heads', 'tails'], cum_weights=[0.60, 1.00], k=7).count('heads')
Out[68]: 3

In [69]: choices(['heads', 'tails'], cum_weights=[0.60, 1.00], k=7).count('heads') >= 5
Out[69]: False

In [70]: trial = lambda: choices(['heads', 'tails'], cum_weights=[0.60, 1.00], k=7).count('heads') >= 5

In [71]: trial()
Out[71]: True

In [72]: trial()
Out[72]: True

In [73]: n = 100_000

In [74]: sum(trial() for i in range(n)) /n
Out[74]: 0.41871
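
As a cross-check on the simulated value, the same probability can be computed exactly from the binomial formula. This is a sketch added for comparison, not part of the original session; the helper names comb and prob_at_least are made up here, and factorial is used so the code also runs on Python 3.6.

```python
from math import factorial

def comb(n, k):
    """Number of ways to choose k items from n."""
    return factorial(n) // (factorial(k) * factorial(n - k))

def prob_at_least(k_min, n, p):
    """Exact probability of k_min or more successes in n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

exact = prob_at_least(5, 7, 0.6)
print(f'{exact:.4f}')   # 0.4199
```

The simulation result above differs from this only in the third decimal place, which is what 100_000 trials should deliver.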

Lesson 3: Improving Reliability with MyPy and Type Hinting

Type Hinting and Linting

Big Idea:

Adding type hints to code helps clarify your thoughts, improves documentation, and may allow a static analysis tool to detect some kinds of errors.

  • Use of : type
  • Use of function annotations
  • Use of class
  • Container[Type]
  • Tuple and ...
  • Optional arguments
  • Deque vs deque

Tools

  • mypy
  • pyflakes
  • hypothesis
  • unittest -> nose py.test

# hints.py
from typing import *

x: int = 10

def f(x: int, y: int) -> int:
    return x + y

print(f(10, 20))
print(f(10, 'hello'))

Previously, only a unit test would have detected this error.

(modernpython) $ python -m mypy hints.py 
hints.py:9: error: Argument 2 to "f" has incompatible type "str"; expected "int"
Found 1 error in 1 file (checked 1 source file)
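
The hints.py example only covers plain int annotations. A short sketch of the other items in the topic list (Container[Type], Tuple with ..., optional arguments, Deque vs deque) might look like the following; the names Point, centroid, greet, and recent are invented for illustration.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Point = Tuple[float, ...]          # a tuple of any length holding floats

def centroid(points: List[Point]) -> Point:
    """Average a non-empty list of points, dimension by dimension."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))

def greet(name: str, title: Optional[str] = None) -> str:
    """Optional[str] means the argument is either a str or None."""
    return f'{title} {name}' if title else name

# Deque (capitalized) is the type used in annotations;
# deque (lowercase) is the class that builds the object.
recent: Deque[str] = deque()
recent.append(greet('Hettinger', 'Mr.'))

print(centroid([(1.0, 2.0), (3.0, 4.0)]))   # (2.0, 3.0)
print(recent)
```

The hints have no effect at run time; they only matter when a checker such as mypy reads the file.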

Cluster Analysis

Big Idea:

K-means is an unsupervised learning tool for identifying clusters within datasets

Algorithm in English:

  • Pick arbitrary points as guesses for the center of each group.
  • Assign all the data points to the closest matching group.
  • Within each group, average the points to get a new guess for the center of the group.
  • Repeat multiple times: assign data and average the points.

Goal: Express the idea more clearly and beautifully in Python than in English
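
One way the algorithm described above could be written, as a minimal sketch rather than the course's own implementation. The function names, the sample points, and the default of 10 iterations are assumptions made for this illustration. A centroid that attracts no points is simply dropped here, which a fuller version would need to handle.

```python
from collections import defaultdict
from math import fsum, sqrt
from random import sample, seed
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]

def mean(data: Iterable[float]) -> float:
    """Accurate arithmetic mean using fsum."""
    values = list(data)
    return fsum(values) / len(values)

def dist(p: Point, q: Point) -> float:
    """Euclidean distance between two points."""
    return sqrt(fsum((x - y) ** 2 for x, y in zip(p, q)))

def assign_data(centroids: Sequence[Point],
                data: Iterable[Point]) -> Dict[Point, List[Point]]:
    """Group each data point under its closest centroid."""
    groups: Dict[Point, List[Point]] = defaultdict(list)
    for point in data:
        closest = min(centroids, key=lambda c: dist(point, c))
        groups[closest].append(point)
    return dict(groups)

def compute_centroids(groups: Iterable[List[Point]]) -> List[Point]:
    """Average each group, dimension by dimension, to get new centers."""
    return [tuple(map(mean, zip(*group))) for group in groups]

def k_means(data: List[Point], k: int = 2, iterations: int = 10) -> List[Point]:
    """Start from k arbitrary data points, then repeat assign-and-average."""
    centroids = sample(data, k)
    for _ in range(iterations):
        groups = assign_data(centroids, data)
        centroids = compute_centroids(groups.values())
    return centroids

# Made-up data: two well-separated groups of points.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0),
          (11.0, 11.0), (12.0, 12.0), (13.0, 13.0)]
seed(8675309)
centers = sorted(k_means(points, k=2))
print(centers)   # [(2.0, 2.0), (12.0, 12.0)]
```

Each helper leans on one of the preparation topics listed below: fsum and true division in mean, a key function in min, defaultdict grouping in assign_data, and zip with star args in compute_centroids.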

Topics to Prepare for Cluster Analysis

  • type hinting
  • fsum, true division
  • defaultdict grouping
  • key function work with min(), max(), sorted(), groupby(), merge()
  • zip() and star args
  • flattening with nested for-loop
  • list(iterator)

fsum, true division

In [1]: 1.1 + 2.2
Out[1]: 3.3000000000000003

In [2]: 1.1 + 2.2 == 3.3
Out[2]: False

In [3]: 1.1 + 2.2 - 3.3
Out[3]: 4.440892098500626e-16

In [4]: [0.1] * 10
Out[4]: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

In [5]: sum([0.1] * 10)
Out[5]: 0.9999999999999999

In [6]: sum([0.1] * 10) < 1.0
Out[6]: True

In [7]: 

In [7]: from math import fsum

In [8]: fsum([0.1] * 10) == 1.0
Out[8]: True

In [9]: 38 / 5
Out[9]: 7.6

In [10]: 38 // 5
Out[10]: 7

defaultdict grouping

When a key is missing, defaultdict calls the factory function given in the parentheses to build the default value, typically an empty container or a zero:

  • set()
  • list()
  • int()
  • dict()

In [13]: from collections import defaultdict

In [14]: d = {'raymond': 'red'}

In [15]: e = defaultdict(lambda: 'black')

In [18]: e['raymond'] = 'red'

In [19]: d
Out[19]: {'raymond': 'red'}

In [20]: e
Out[20]: defaultdict(<function __main__.<lambda>()>, {'raymond': 'red'})

In [21]: 

In [21]: d['rachel']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-21-08ed401c6aef> in <module>
----> 1 d['rachel']

KeyError: 'rachel'

In [22]: e['rachel']
Out[22]: 'black'

In [23]: e
Out[23]: 
defaultdict(<function __main__.<lambda>()>,
            {'raymond': 'red', 'rachel': 'black'})

In [24]: 

In [24]: s = set()

In [25]: s.add('tom')

In [26]: s
Out[26]: {'tom'}

In [27]: l = list()

In [28]: l.append('tom')

In [29]: i = int()

In [30]: i
Out[30]: 0

In [31]: i += 1

In [32]: i
Out[32]: 1

In [37]: d = defaultdict(set)

In [38]: d['s']
Out[38]: set()

In [39]: d['t'].add('tom')

In [40]: d['m'].add('mary')

In [41]: d['t'].add('tim')

In [42]: d['t'].add('tom')

In [43]: d['m'].add('martin')

In [44]: d
Out[44]: defaultdict(set, {'s': set(), 't': {'tim', 'tom'}, 'm': {'martin', 'mary'}})


In [45]: # defaultdict creates a new container to store elements with a

In [46]: # common key

In [48]: d = defaultdict(list)

In [49]: d['l']
Out[49]: []

In [50]: d['t'].append('tom')

In [51]: d['t'].append('tom')

In [52]: d
Out[52]: defaultdict(list, {'l': [], 't': ['tom', 'tom']})

In [53]: 

In [53]: names = ''' david betty susan mary darlene sandy davin
    ...:             shelly becky beatrice to michael wallce'''.split()

In [54]: names
Out[54]: 
['david',
 'betty',
 'susan',
 'mary',
 'darlene',
 'sandy',
 'davin',
 'shelly',
 'becky',
 'beatrice',
 'to',
 'michael',
 'wallce']

In [55]: 

In [55]: d = defaultdict(list)

In [56]: names[0][0]
Out[56]: 'd'

In [57]: for name in names:
    ...:     feature = name[0]
    ...:     d[feature].append(name)
    ...: 

In [58]: d
Out[58]: 
defaultdict(list,
            {'d': ['david', 'darlene', 'davin'],
             'b': ['betty', 'becky', 'beatrice'],
             's': ['susan', 'sandy', 'shelly'],
             'm': ['mary', 'michael'],
             't': ['to'],
             'w': ['wallce']})

In [59]: 

In [63]: d = defaultdict(list)

In [64]: len(names[0])
Out[64]: 5

In [65]: for name in names:
    ...:     feature = len(name)
    ...:     d[feature].append(name)
    ...: 

In [66]: d
Out[66]: 
defaultdict(list,
            {5: ['david', 'betty', 'susan', 'sandy', 'davin', 'becky'],
             4: ['mary'],
             7: ['darlene', 'michael'],
             6: ['shelly', 'wallce'],
             8: ['beatrice'],
             2: ['to']})

key function

In [67]: # SELECT name FROM names ORDER BY len(name);

In [68]: sorted(names, key=len)
Out[68]: 
['to',
 'mary',
 'david',
 'betty',
 'susan',
 'sandy',
 'davin',
 'becky',
 'shelly',
 'wallce',
 'darlene',
 'michael',
 'beatrice']
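
The topic list also mentions key functions with min() and max(), which the session above does not show. A small sketch using the same names list (repeated here so the block runs on its own):

```python
names = ('david betty susan mary darlene sandy davin '
         'shelly becky beatrice to michael wallce').split()

shortest = min(names, key=len)     # 'to'
longest = max(names, key=len)      # 'beatrice'

# Any one-argument function works as a key; sorted() is stable,
# so names with the same last letter keep their original order.
by_last_letter = sorted(names, key=lambda name: name[-1])

print(shortest, longest)
print(by_last_letter[:4])   # ['david', 'darlene', 'beatrice', 'wallce']
```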

zip()

In [69]: list(zip('abcdef', 'ghijklm'))
Out[69]: [('a', 'g'), ('b', 'h'), ('c', 'i'), ('d', 'j'), ('e', 'k'), ('f', 'l')]

In [70]: 

In [70]: from itertools import zip_longest

In [71]: list(zip_longest('abcdef', 'ghijklm'))
Out[71]: 
[('a', 'g'),
 ('b', 'h'),
 ('c', 'i'),
 ('d', 'j'),
 ('e', 'k'),
 ('f', 'l'),
 (None, 'm')]

In [72]: list(zip_longest('abcdef', 'ghijklm', fillvalue='default'))
Out[72]: 
[('a', 'g'),
 ('b', 'h'),
 ('c', 'i'),
 ('d', 'j'),
 ('e', 'k'),
 ('f', 'l'),
 ('default', 'm')]

In [74]: # 3 rows by 2 column

In [75]: m = [
    ...:         [10, 20],
    ...:         [30, 40],
    ...:         [50, 60],
    ...:     ]

In [76]: # swap rows and columns

In [77]: list(zip([10, 20], [30, 40], [50, 60]))
Out[77]: [(10, 30, 50), (20, 40, 60)]

In [78]: 

In [78]: list(zip(*m))
Out[78]: [(10, 30, 50), (20, 40, 60)]

flattening 2D array with nested for-loop

In [79]: m
Out[79]: [[10, 20], [30, 40], [50, 60]]

In [80]: for row in m:
    ...:     print(row)
    ...: 
[10, 20]
[30, 40]
[50, 60]

In [81]: for row in m:
    ...:     for col in row:
    ...:         print(col)
    ...: 
10
20
30
40
50
60

In [82]: [x for row in m for x in row]
Out[82]: [10, 20, 30, 40, 50, 60]

list(iterator)

Convert an iterator to a list when you need to index it or loop over it more than once.

In [83]: it = iter('abcd')

In [84]: it
Out[84]: <str_iterator at 0x7f125cb00cd0>

In [85]: list(it)
Out[85]: ['a', 'b', 'c', 'd']

Lesson 5: Building Additional Skills for Data Analysis

Cluster Analysis of Voting Blocks

Big Idea:

Analyze public records to identify congressional voting blocks

Preparation for Cluster Analysis of Voting Blocks

  • defaultdict for accumulating data (tabulating)
  • defaultdict for reversing a one-to-many mapping
  • glob
  • reading files with an encoding
  • using next() or islice() to remove elements from an iterator
  • csv.reader
  • tuple unpacking
  • looping idioms: enumerate, zip, reversed, sorted, set
  • incrementing instances of Counter
  • assertions

defaultdict for accumulating data (tabulating)

In [86]: from collections import defaultdict

In [87]: d = defaultdict(list)

In [88]: d['raymond'].append('red')

In [89]: d['rachel'].append('blue')

In [90]: d['matthew'].append('yellow')

In [91]: d
Out[91]: 
defaultdict(list,
            {'raymond': ['red'], 'rachel': ['blue'], 'matthew': ['yellow']})

In [92]: from pprint import pprint

In [93]: pprint(d)
defaultdict(<class 'list'>,
            {'matthew': ['yellow'],
             'rachel': ['blue'],
             'raymond': ['red']})

In [94]: d['raymond'].append('mac')

In [95]: d['rachel'].append('pc')

In [96]: d['matthew'].append('vtec')

In [97]: pprint(d)
defaultdict(<class 'list'>,
            {'matthew': ['yellow', 'vtec'],
             'rachel': ['blue', 'pc'],
             'raymond': ['red', 'mac']})

In [98]: pprint(dict(d))
{'matthew': ['yellow', 'vtec'],
 'rachel': ['blue', 'pc'],
 'raymond': ['red', 'mac']}

defaultdict for reversing a one-to-many mapping

In [99]: # defaultdict: grouping, accumulation

In [100]: # Model one-to-many: dict(one, list_of_many)

In [101]: e2s = {
     ...:     'one': ['uno'],
     ...:     'two': ['dos'],
     ...:     'three': ['tres'],
     ...:     'trio': ['tres'],
     ...:     'free': ['libre', 'gratis'],
     ...: }

In [102]: pprint(e2s, width=40)
{'free': ['libre', 'gratis'],
 'one': ['uno'],
 'three': ['tres'],
 'trio': ['tres'],
 'two': ['dos']}

In [103]: s2e = defaultdict(list)

In [104]: for eng, spanwords in e2s.items():
     ...:     for span in spanwords:
     ...:         s2e[span].append(eng)
     ...: 

In [105]: pprint(s2e)
defaultdict(<class 'list'>,
            {'dos': ['two'],
             'gratis': ['free'],
             'libre': ['free'],
             'tres': ['three', 'trio'],
             'uno': ['one']})

In [106]: 

In [106]: # each word has a single translation

In [107]: e2s = dict(one='uno', two='dos', three='tres')

In [108]: {span: eng for eng, span in e2s.items()}
Out[108]: {'uno': 'one', 'dos': 'two', 'tres': 'three'}

glob

In [109]: import glob

In [111]: glob.glob('*.txt')
Out[111]: []

reading files with an encoding

Specify the encoding when the file contains non-ASCII (Unicode) characters.

In [112]: with open('data.csv', encoding='utf-8') as f:
     ...:     print(f.read())
     ...: 

using next() or islice() to remove elements from an iterator

In [113]: it = iter('abcdefg')

In [114]: it
Out[114]: <str_iterator at 0x7f124f5f8460>

In [115]: next(it)
Out[115]: 'a'

In [116]: next(it)
Out[116]: 'b'

In [117]: list(it)
Out[117]: ['c', 'd', 'e', 'f', 'g']
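
The session above shows next() only. A sketch of the islice() variant, together with the common case of skipping a header row from csv.reader (also on the preparation list). The CSV text is made up and held in memory so the example needs no file.

```python
import csv
from io import StringIO
from itertools import islice

# Hypothetical file contents, invented for illustration.
text = 'name,color\nraymond,red\nrachel,blue\nmatthew,yellow\n'

reader = csv.reader(StringIO(text))
header = next(reader)              # consume the header row
rows = list(reader)                # only the data rows remain

it = iter('abcdefg')
skipped = list(islice(it, 2))      # consume the first two items
rest = list(it)

print(header)    # ['name', 'color']
print(rows)      # [['raymond', 'red'], ['rachel', 'blue'], ['matthew', 'yellow']]
print(skipped)   # ['a', 'b']
print(rest)      # ['c', 'd', 'e', 'f', 'g']
```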

tuple unpacking

In [118]: t = ('Raymond', 'Hettinger', 54, '[email protected]')

In [119]: type(t)
Out[119]: tuple

In [120]: len(t)
Out[120]: 4

In [121]: fname, lname, age, email = t

In [122]: fname
Out[122]: 'Raymond'

In [123]: lname
Out[123]: 'Hettinger'

looping idioms: enumerate, zip, reversed, sorted, set

In [124]: names = 'raymond rachael matthew'.split()

In [125]: colors = 'red blue yellow'.split()

In [126]: cities = 'austin dallas austin houston chicago dallas austin'.split()

In [128]: # Loop idioms

In [129]: for i in range(len(names)):
     ...:     print(names[i].upper())
     ...: 
RAYMOND
RACHAEL
MATTHEW

In [130]: for name in names:
     ...:     print(name.upper())
     ...: 
RAYMOND
RACHAEL
MATTHEW

In [132]: for i in range(len(names)):
     ...:     print(i+1, names[i])
     ...: 
1 raymond
2 rachael
3 matthew

In [133]: for i, name in enumerate(names, start=1):
     ...:     print(i, name)
     ...: 
1 raymond
2 rachael
3 matthew

In [134]: colors
Out[134]: ['red', 'blue', 'yellow']

In [135]: for i in range(len(colors) -1, -1, -1):
     ...:     print(colors[i])
     ...: 
yellow
blue
red

In [136]: for color in reversed(colors):
     ...:     print(color)
     ...: 
yellow
blue
red

In [137]: names
Out[137]: ['raymond', 'rachael', 'matthew']

In [138]: colors
Out[138]: ['red', 'blue', 'yellow']

In [139]: n = min(len(names), len(colors))

In [140]: for i in range(n):
     ...:     print(names[i], colors[i])
     ...: 
raymond red
rachael blue
matthew yellow

In [141]: for name, color in zip(names, colors):
     ...:     print(name, color)
     ...: 
raymond red
rachael blue
matthew yellow

In [142]: 

In [142]: 

In [142]: colors
Out[142]: ['red', 'blue', 'yellow']

In [143]: for color in sorted(colors):
     ...:     print(color)
     ...: 
blue
red
yellow

In [144]: for color in sorted(colors, key=len):
     ...:     print(color)
     ...: 
red
blue
yellow

In [148]: # SELECT DISTINCT city FROM Cities ORDER BY city;

In [149]: # ORDER BY = sorted, DISTINCT = set

In [151]: for city in reversed(sorted(set(cities))):
     ...:     print(city)
     ...: 
     ...: 
houston
dallas
chicago
austin

In [152]: for i, city in enumerate(reversed(sorted(set(cities)))):
     ...:     print(i, city)
     ...: 
     ...: 
     ...: 
0 houston
1 dallas
2 chicago
3 austin

In [153]: for i, city in enumerate(map(str.upper, reversed(sorted(set(cities))))):
     ...:     print(i, city)
     ...: 
     ...: 
     ...: 
0 HOUSTON
1 DALLAS
2 CHICAGO
3 AUSTIN

incrementing instances of Counter

In [154]: from collections import Counter

In [155]: c = Counter()

In [156]: c['red'] += 1

In [157]: c
Out[157]: Counter({'red': 1})

In [158]: c['blue'] += 1

In [159]: c['red'] += 1

In [160]: c
Out[160]: Counter({'red': 2, 'blue': 1})

In [161]: 

In [161]: c.most_common(1)
Out[161]: [('red', 2)]

In [162]: c.most_common(2)
Out[162]: [('red', 2), ('blue', 1)]

In [163]: c.most_common()
Out[163]: [('red', 2), ('blue', 1)]

In [164]: list(c.elements())
Out[164]: ['red', 'red', 'blue']

assertions

In [165]: assert 5 + 3 == 8

In [166]: assert 5 + 3 == 10
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-166-f9d98b79d5e0> in <module>
----> 1 assert 5 + 3 == 10

AssertionError: 

Lesson 7: Gearing-up for a Publisher/Subscriber Application

Publisher/Subscriber Service

Big Idea:

Users make posts. Followers subscribe to the posts they are interested in. Newer posts are more relevant. Display posts by a user, posts for a user. Display those followed by a user. Store the user account information with hashed passwords

Tools We Will Need

  • Unicode normalization, NFC: chr(111)+chr(776) -> chr(246)
  • Named tuples
  • sorted(), bisect(), and merge() -- reverse and key arguments
  • itertools.islice()
  • sys.intern()
  • random.expovariate()
  • time.sleep() and time.time()
  • hashlib: pbkdf2_hmac, sha256/512, digest, hexdigest
  • repr of a tuple
  • joining strings
  • floor divisions
  • ternary operator
  • and/or short-circuit boolean operations that return a value
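
The first item in the list, Unicode normalization, is not demonstrated in the sessions that follow. A minimal sketch of what the NFC example in that bullet means, using the standard unicodedata module:

```python
import unicodedata

decomposed = chr(111) + chr(776)     # 'o' followed by a combining diaeresis
composed = chr(246)                  # the single code point for o with diaeresis

print(decomposed == composed)                                 # False
print(unicodedata.normalize('NFC', decomposed) == composed)   # True
print(len(decomposed), len(composed))                         # 2 1
```

Both strings display the same way, so normalizing before comparing or storing user names avoids treating them as different accounts.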

Named tuples

In [1]: from collections import namedtuple

In [2]: Person = namedtuple('Person', ['fname', 'lname', 'age', 'email'])

In [3]: p = Person('Raymond', 'Hettinger', 54, '[email protected]')

In [4]: isinstance(p, tuple)
Out[4]: True

In [5]: len(p)
Out[5]: 4

In [6]: a, b, c, d = p

In [7]: p[:2]
Out[7]: ('Raymond', 'Hettinger')

In [8]: p[0]
Out[8]: 'Raymond'

In [9]: p
Out[9]: Person(fname='Raymond', lname='Hettinger', age=54, email='[email protected]')

In [10]: p.lname
Out[10]: 'Hettinger'

sorted(), bisect(), and merge()

bisect searches a sorted list to find which range a value falls in

In [11]: import bisect

In [12]: cuts = [60, 70, 80, 90]

In [13]: grades = 'FDCBA'

In [14]: grades[bisect.bisect(cuts, 76)]
Out[14]: 'C'

In [15]: [grades[bisect.bisect(cuts, score)] for score in [76, 92, 80, 70, 69, 91, 99, 100]]
Out[15]: ['C', 'A', 'B', 'C', 'D', 'A', 'A', 'A']

In [16]: 

In [16]: 

In [16]: sorted([10, 5, 20])
Out[16]: [5, 10, 20]

In [17]: sorted([10, 5, 20] + [1, 11, 25])
Out[17]: [1, 5, 10, 11, 20, 25]

In [18]: 

In [18]: a = [1, 11, 25]

In [19]: b = [5, 10, 20]

In [20]: c = [2, 15, 21]

In [21]: sorted(a + b + c)
Out[21]: [1, 2, 5, 10, 11, 15, 20, 21, 25]

In [22]: 

In [22]: from heapq import merge

In [23]: list(merge(a, b, c))
Out[23]: [1, 2, 5, 10, 11, 15, 20, 21, 25]

In [24]: it = merge(a, b, c)

In [25]: it
Out[25]: <generator object merge at 0x7f4c86362f20>

In [26]: next(it)
Out[26]: 1

In [27]: next(it)
Out[27]: 2
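
The tools list mentions the reverse and key arguments, which the session above does not exercise. A short sketch with made-up inputs; note that merge() expects every input to be sorted already, in the same order it is asked to produce.

```python
from heapq import merge

# Inputs sorted from largest to smallest, merged in that same order.
desc_a = [25, 11, 1]
desc_b = [20, 10, 5]
merged_desc = list(merge(desc_a, desc_b, reverse=True))

# key= works as it does for sorted(); each input is sorted by length.
by_length = list(merge(['to', 'mary', 'david'], ['a', 'bob', 'shelly'], key=len))

print(merged_desc)   # [25, 20, 11, 10, 5, 1]
print(by_length)     # ['a', 'to', 'bob', 'mary', 'david', 'shelly']
```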

itertools.islice()

A generator is code that runs on demand.

Example: a search engine query matches 200_000 results. Get an iterator over the results and use islice to display the first 10; the next page takes the next 10 from the same iterator.

In [29]: from itertools import islice

In [30]: list(islice('abcdefghi', 3))
Out[30]: ['a', 'b', 'c']

In [31]: islice('abcdefghi', 3)
Out[31]: <itertools.islice at 0x7f4c85ebf630>

In [32]: 'abcdefghi'[:3]
Out[32]: 'abc'

In [34]: list(islice('abcdefghi', None, 3))
Out[34]: ['a', 'b', 'c']

In [35]: list(islice('abcdefghi', 2, 4))
Out[35]: ['c', 'd']

In [36]: 'abcdefghi'[2:4]
Out[36]: 'cd'

In [37]: list(islice('abcdefghi', 0, 4, 2))
Out[37]: ['a', 'c']

In [38]: 'abcdefghi'[0:4:2]
Out[38]: 'ac'

In [39]: it = merge(a, b, c)

In [40]: it
Out[40]: <generator object merge at 0x7f4c8618a4a0>

In [41]: list(islice(it, 3))
Out[41]: [1, 2, 5]
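
The paging idea described at the top of this section can be sketched as follows. The generator search_results and the helper next_page are stand-ins invented for illustration; a real search backend would supply the iterator.

```python
from itertools import islice

def search_results():
    """Stand-in for a large result set, produced lazily one item at a time."""
    for i in range(200_000):
        yield f'result {i}'

def next_page(results, page_size=10):
    """Pull the next page_size items from the iterator, and no more."""
    return list(islice(results, page_size))

results = search_results()
page1 = next_page(results)    # 'result 0' through 'result 9'
page2 = next_page(results)    # 'result 10' through 'result 19'

print(page1[0], page1[-1])
print(page2[0], page2[-1])
```

Only 20 of the 200_000 results are ever generated here, which is the point of combining a generator with islice.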

sys.intern()

Interning saves memory by making equal strings share a single object.

In [42]: s = 'he'

In [43]: t = 'llo'

In [44]: u = 'hello'

In [45]: v = s + t

In [46]: 

In [46]: u
Out[46]: 'hello'

In [47]: v
Out[47]: 'hello'

In [48]: u == v
Out[48]: True

In [49]: id(u)
Out[49]: 139966649645232

In [50]: id(v)
Out[50]: 139966646070704

In [51]: 

In [51]: import sys

In [52]: u = sys.intern('hello')

In [53]: v = sys.intern(s + t)

In [54]: u
Out[54]: 'hello'

In [55]: v
Out[55]: 'hello'

In [56]: u is v
Out[56]: True

In [57]: id(u)
Out[57]: 139966649645232

In [58]: id(v)
Out[58]: 139966649645232

random.expovariate()

expovariate is used to simulate the time between arrivals of users or customers at a service. Its argument is the rate, 1 divided by the desired mean.

In [59]: import random

In [60]: random.uniform(1000, 1100)
Out[60]: 1085.475630656087

In [61]: random.triangular(1000, 1100)
Out[61]: 1047.8443081144524

In [62]: random.expovariate(1 / 5)
Out[62]: 5.90806614076803
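
A sketch of how those single draws turn into a stream of arrival times. The 5-second mean gap matches the call above; the variable names and the sample size are assumptions made for illustration.

```python
from random import expovariate, seed
from statistics import mean

seed(8675309)                     # fixed seed so the run is repeatable
mean_gap = 5.0                    # assumed average seconds between arrivals

# Each call returns one gap between consecutive arrivals.
gaps = [expovariate(1 / mean_gap) for _ in range(10_000)]

# A running total of the gaps gives each arrival's clock time.
arrival_times = []
now = 0.0
for gap in gaps:
    now += gap
    arrival_times.append(now)

print(f'average gap: {mean(gaps):.2f}')
print(f'first arrivals: {[round(t, 1) for t in arrival_times[:3]]}')
```

With this many draws the average gap should land close to 5, even though individual gaps vary widely.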

time.sleep() and time.time()

In [80]: import time

In [81]: x = 10; print(x ** 2)
100

In [82]: time.sleep(5); print('Done')

Done

In [83]: 

In [83]: time.time()
Out[83]: 1624636251.2973113

In [84]: time.ctime()
Out[84]: 'Fri Jun 25 23:50:55 2021'

hashlib

In [85]: import hashlib

In [86]: hashlib.md5('The tale of two cities'.encode('utf-8'))
Out[86]: <md5 HASH object @ 0x7f4c86486030>

In [87]: hashlib.md5('The tale of two cities'.encode('utf-8')).digest()
Out[87]: b'\x83S\xb0,<\xd3u\xba\x8d\xa2-\xdd~O"\xfa'

In [89]: hashlib.md5('The tale of two cities'.encode('utf-8')).hexdigest()
Out[89]: '8353b02c3cd375ba8da22ddd7e4f22fa'

In [90]: 

In [90]: hashlib.sha1('The tale of two cities'.encode('utf-8'))
Out[90]: <sha1 HASH object @ 0x7f4c8608a110>

In [91]: hashlib.sha1('The tale of two cities'.encode('utf-8')).hexdigest()
Out[91]: '40d7238a320003ef2f1ab881741792d3735427e6'

In [92]: hashlib.sha256('The tale of two cities'.encode('utf-8')).hexdigest()
Out[92]: 'b37e58d6cbc67229c3184eeb249899ad0fa0164a27c33b79fffdd14173ab9812'

In [93]: hashlib.sha512('The tale of two cities'.encode('utf-8')).hexdigest()
Out[93]: 'bbd0129997233fc6aec15b969c98d84fee6573281009caed5870bf97c9a1249b7a5366ba76dfed2578bb881ffef962fdc3fad31359dccceadd323746badd52c9'

In [94]: 

In [94]: b = 'The tale of two cities'.encode('utf-8')

In [95]: b = hashlib.sha512(b).digest()

In [96]: b = hashlib.sha512(b).digest()

In [97]: b = hashlib.sha512(b).digest()

In [98]: b = hashlib.sha512(b).digest()

In [99]: 

In [99]: p = 'The tale of two cities'.encode('utf-8')

In [100]: h = hashlib.pbkdf2_hmac('sha256', p, b'some phrase', 100_000)

In [101]: h
Out[101]: b'\xaa\x18\x04\x05\xc8`\xc1\xdf\x11Q\xca\x018\\\xce\xf8\xc8\x9d9\xea`\xd0O\xc0\x8a98&\xeb\xca)\xaf'

In [102]: h = hashlib.pbkdf2_hmac('sha256', p, b'some other phrase', 100_000)

In [103]: h
Out[103]: b'\r\xa9\xf4\x8f\x8d\xcd\x1f\xcb+\xcc\xb1\xf2P\xf1\x17v\x8dmx/0\x1bH\x0c\xbb\xb9\xac\xcd\xfa\xd4\xe4n'
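
For storing user passwords, the same pbkdf2_hmac call is usually paired with a random per-user salt and a constant-time comparison. The sketch below is one possible arrangement under those assumptions; hash_password and check_password are names invented here, and the 16-byte salt length is a choice, not something taken from the course.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest) for a password using PBKDF2 with SHA-256."""
    if salt is None:
        salt = os.urandom(16)     # fresh random salt for each account
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                                 salt, iterations)
    return salt, digest

def check_password(password, salt, expected):
    """Re-hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password('The tale of two cities')
print(check_password('The tale of two cities', salt, stored))   # True
print(check_password('A tale of two cities', salt, stored))     # False
```

The salt is stored alongside the digest; as the two outputs above show, a different salt gives a different hash for the same password.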

repr of a tuple

In [104]: s = 'the quick '

In [105]: t = 'brown fox'

In [106]: s + t
Out[106]: 'the quick brown fox'

In [107]: 

In [107]: s = 'the quick brown'

In [110]: t = ' fox'

In [111]: s + t
Out[111]: 'the quick brown fox'

In [115]: s = 'the quick '

In [116]: t = 'brown fox'

In [117]: repr((s, t))
Out[117]: "('the quick ', 'brown fox')"

In [118]: 

In [118]: s = 'the quick brown'

In [119]: t = ' fox'

In [120]: repr((s, t))
Out[120]: "('the quick brown', ' fox')"

joining strings

join is the opposite of split

In [121]: los = ['raymond', 'hettinger', 'likes', 'python']

In [122]: ' '.join(los)
Out[122]: 'raymond hettinger likes python'

In [123]: ''.join(los)
Out[123]: 'raymondhettingerlikespython'

floor divisions

In [124]: 38 / 5
Out[124]: 7.6

In [125]: 38 // 5
Out[125]: 7

ternary operator

In [126]: # Ternary operator == Conditional Expression

In [127]: score = 70

In [128]: 'pass' if score >= 70 else 'fail'
Out[128]: 'pass'

In [129]: score = 69

In [130]: 'pass' if score >= 70 else 'fail'
Out[130]: 'fail'

and/or short-circuit boolean

In False and True, Python stops at the first operand and never evaluates the second.

Python also returns the operand that decided the result, not necessarily True or False.

In [135]: True and True
Out[135]: True

In [136]: True and False
Out[136]: False

In [137]: False and True
Out[137]: False

In [138]: 

In [138]: 3 < 10 and 10 < 20
Out[138]: True

In [139]: bool('hello')
Out[139]: True

In [140]: len('hello')
Out[140]: 5

In [141]: 

In [141]: 'hello' and True
Out[141]: True

In [142]: True and 'hello'
Out[142]: 'hello'

In [143]: 

In [143]: bool('')
Out[143]: False

In [144]: len('')
Out[144]: 0

In [145]: 

In [145]: '' and 'hello'
Out[145]: ''

In [146]: def f(x, s=None):
     ...:     s = s or 'default'
     ...:     print(x, s)
     ...: 

In [147]: f(10, 'some value')
10 some value

In [148]: f(10)
10 default
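
One caveat with the s or 'default' idiom: it replaces every falsy argument, including an empty string or 0, not just a missing one. A sketch comparing it with an explicit None test (the second function g is added here for contrast and is not from the course):

```python
def f(x, s=None):
    s = s or 'default'            # replaces ANY falsy value, not just None
    return x, s

def g(x, s=None):
    if s is None:                 # replaces only a missing argument
        s = 'default'
    return x, s

print(f(10, ''))    # (10, 'default')
print(g(10, ''))    # (10, '')
print(f(10), g(10)) # (10, 'default') (10, 'default')
```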

deque

Use a deque when inserting or deleting at the front of a sequence.

deque.appendleft(value) beats list.insert(0, value)

If there is an s.insert(0, x) or s.pop(0) lurking in your code, consider using deque.appendleft() or deque.popleft() instead

In [150]: from collections import deque

In [151]: names = deque(['raymond', 'rachel', 'matthew', 'roger',
     ...:                'betty', 'melissa', 'judith', 'charlie'])

In [152]: names
Out[152]:
deque(['raymond',
       'rachel',
       'matthew',
       'roger',
       'betty',
       'melissa',
       'judith',
       'charlie'])

In [153]: names.popleft()
Out[153]: 'raymond'

In [154]: names.appendleft('mark')

In [155]: names
Out[155]:
deque(['mark',
       'rachel',
       'matthew',
       'roger',
       'betty',
       'melissa',
       'judith',
       'charlie'])

Lesson 8: Implementing a Publisher/Subscriber Application
