@shotahorii
Last active January 2, 2016 16:29
[statistics] Frequency Distribution Table

Create Frequency Distribution Table

  • frequency distribution
  • cumulative frequency
  • relative frequency
  • cumulative relative frequency
  • Sturges' formula

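To handle large data sets, tail-call optimisation is needed: the counting helper recurses once per element, which would otherwise exceed Python's default recursion limit.
I used the tail_recursive decorator class written by George Sakkis (http://code.activestate.com/recipes/496691/), saved as rec.py.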
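The idea behind a tail-call decorator can be sketched with a simple trampoline. This is only an illustration and not the code of the recipe in rec.py: the names `trampoline`, `_TailCall` and `count_matching` are hypothetical. Instead of calling itself, the function returns a description of the next call, and a loop keeps executing those calls, so the Python stack never grows.

```python
import functools


class _TailCall:
    """Marker object describing the next call instead of performing it."""

    def __init__(self, args, kwargs):
        self.args = args
        self.kwargs = kwargs


def trampoline(func):
    """Hypothetical decorator: loop while func keeps returning a _TailCall."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        while isinstance(result, _TailCall):
            result = func(*result.args, **result.kwargs)
        return result

    # The decorated function uses wrapper.call(...) in tail position.
    wrapper.call = lambda *a, **kw: _TailCall(a, kw)
    return wrapper


@trampoline
def count_matching(items, test, index=0, count=0):
    """Count the items for which test(item) is true, without deep recursion."""
    if index == len(items):
        return count
    if test(items[index]):
        count += 1
    # Return a description of the next step rather than recursing.
    return count_matching.call(items, test, index + 1, count)


# 100000 steps would exceed the default recursion limit with plain recursion.
print(count_matching(list(range(100000)), lambda x: x % 2 == 0))  # 50000
```

Passing an index instead of slicing the list at each step also avoids copying the remaining elements on every call.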

# rec.py is a decorator class written by George Sakkis.
# See http://code.activestate.com/recipes/496691/
# Requires Python 3: "/" must be true division.
import functools
import math

from rec import tail_recursive

def _count(l, test):
    """Count the elements of l for which test(element) is true."""
    @tail_recursive
    def helper(l, test, count):
        if not l:
            return count
        if test(l[0]):
            count += 1
        return helper(l[1:], test, count)
    return helper(l, test, 0)

def freq(l, binWidth, lim=0):
    """Return the frequency of each class of width binWidth in l."""
    # The number of classes and the test for the last class change if a limit is given.
    if lim == 0:
        num_of_classes = math.ceil(max(l) / binWidth)
        # Last class: both bounds inclusive, so the maximum value is counted.
        last_test = lambda x: binWidth * (num_of_classes - 1) <= x <= binWidth * num_of_classes
    else:
        num_of_classes = lim // binWidth + 1
        # Last class: open-ended, counts every value from its lower bound upwards.
        last_test = lambda x: binWidth * (lim // binWidth) <= x
    # All other classes: bottom inclusive, top exclusive.
    test = lambda bottom, top, x: bottom <= x < top
    result = [_count(l, functools.partial(test, binWidth * i, binWidth * (i + 1)))
              for i in range(num_of_classes - 1)]
    result.append(_count(l, last_test))
    return result

# freqList: a list generated by freq.
def comulativeFreq(freqList):
    """Return the cumulative frequency of each class."""
    return [sum(freqList[:i]) for i in range(1, len(freqList) + 1)]

# freqList: a list generated by freq.
def relativeFreq(freqList):
    """Return the relative frequency (share of the total) of each class."""
    total = sum(freqList)
    return [f / total for f in freqList]

# freqList: a list generated by freq.
def comulativeRelativeFreq(freqList):
    """Return the cumulative relative frequency of each class."""
    rel = relativeFreq(freqList)
    return [sum(rel[:i]) for i in range(1, len(rel) + 1)]

# n: the number of samples. When the data is a list l, n = len(l).
def sturges(n):
    """Return the number of classes suggested by Sturges' formula (not rounded)."""
    return 1 + math.log(n, 2)
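The same class boundaries can also be counted in a single pass, which removes the need for recursion altogether. The sketch below is an alternative, not the gist's implementation: `freq_iterative`, `cumulative` and `relative` are hypothetical names, and it assumes non-negative values with a maximum greater than zero.

```python
import math
from itertools import accumulate


def freq_iterative(values, bin_width, lim=0):
    """Single-pass counterpart of freq (assumes values >= 0 and max > 0)."""
    if lim == 0:
        num_classes = math.ceil(max(values) / bin_width)
    else:
        num_classes = lim // bin_width + 1
    counts = [0] * num_classes
    for x in values:
        # Clamp to the last class so the maximum (or anything past lim) lands there.
        index = min(int(x // bin_width), num_classes - 1)
        counts[index] += 1
    return counts


def cumulative(counts):
    """Running totals in one pass instead of re-summing a slice per class."""
    return list(accumulate(counts))


def relative(counts):
    """Share of the total for each class."""
    total = sum(counts)
    return [c / total for c in counts]


sample = [1, 5, 10, 15, 20]
counts = freq_iterative(sample, 10)
print(counts)              # [2, 3]
print(cumulative(counts))  # [2, 5]
print(relative(counts))    # [0.4, 0.6]
```

Each value is visited once, so the cost grows linearly with the data size rather than with the data size times the number of classes.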
Data file (CSV, one value per line):
8
7
17
23
21
28
24
27
25
28
38
34
37
40
35
39
38
33
34
32
50
44
43
44
42
46
46
47
46
45
44
46
43
47
46
51
53
53
57
56
56
55
55
53
53
54
51
52
58
55
54
60
53
54
53
57
57
59
66
63
65
65
67
68
65
64
69
64
65
63
69
65
67
68
62
64
61
80
75
77
73
79
72
79
78
76
79
72
72
72
75
84
90
82
82
83
84
97
93
100
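As a usage sketch, the 100 values above can be turned into a complete table. The class width of 10 is an assumption made for this illustration, and the counting uses `collections.Counter` rather than the gist's recursive helper. The classes follow the same convention as the code: lower bound inclusive, upper bound exclusive, except for the last class, which also includes its upper bound.

```python
import math
from collections import Counter

# The values from the data file, in their original order.
DATA = [
    8, 7, 17, 23, 21, 28, 24, 27, 25, 28,
    38, 34, 37, 40, 35, 39, 38, 33, 34, 32,
    50, 44, 43, 44, 42, 46, 46, 47, 46, 45,
    44, 46, 43, 47, 46, 51, 53, 53, 57, 56,
    56, 55, 55, 53, 53, 54, 51, 52, 58, 55,
    54, 60, 53, 54, 53, 57, 57, 59, 66, 63,
    65, 65, 67, 68, 65, 64, 69, 64, 65, 63,
    69, 65, 67, 68, 62, 64, 61, 80, 75, 77,
    73, 79, 72, 79, 78, 76, 79, 72, 72, 72,
    75, 84, 90, 82, 82, 83, 84, 97, 93, 100,
]

BIN_WIDTH = 10  # assumed class width for this illustration

n = len(DATA)
sturges_classes = 1 + math.log2(n)  # Sturges' formula, not rounded

num_classes = math.ceil(max(DATA) / BIN_WIDTH)
# Clamp the class index so the maximum value falls into the last class.
by_class = Counter(min(x // BIN_WIDTH, num_classes - 1) for x in DATA)
counts = [by_class[i] for i in range(num_classes)]

print(f"n = {n}, Sturges suggests about {sturges_classes:.2f} classes")
print(f"{'class':>9} {'freq':>5} {'cum':>5} {'rel':>6} {'cum rel':>8}")
running = 0
cumulative = []
for i, count in enumerate(counts):
    running += count
    cumulative.append(running)
    low, high = i * BIN_WIDTH, (i + 1) * BIN_WIDTH
    print(f"{low:>4}-{high:<4} {count:>5} {running:>5} "
          f"{count / n:>6.2f} {running / n:>8.2f}")
```

With n = 100, Sturges' formula gives roughly 7.64, so about 8 classes; a width of 10 produces 10 classes here, which is in the same range.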