@jdeisenberg
Last active May 8, 2017 14:14
Find the frequency and cumulative frequency of last-name initials
import System.IO
import Numeric
import qualified Data.Map as M
-- Given a file with people's first and last initials,
-- one person per line, find the frequency and cumulative
-- frequency for each letter.
freqCount :: M.Map Char Int -> String -> M.Map Char Int
freqCount m s =
  let v = M.lookup (last s) m in
    case v of
      Just a -> M.insert (last s) (a + 1) m
      Nothing -> M.insert (last s) 1 m
-- utility routine to show a float to
-- three decimal places
threeDec :: Float -> String
threeDec x = showFFloat (Just 3) x ""
lastnameDistr :: String -> String
lastnameDistr inputString =
  show outlist
  where
    items = lines inputString
    n = length items
    outmap = foldl freqCount M.empty items
    outlist = M.foldlWithKey (\ acc key val ->
        ((fst acc) ++ [(key, val, threeDec (fromIntegral val / fromIntegral n),
            threeDec (fromIntegral (val + snd acc) / fromIntegral n))],
          (val + snd acc))) ([], 0) outmap
main :: IO ()
main = do
  s <- readFile "initials.txt"
  putStrLn (lastnameDistr s)
AR
ES
NH
DK
SG
ED
AA
AB
CD
DG
AN
ET
AB
CC
KO
KH
VJ
JV
RL
ER
KO
JA
MJ
KW
GM
JC
JM
CJ
CF
ZH
BF
AC
EC
TH
AK
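As a cross-check on the program above, the same table can be sketched more compactly with `M.fromListWith` and `scanl1`. This is only an illustrative variant, not the gist's own code: the names `lastInitialCounts`, `frequencyTable`, and `showThree` are made up here, and it assumes that blank lines should be skipped and left out of the total (the original counts every line in `n` and would fail on an empty line because of `last`).

```haskell
import qualified Data.Map.Strict as M
import Numeric (showFFloat)

-- Count how often each last initial occurs. Blank lines are skipped
-- (an assumption) so that `last` is never applied to an empty string.
lastInitialCounts :: [String] -> M.Map Char Int
lastInitialCounts entries =
  M.fromListWith (+) [(last e, 1) | e <- entries, not (null e)]

-- Build rows of (letter, count, relative frequency, cumulative frequency),
-- in ascending letter order.
frequencyTable :: [String] -> [(Char, Int, Double, Double)]
frequencyTable entries = zipWith row pairs running
  where
    pairs   = M.toAscList (lastInitialCounts entries)
    total   = fromIntegral (sum (map snd pairs)) :: Double
    running = scanl1 (+) (map snd pairs)   -- running totals of the counts
    row (letter, count) cumulative =
      (letter, count, fromIntegral count / total, fromIntegral cumulative / total)

-- Format a value to three decimal places, in the same way as threeDec.
showThree :: Double -> String
showThree x = showFFloat (Just 3) x ""
```

Loaded into GHCi, `frequencyTable ["AB", "CB", "DA", "EC"]` yields one row each for `A`, `B`, and `C`, with the last cumulative frequency equal to 1.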
@jdeisenberg
Author

This program finds the frequency (and cumulative frequency) of the first letters of last names. Here’s why it was written: we were registering students at an event by the first letter of their last name and had three registration stations labeled A-H, I-P, and R-Z. The distribution was very uneven; R-Z was almost empty while A-H had a long line of people waiting. For the next group of people the following day, we opened four stations, but we wanted to know where to split the alphabet to even out the number of people at each station. I had registration cards from the previous year, so I entered some 750 data points (first and last name initials; I was also interested in seeing what the distribution for first names was) and wrote this program to get the cumulative frequencies. Adding the extra station and splitting the alphabet so that each station served roughly the same share of people made registration go much more smoothly.
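One way to turn the cumulative counts into split points is sketched below. It is not necessarily how the splits were actually chosen: `splitLetters` is a hypothetical helper that, for `k` stations, returns the last letter of each of the first `k - 1` stations, taking the first letter whose cumulative count reaches `i/k` of the total. The counts in the usage note are invented for illustration and do not come from the registration data.

```haskell
import Data.List (find)

-- Given k stations and per-letter counts in alphabetical order, return the
-- closing letter of each of the first k-1 stations. The i-th boundary is the
-- first letter whose cumulative count is at least i/k of the total
-- (compared in integer arithmetic to avoid rounding).
splitLetters :: Int -> [(Char, Int)] -> [Char]
splitLetters k counts =
  [ letter
  | i <- [1 .. k - 1]
  , Just (letter, _) <- [find (\(_, c) -> c * k >= i * total) cumulative]
  ]
  where
    cumulative = zip (map fst counts) (scanl1 (+) (map snd counts))
    total      = sum (map snd counts)
```

For example, with the made-up counts `[('A',3),('B',1),('C',2),('D',2)]`, `splitLetters 2` gives `"B"`, meaning stations A-B and C-D with four people each. If a single letter holds more than `1/k` of the total, the same letter can appear as two boundaries, so the result still needs a quick sanity check by eye.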
