Last active
May 8, 2017 14:14
Find the frequency and cumulative frequency of last-name initials
import System.IO
import Numeric
import qualified Data.Map as M

-- Given a file with people's first and last initials,
-- one person per line, find the frequency and cumulative
-- frequency for each last-name initial.

-- Add one to the count for the last character of the line
-- (the last-name initial).
freqCount :: M.Map Char Int -> String -> M.Map Char Int
freqCount m s = M.insertWith (+) (last s) 1 m

-- utility routine to show a float to
-- three decimal places
threeDec :: Float -> String
threeDec x = showFFloat (Just 3) x ""

-- Build a list of (letter, count, frequency, cumulative frequency),
-- paired with the running total, and show it as a string.
lastnameDistr :: String -> String
lastnameDistr inputString =
  show outlist
  where
    -- skip blank lines, since `last` fails on an empty string
    items = filter (not . null) (lines inputString)
    n = length items
    outmap = foldl freqCount M.empty items
    outlist = M.foldlWithKey (\ acc key val ->
      ((fst acc) ++ [(key, val, threeDec (fromIntegral val / fromIntegral n),
        threeDec (fromIntegral (val + snd acc) / fromIntegral n))],
        (val + snd acc))) ([], 0) outmap

main :: IO ()
main = do
  s <- readFile "initials.txt"
  putStrLn (lastnameDistr s)
AR
ES
NH
DK
SG
ED
AA
AB
CD
DG
AN
ET
AB
CC
KO
KH
VJ
JV
RL
ER
KO
JA
MJ
KW
GM
JC
JM
CJ
CF
ZH
BF
AC
EC
TH
AK
This program finds the frequency (and cumulative frequency) of first letters of last names. Here’s why it was written: we were registering students at an event by the first letter of their last name and had three registration stations labeled A-H, I-P, and R-Z. The distribution was very uneven; R-Z was almost empty while A-H had a long line of people waiting. For the next group of people the following day, we opened four stations, but we wanted to know where to split the alphabet to even out the number of people at each station. I had registration cards from the previous year, so I entered some 750 data points (first and last name initials, since I was also interested in the distribution for first names) and wrote this program to get the cumulative frequencies. Having the extra station, with the alphabet split at roughly equal frequencies, made registration go much more smoothly.
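The step from cumulative frequencies to station boundaries can be sketched in a few lines. This is only an illustration, not part of the original program: the names countInitials and splitPoints are hypothetical, and the rule used (end a station at the first letter whose cumulative count reaches i/k of the total) is one reasonable assumption about how to pick the splits.

```haskell
import qualified Data.Map as M

-- Count entries by last-name initial (the last character of each
-- non-blank entry). Hypothetical helper, not from the original gist.
countInitials :: [String] -> M.Map Char Int
countInitials entries =
  M.fromListWith (+) [(last e, 1) | e <- entries, not (null e)]

-- For k stations, return the last letter of each of the first k - 1
-- stations: the first letter whose cumulative count reaches i/k of
-- the total, for i = 1 .. k - 1. Hypothetical helper.
splitPoints :: Int -> M.Map Char Int -> [Char]
splitPoints k counts =
  [ letter
  | i <- [1 .. k - 1]
  , Just letter <- [firstReaching (i * total)]
  ]
  where
    pairs = M.toAscList counts
    total = sum (map snd pairs)
    -- each letter paired with the running total up to and including it
    cumulative = zip (map fst pairs) (scanl1 (+) (map snd pairs))
    -- compare cum * k with i * total to stay in integer arithmetic
    firstReaching target =
      case dropWhile (\(_, cum) -> cum * k < target) cumulative of
        ((letter, _) : _) -> Just letter
        []                -> Nothing

main :: IO ()
main = do
  -- the first eight entries of the sample data above
  let sample = ["AR", "ES", "NH", "DK", "SG", "ED", "AA", "AB"]
  putStrLn (splitPoints 4 (countInitials sample))  -- prints BGK
```

With this tiny sample, the result BGK would correspond to stations A-B, C-G, H-K, and L-Z. On real data the splits will rarely be exact quarters, because one letter's names cannot be divided between two stations.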