
@bgerstle
Last active August 29, 2015 14:19
Haskell CSV histogram printer
import qualified Data.Text as Text
import qualified Data.Text.IO
import qualified Data.Map as Map
import Data.List
import Data.Ord
-- Fix histogram counts to Int: counts stay small, and a concrete type avoids type-defaulting warnings
type TextHistogramValue = Int
type TextHistogramKey = Text.Text
type TextHistogramEntry = (TextHistogramKey, TextHistogramValue)
type TextHistogram = Map.Map TextHistogramKey TextHistogramValue
-- Reduce a list of Text into a histogram of the occurrences of each Text element
countOccurrences :: [Text.Text] -> TextHistogram
countOccurrences = foldr (\t m -> Map.insertWith (+) t 1 m) Map.empty
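-- Take CSV rows (e.g. copied and pasted from Google Docs), join the lines with the delimiter d,
-- split everything on d, and strip surrounding whitespace from each field.
-- This is a plain split: quoted fields that contain the delimiter are not handled.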
joinAndCleanCSVLines :: Text.Text -> Text.Text -> [Text.Text]
joinAndCleanCSVLines d t = map Text.strip $ Text.splitOn d $ (Text.intercalate d . Text.lines) t
-- Transform TextHistogramEntry into a bullet point
toBulletedListItem :: TextHistogramEntry -> String
toBulletedListItem (t, n) = "- " ++ Text.unpack t ++ ": " ++ show n
-- CSV Text delimiter
delim :: Text.Text
delim = Text.pack ","
-- Read all of stdin at once, join it into a single CSV row, count each field,
-- then print the histogram as a bulleted list sorted by descending count
main :: IO ()
main = do
  csv <- Data.Text.IO.getContents
  let topicHistogram = countOccurrences $ joinAndCleanCSVLines delim csv
      -- Highest count first; ties keep Map.toList's ascending key order (sortBy is stable)
      sortedTopicCount = sortBy (flip (comparing snd)) $ Map.toList topicHistogram
  putStr $ unlines $ map toBulletedListItem sortedTopicCount
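Because `main` reads standard input directly, the join/split/count/sort pipeline is awkward to test on its own. A minimal sketch, assuming you are free to restructure the file, moves the whole pipeline into one pure function. The name `histogramReport` is hypothetical and not part of the original gist; the steps inside it mirror `joinAndCleanCSVLines`, `countOccurrences`, and `toBulletedListItem` above.

```haskell
import qualified Data.Map as Map
import qualified Data.Text as Text
import qualified Data.Text.IO as TextIO
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical pure core of the pipeline: CSV text in, bulleted report out.
-- Same steps as the gist's main, minus the stdin/stdout plumbing.
histogramReport :: Text.Text -> String
histogramReport csv = unlines (map bullet sorted)
  where
    delim = Text.pack ","
    -- Join all lines with the delimiter, split into fields, trim each field.
    fields = map Text.strip (Text.splitOn delim (Text.intercalate delim (Text.lines csv)))
    -- Count occurrences of each field (same fold as countOccurrences).
    counts = foldr (\t m -> Map.insertWith (+) t (1 :: Int) m) Map.empty fields
    -- Highest count first; sortBy is stable, so ties keep Map's ascending key order.
    sorted = sortBy (flip (comparing snd)) (Map.toList counts)
    -- Render one entry as "- topic: count".
    bullet (t, n) = "- " ++ Text.unpack t ++ ": " ++ show n

-- The IO wrapper shrinks to a single line.
main :: IO ()
main = TextIO.getContents >>= putStr . histogramReport
```

For example, `histogramReport (Text.pack "b, a\nb, c")` yields `"- b: 2\n- a: 1\n- c: 1\n"`. Empty fields are counted as well: a trailing comma, or completely empty input, produces an entry with a blank topic (`- : 1`). Adding `filter (not . Text.null)` after the `strip` step would drop those entries, if that is the behavior you want.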