-- Gist by @cheecheeo, created June 6, 2014 03:01
module NGrams where
import Data.Char (isAlpha)
import Data.List (unfoldr)
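-- | Given a String, compute all the letter n-grams of length @n@ of its words.
-- Non-alphabetic characters (as decided by 'isAlpha') are dropped and act as
-- word separators, so an n-gram never spans two words. A word shorter than
-- @n@ is returned whole.
--
-- Examples: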
--
-- >>> lettersNGrams 3 "Hello, world!"
-- ["Hel","ell","llo","wor","orl","rld"]
--
-- >>> lettersNGrams 3 "Hello,world!"
-- ["Hel","ell","llo","wor","orl","rld"]
--
-- >>> lettersNGrams 10 "Hello, world!"
-- ["Hello","world"]
lettersNGrams :: Int -> String -> [String]
lettersNGrams n s = concatMap (nGrams n) (words (map (\c -> if isAlpha c then c else ' ') s))
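-- | Helper n-gram function from yesterday: all contiguous windows of length
-- @n@ over a list, in order. A non-empty list shorter than @n@ yields itself
-- as the single result.
nGrams :: Int -> [a] -> [[a]]
nGrams n = unfoldr go
  where
    go [] = Nothing
    go xs =
      let (ng, remain) = splitAt n xs
          -- Stop once the current window reaches the end of the list;
          -- otherwise slide the window forward by one element.
          next = if null remain then [] else drop 1 xs
      in Just (ng, next)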
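-- Illustrative sketch, not part of the original gist: because 'nGrams' is
-- polymorphic in the element type, the same helper can also be applied to a
-- list of words to get word-level n-grams. The name 'wordNGrams' is
-- hypothetical and is introduced here only for the example.
--
-- >>> wordNGrams 2 "the quick brown fox"
-- [["the","quick"],["quick","brown"],["brown","fox"]]
wordNGrams :: Int -> String -> [[String]]
wordNGrams n = nGrams n . words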