Basic libraries

That's enough, Linus.

Basic libraries and data structures

Goal: Reduce CPU and memory

Data structure choice can have a big impact on resource usage.

Text mentions that things aren't getting faster, but it's also good to mention that things are getting smaller. Minimizing CPU use means less power and better battery life.

Benchmarking with Criterion

You can't reduce something unless you know the quantity you are starting with.

Criterion http://www.serpentine.com/criterion/

module Main where

import Criterion.Main

infixl 9 !?
_      !? n | n < 0 = Nothing
[]     !? _         = Nothing
(x:_)  !? 0         = Just x
(_:xs) !? n         = xs !? (n-1)

myList :: [Int]
myList = [1..9999]

main :: IO ()
main = defaultMain
  [ bench "index list 9999"
    $ whnf (myList !!) 9998
  , bench "index list maybe index 9999"
    $ whnf (myList !?) 9998
  ]

The output after being compiled with stack ghc -- -O2 src/Main.hs

benchmarking index list 9999
time                 20.64 μs   (20.16 μs .. 21.22 μs)
                     0.995 R²   (0.991 R² .. 0.998 R²)
mean                 21.00 μs   (20.59 μs .. 21.71 μs)
std dev              1.881 μs   (1.408 μs .. 2.574 μs)
variance introduced by outliers: 82% (severely inflated)

benchmarking index list maybe index 9999
time                 102.7 μs   (101.2 μs .. 104.2 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 101.5 μs   (100.5 μs .. 103.3 μs)
std dev              4.372 μs   (2.832 μs .. 7.272 μs)
variance introduced by outliers: 44% (moderately inflated)

Criterion.Main provides whnf and nf functions.

Weak head normal form, as we said before, evaluates to the first data constructor. That means that if your outermost data constructor is a Maybe, it’s only going to evaluate enough to find out if it’s a Nothing or a Just.

It is important to choose the form function that will exercise your code in the most appropriate way.

The types

defaultMain :: [Benchmark] -> IO ()
whnf :: (a -> b) -> a -> Benchmarkable 
nf :: Control.DeepSeq.NFData b => (a -> b) -> a -> Benchmarkable

Note that using nf with a custom type requires defining an NFData instance.

How can we match the speed of the non-Maybe version.

The !! source can guide us. https://hackage.haskell.org/package/base-4.10.1.0/docs/src/GHC.List.html#%21%21

(!!)                    :: [a] -> Int -> a
#ifdef USE_REPORT_PRELUDE
xs     !! n | n < 0 =  errorWithoutStackTrace "Prelude.!!: negative index"
[]     !! _         =  errorWithoutStackTrace "Prelude.!!: index too large"
(x:_)  !! 0         =  x
(_:xs) !! n         =  xs !! (n-1)
#else

-- We don't really want the errors to inline with (!!).
-- We may want to fuss around a bit with NOINLINE, and
-- if so we should be careful not to trip up known-bottom
-- optimizations.
tooLarge :: Int -> a
tooLarge _ = errorWithoutStackTrace (prel_list_str ++ "!!: index too large")

negIndex :: a
negIndex = errorWithoutStackTrace $ prel_list_str ++ "!!: negative index"

{-# INLINABLE (!!) #-}
xs !! n
  | n < 0     = negIndex
  | otherwise = foldr (\x r k -> case k of
                                   0 -> x
                                   _ -> r (k-1)) tooLarge xs n
#endif

Rewriting our implementation with a foldr

infixl 9 !?
{-# INLINABLE (!?) #-}
xs !? n
  | n < 0 = Nothing
  | otherwise =
      foldr
      (\x r k ->
        case k of
          0 -> Just x
          _ -> r (k-1))
      (const Nothing) xs n

Is basically the same

benchmarking index list 9999
time                 20.11 μs   (19.75 μs .. 20.54 μs)
                     0.996 R²   (0.993 R² .. 0.999 R²)
mean                 20.45 μs   (20.10 μs .. 20.97 μs)
std dev              1.351 μs   (959.3 ns .. 1.967 μs)
variance introduced by outliers: 71% (severely inflated)

benchmarking index list maybe index 9999
time                 100.9 μs   (100.1 μs .. 101.8 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 101.1 μs   (100.4 μs .. 102.3 μs)
std dev              2.935 μs   (2.138 μs .. 3.821 μs)
variance introduced by outliers: 26% (moderately inflated)

Adding a type signature removes replaces an ambiguous Num with a concrete Int which results in the compiler producing a more "more primative" and faster loop.

benchmarking index list 9999
time                 20.13 μs   (19.87 μs .. 20.40 μs)
                     0.998 R²   (0.998 R² .. 0.999 R²)
mean                 20.25 μs   (20.01 μs .. 20.53 μs)
std dev              912.2 ns   (712.6 ns .. 1.194 μs)
variance introduced by outliers: 53% (severely inflated)

benchmarking index list maybe index 9999
time                 18.47 μs   (18.23 μs .. 18.78 μs)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 18.73 μs   (18.41 μs .. 19.30 μs)
std dev              1.334 μs   (776.4 ns .. 2.073 μs)
variance introduced by outliers: 74% (severely inflated)

More on deciding between whnf and nf. In a hypothetical query records from a DB and write them to a file example, whnf is appropriate when you are interested in measuring the query and nf if you also want to capture the file output time.

The book walks through a nf bottom failure example but I could not replicate it.

Profiling your programs

GHC has built in profiling of time spent in functions and memory.

module Main where

f :: IO ()
f = do
  print ([1..] !! 999999)
  putStrLn "f"

g :: IO ()
g = do
  print ([1..] !! 9999999)
  putStrLn "g"

main :: IO ()
main = do
  f
  g

stack ghc -- -prof -fprof-auto -rtsopts -O2 src/profilingTime.hs
./profilingTime +RTS -P
cat profile.prof

module Main where

import Control.Monad

blah :: [Integer]
blah = [1..1000]

main :: IO ()
main =
  replicateM_ 10000 (print blah)

stack ghc -- -prof -fprof-auto -rtsopts -O2 src/heap.hs
./src/heap +RTS -hc -p
stack exec hp2ps heap.hp

Constant applicative forms

CAFs are expressions that have no free variables and are held in memory to be shared with all other expressions in a module

module Main where

incdInts :: [Integer]
incdInts = map (+1) [1..]

main :: IO ()
main = do
  print (incdInts !! 1000)
  print (incdInts !! 9001)
  print (incdInts !! 90010)
  print (incdInts !! 9001000)
  print (incdInts !! 9501000)
  print (incdInts !! 9901000)

stack ghc -- -prof -fprof-auto -rtsopts -O2 src/caf.hs
./src/caf +RTS -P
cat caf.prof

Performance in exchange for heap. They give us a tip to look for CAFs if our programs are consuming a lot of memory.

We can remove the CAF

module Main where

-- not a CAF
incdInts :: [Integer] -> [Integer]
incdInts x = map (+1) x

main :: IO ()
main = do
  print (incdInts [1..] !! 1000)

stack ghc -- -prof -fprof-auto -rtsopts -O2 src/caffree.hs
./src/caffree +RTS -P
cat caffree.prof

Map

The datatype

data Map k a = Bin
    {-# UNPACK #-}
!Size !k a
!(Map k a) !(Map k a) | Tip
type Size = Int

Compare using a map and a list

module Main where

import Criterion.Main
import qualified Data.Map as M

genList :: Int -> [(String, Int)]
genList n = go n []
  where go 0 xs = ("0", 0) : xs
        go n' xs =
          go (n' - 1) ((show n', n') : xs)

pairList :: [(String, Int)]
pairList = genList 9001

testMap :: M.Map String Int
testMap = M.fromList pairList

main :: IO ()
main = defaultMain
  [ bench "lookup one thing, list" $
    whnf (lookup "doesntExist") pairList
  , bench "lookup one thing, map" $
    whnf (M.lookup "doesntExist") testMap
  ]

stack ghc -- -O2 src/alist.hs

I thought this wasn't working because the bench values were almost the same... then I looked at the unit abbreviation.

Should we use Map all the time?

Using an Int as your key type is usually a sign you’d be better off with a HashMap, IntMap, or Vector, depending on the semantics of your problem.

Set

Map and Set have an Ord constraint for faster searching. Nicer than the "no guarantees" that other languages have for associative structures.

The datatype

data Set a = Bin
    {-# UNPACK #-}
!Size !a !(Set a) !(Set a) | Tip
type Size = Int

module Main where

import Criterion.Main
import qualified Data.Map as M
import qualified Data.Set as S

bumpIt (i, v) = (i + 1, v + 1)
m :: M.Map Int Int
m = M.fromList $ take 10000 stream
  where stream = iterate bumpIt (0, 0)

s :: S.Set Int
s = S.fromList $ take 10000 stream
  where stream = iterate (+1) 0

membersMap :: Int -> Bool
membersMap i = M.member i m

membersSet :: Int -> Bool
membersSet i = S.member i s

main :: IO ()
main = defaultMain
  [ bench "member check map" $
    whnf membersMap 9999
  , bench "member check set" $
    whnf membersSet 9999
  ]

stack ghc -- -O2 src/set.hs
./src/set

Not much difference

Sequence

Sequence appends cheaply on the front and the back, which avoids a common problem with lists where you can only cons to the front cheaply.

The datatype

newtype Seq a = Seq (FingerTree (Elem a))

-- Elem is so elements and nodes can be -- distinguished in the types of the
-- implementation. Don't sweat it. 
newtype Elem a = Elem { getElem :: a }

data FingerTree a 
    = Empty
    | Single a
    | Deep {-# UNPACK #-} !Int !(Digit a)
           (FingerTree (Node a)) !(Digit a)

You won’t want to resort to using Sequence by default though, as the list type is often competitive.

module Main where

import Criterion.Main
import qualified Data.Sequence as S

lists :: [[Int]]
lists = replicate 10 [1..100000]

seqs :: [S.Seq Int]
seqs =
  replicate 10 (S.fromList [1..100000])

main :: IO ()
main = defaultMain
  [ bench "concatenate lists" $
    nf mconcat lists
  , bench "concatenate sequences" $
    nf mconcat seqs
  ]

stack ghc -- -O2 src/seq.hs
./src/seq

When you know you need cheap appending to the end of a long list, it’s worth giving Sequence a try, but it’s better to base the nal decision on benchmarking data if performance matters.fd

It sounds like we won't reach for a Sequence too often

When you know you need cheap appending to the end of a long list, it’s worth giving Sequence a try, but it’s better to base the nal decision on benchmarking data if performance matters.fd

It sounds like we won't reach for a Sequence too often

Vector

Haskell has an Array type, but

One rarely uses arrays, or more speci cally, Array10 in Haskell. Vector is almost always what you want instead of an array. The default Vector type is implemented as a slice wrapper of Array

The datatype

-- | Boxed vectors, supporting -- efficient slicing.
data Vector a =
     Vector {-# UNPACK #-} !Int 
            {-# UNPACK #-} !Int
            {-# UNPACK #-} !(Array a) 
     deriving ( Typeable )

There are varieties of vectors, but this one is the most common. Others:

mutable
boxed: can reference any datatype
unboxed: references raw values without pointer indirection

Slicing refers to slicing off a portion, or creating a sub-array. The Vector type is designed for making slicing much cheaper than it otherwise would be

module Main where

import Criterion.Main
import qualified Data.Vector as V

slice :: Int -> Int -> [a] -> [a]
slice from len xs =
  take len (drop from xs)

l :: [Int]
l = [1..1000]

v :: V.Vector Int
v = V.fromList [1..1000]

main :: IO ()
main = defaultMain
  [ bench "slicing list" $
    whnf (head . slice 100 900) l
  , bench "slicing vector" $
    whnf (V.head . V.slice 100 900) v
  ]

stack ghc -- -O2 src/slice.hs
./src/slice

Speed comes from storing the slice as indexes into the original rather than copying data.

Fusion, or loop fusion, means that as an optimization the compiler can fuse several loops into one megaloop and do it in one pass

module Main where

import Criterion.Main
import qualified Data.Vector as V

testV' :: Int -> V.Vector Int
testV' n =
  V.map (+n) $ V.map (+n) $
    V.map (+n) $ V.map (+n)
    (V.fromList [1..10000])

testV :: Int -> V.Vector Int
testV n =
  V.map ( (+n) . (+n)
        . (+n) . (+n) )
        (V.fromList [1..10000])

main :: IO ()
main = defaultMain
  [ bench "vector map prefused" $
    whnf testV 9998
  , bench "vector map will be fused" $
    whnf testV' 9998
  ]

stack ghc -- -O2 src/fusion.hs
./src/fusion

This is a great sales pitch for types and the compiler. The V.map calls can be safely fused together.

[// is] a batch update operator that allows you to modify several elements of the vector at one time:

module Main where

import Criterion.Main
import Data.Vector ((//))
import qualified Data.Vector as V

vec :: V.Vector Int
vec = V.fromList [1..10000]

slow :: Int -> V.Vector Int
slow n = go n vec
  where go 0 v = v
        go n v = go (n-1) (v // [(n, 0)])

batchList :: Int -> V.Vector Int
batchList n = vec // updates
  where updates =
          fmap (\n -> (n, 0)) [0..n]

batchVector :: Int -> V.Vector Int
batchVector n = V.unsafeUpdate vec updates
  where updates =
          fmap (\n -> (n, 0))
          (V.fromList [0..n])

main :: IO ()
main = defaultMain
  [ bench "slow" $
    whnf slow 9998
  , bench "batch list" $
    whnf batchList 9998
  , bench "batch vector" $
    whnf batchVector 9998
  ]

stack ghc -- -O2 src/batch.hs
./src/batch

We have have some mutation rope, but not enough to hang ourselves.

module Main where
import Control.Monad.Primitive
import Control.Monad.ST
import Criterion.Main
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV
import qualified Data.Vector.Generic.Mutable as GM

mutableUpdateIO
  :: Int
  -> IO (MV.MVector RealWorld Int)
mutableUpdateIO n = do
  mvec <- GM.new (n+1)
  go n mvec
  where go 0 v = return v
        go n v =
          (MV.write v n 0) >> go (n-1) v

mutableUpdateST :: Int -> V.Vector Int
mutableUpdateST n = runST $ do
  mvec <- GM.new (n+1)
  go n mvec
  where go 0 v = V.freeze v
        go n v =
          (MV.write v n 0) >> go (n-1) v

main :: IO ()
main = defaultMain
  [ bench "mutable IO vector"
    $ whnfIO (mutableUpdateIO 9998)
  , bench "mutable ST vector"
    $ whnf mutableUpdateST 9998
  ]

Don’t resort to the use of mutation via IO or ST except where you know it’s necessary.

Regarding the ST monad

ST can be thought of as a mutable variant of the strict State monad. From another angle, it could be thought of as IO restricted exclusively to mutation which is guaranteed safe.

It unfreezes your data, mutates it, then refreezes it so that it can’t mutate anymore. Thus it manages to mutate and still maintain referential transparency.

String types

They have been lying to us about String

It’s a type alias for a list of Char, yet underneath it’s not quite as simple as an actual list of Char.

Text

The Text type is

best when you have plain text, but need to store the data more e ciently — particularly as it concerns memory usage.

Compact and with efficient indexing, however UTF-16 storage isn't what most people expect.

module Main where
import Control.Monad
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import qualified System.IO as SIO

-- replace "/usr/share/dict/words" -- as appropriate
dictWords :: IO String
dictWords =
  SIO.readFile "/usr/share/dict/words"

dictWordsT :: IO T.Text
dictWordsT =
  TIO.readFile "/usr/share/dict/words"

main :: IO () main = do
  replicateM_ 1000 (dictWords >>= print)
  replicateM_ 1000
  (dictWordsT >>= TIO.putStrLn)

I needed to stack install --library-profiling

stack ghc -- -prof -fprof-auto -rtsopts -O2 src/text.hs
./src/text +RTS -hc -p
stack exec hp2ps text.hp

ByteString

ByteString is not a string.

ByteStrings are sequences of bytes represented (indirectly) as a vector of Word8 values

[Char8 doesn't] work for Unicode and shouldn’t be used anywhere there’s even a hint of possibility that there could be Unicode data.

So basically never.

Data.Char has helpers. ord gets the Int value of the char byte. chr gets the char from an Int.

Use ByteString for text when you are, for example, transferring text data but not manipulating it.

Chapter Exercises

Difference List

module Main where

import Criterion.Main

newtype DList a = DL { unDL :: [a] -> [a] }

empty :: DList a
empty = DL id
{-# INLINE empty #-}

singleton :: a -> DList a
singleton x = DL ([x] ++)
{-# INLINE singleton #-}

toList :: DList a -> [a]
toList x = (unDL x) []
{-# INLINE toList #-}

-- Prepend a single element to a dlist.
infixr `cons`
cons        :: a -> DList a -> DList a
cons x xs   = DL ((x:) . unDL xs)
{-# INLINE cons #-}

-- Append a single element to a dlist.
infixl `snoc`
snoc :: DList a -> a -> DList a
snoc xs x = DL (unDL xs . (x:))
{-# INLINE snoc #-}

-- Append dlists.
append :: DList a -> DList a -> DList a
append xs ys = DL (unDL xs . unDL ys)
{-# INLINE append #-}


schlemiel :: Int -> [Int]
schlemiel i = go i []
  where go 0 xs = xs
        go n xs = go (n-1) ([n] ++ xs)

constructDlist :: Int -> [Int]
constructDlist i = toList $ go i empty
  where go 0 xs = xs
        go n xs =
          go (n-1)
          (singleton n `append` xs)

main :: IO ()
main = defaultMain
  [ bench "concat list" $
    whnf schlemiel 123456
  , bench "concat dlist" $
    whnf constructDlist 123456
]

Queue

Anyone..?

jwalgran/ch28-basic-libraries.md

Basic libraries

Basic libraries and data structures

Benchmarking with Criterion

Profiling your programs

Constant applicative forms

Map

Set

Sequence

Vector

String types

Text

ByteString

Chapter Exercises

Difference List

Queue