That's enough, Linus.
Goal: Reduce CPU and memory
Data structure choice can have a big impact on resource usage.
Text mentions that things aren't getting faster, but it's also good to mention that things are getting smaller. Minimizing CPU use means less power and better battery life.
You can't reduce something unless you know the quantity you are starting with.
Criterion http://www.serpentine.com/criterion/
module Main where
import Criterion.Main
infixl 9 !?
_ !? n | n < 0 = Nothing
[] !? _ = Nothing
(x:_) !? 0 = Just x
(_:xs) !? n = xs !? (n-1)
myList :: [Int]
myList = [1..9999]
main :: IO ()
main = defaultMain
[ bench "index list 9999"
$ whnf (myList !!) 9998
, bench "index list maybe index 9999"
$ whnf (myList !?) 9998
]
The output after being compiled with stack ghc -- -O2 src/Main.hs
benchmarking index list 9999
time 20.64 μs (20.16 μs .. 21.22 μs)
0.995 R² (0.991 R² .. 0.998 R²)
mean 21.00 μs (20.59 μs .. 21.71 μs)
std dev 1.881 μs (1.408 μs .. 2.574 μs)
variance introduced by outliers: 82% (severely inflated)
benchmarking index list maybe index 9999
time 102.7 μs (101.2 μs .. 104.2 μs)
0.998 R² (0.997 R² .. 0.999 R²)
mean 101.5 μs (100.5 μs .. 103.3 μs)
std dev 4.372 μs (2.832 μs .. 7.272 μs)
variance introduced by outliers: 44% (moderately inflated)
Criterion.Main
provides whnf
and nf
functions.
Weak head normal form, as we said before, evaluates to the first data constructor. That means that if your outermost data constructor is a Maybe, it’s only going to evaluate enough to find out if it’s a Nothing or a Just.
It is important to choose the form function that will exercise your code in the most appropriate way.
The types
defaultMain :: [Benchmark] -> IO ()
whnf :: (a -> b) -> a -> Benchmarkable
nf :: Control.DeepSeq.NFData b => (a -> b) -> a -> Benchmarkable
Note that using nf
with a custom type requires defining an NFData
instance.
How can we match the speed of the non-Maybe version.
The !!
source can guide us.
https://hackage.haskell.org/package/base-4.10.1.0/docs/src/GHC.List.html#%21%21
(!!) :: [a] -> Int -> a
#ifdef USE_REPORT_PRELUDE
xs !! n | n < 0 = errorWithoutStackTrace "Prelude.!!: negative index"
[] !! _ = errorWithoutStackTrace "Prelude.!!: index too large"
(x:_) !! 0 = x
(_:xs) !! n = xs !! (n-1)
#else
-- We don't really want the errors to inline with (!!).
-- We may want to fuss around a bit with NOINLINE, and
-- if so we should be careful not to trip up known-bottom
-- optimizations.
tooLarge :: Int -> a
tooLarge _ = errorWithoutStackTrace (prel_list_str ++ "!!: index too large")
negIndex :: a
negIndex = errorWithoutStackTrace $ prel_list_str ++ "!!: negative index"
{-# INLINABLE (!!) #-}
xs !! n
| n < 0 = negIndex
| otherwise = foldr (\x r k -> case k of
0 -> x
_ -> r (k-1)) tooLarge xs n
#endif
Rewriting our implementation with a foldr
infixl 9 !?
{-# INLINABLE (!?) #-}
xs !? n
| n < 0 = Nothing
| otherwise =
foldr
(\x r k ->
case k of
0 -> Just x
_ -> r (k-1))
(const Nothing) xs n
Is basically the same
benchmarking index list 9999
time 20.11 μs (19.75 μs .. 20.54 μs)
0.996 R² (0.993 R² .. 0.999 R²)
mean 20.45 μs (20.10 μs .. 20.97 μs)
std dev 1.351 μs (959.3 ns .. 1.967 μs)
variance introduced by outliers: 71% (severely inflated)
benchmarking index list maybe index 9999
time 100.9 μs (100.1 μs .. 101.8 μs)
0.999 R² (0.998 R² .. 1.000 R²)
mean 101.1 μs (100.4 μs .. 102.3 μs)
std dev 2.935 μs (2.138 μs .. 3.821 μs)
variance introduced by outliers: 26% (moderately inflated)
Adding a type signature removes replaces an ambiguous Num
with a concrete
Int
which results in the compiler producing a more "more primative" and faster
loop.
benchmarking index list 9999
time 20.13 μs (19.87 μs .. 20.40 μs)
0.998 R² (0.998 R² .. 0.999 R²)
mean 20.25 μs (20.01 μs .. 20.53 μs)
std dev 912.2 ns (712.6 ns .. 1.194 μs)
variance introduced by outliers: 53% (severely inflated)
benchmarking index list maybe index 9999
time 18.47 μs (18.23 μs .. 18.78 μs)
0.998 R² (0.996 R² .. 0.999 R²)
mean 18.73 μs (18.41 μs .. 19.30 μs)
std dev 1.334 μs (776.4 ns .. 2.073 μs)
variance introduced by outliers: 74% (severely inflated)
More on deciding between whnf
and nf
. In a hypothetical query records from a
DB and write them to a file example, whnf
is appropriate when you are
interested in measuring the query and nf
if you also want to capture the file
output time.
The book walks through a nf
bottom failure example but I could not replicate it.
GHC has built in profiling of time spent in functions and memory.
module Main where
f :: IO ()
f = do
print ([1..] !! 999999)
putStrLn "f"
g :: IO ()
g = do
print ([1..] !! 9999999)
putStrLn "g"
main :: IO ()
main = do
f
g
stack ghc -- -prof -fprof-auto -rtsopts -O2 src/profilingTime.hs
./profilingTime +RTS -P
cat profile.prof
module Main where
import Control.Monad
blah :: [Integer]
blah = [1..1000]
main :: IO ()
main =
replicateM_ 10000 (print blah)
stack ghc -- -prof -fprof-auto -rtsopts -O2 src/heap.hs
./src/heap +RTS -hc -p
stack exec hp2ps heap.hp
CAFs are expressions that have no free variables and are held in memory to be shared with all other expressions in a module
module Main where
incdInts :: [Integer]
incdInts = map (+1) [1..]
main :: IO ()
main = do
print (incdInts !! 1000)
print (incdInts !! 9001)
print (incdInts !! 90010)
print (incdInts !! 9001000)
print (incdInts !! 9501000)
print (incdInts !! 9901000)
stack ghc -- -prof -fprof-auto -rtsopts -O2 src/caf.hs
./src/caf +RTS -P
cat caf.prof
Performance in exchange for heap. They give us a tip to look for CAFs if our programs are consuming a lot of memory.
We can remove the CAF
module Main where
-- not a CAF
incdInts :: [Integer] -> [Integer]
incdInts x = map (+1) x
main :: IO ()
main = do
print (incdInts [1..] !! 1000)
stack ghc -- -prof -fprof-auto -rtsopts -O2 src/caffree.hs
./src/caffree +RTS -P
cat caffree.prof
The datatype
data Map k a = Bin
{-# UNPACK #-}
!Size !k a
!(Map k a) !(Map k a) | Tip
type Size = Int
Compare using a map and a list
module Main where
import Criterion.Main
import qualified Data.Map as M
genList :: Int -> [(String, Int)]
genList n = go n []
where go 0 xs = ("0", 0) : xs
go n' xs =
go (n' - 1) ((show n', n') : xs)
pairList :: [(String, Int)]
pairList = genList 9001
testMap :: M.Map String Int
testMap = M.fromList pairList
main :: IO ()
main = defaultMain
[ bench "lookup one thing, list" $
whnf (lookup "doesntExist") pairList
, bench "lookup one thing, map" $
whnf (M.lookup "doesntExist") testMap
]
stack ghc -- -O2 src/alist.hs
I thought this wasn't working because the bench values were almost the same... then I looked at the unit abbreviation.
Should we use Map
all the time?
Using an Int as your key type is usually a sign you’d be better off with a HashMap, IntMap, or Vector, depending on the semantics of your problem.
Map
and Set
have an Ord
constraint for faster searching. Nicer than the
"no guarantees" that other languages have for associative structures.
The datatype
data Set a = Bin
{-# UNPACK #-}
!Size !a !(Set a) !(Set a) | Tip
type Size = Int
module Main where
import Criterion.Main
import qualified Data.Map as M
import qualified Data.Set as S
bumpIt (i, v) = (i + 1, v + 1)
m :: M.Map Int Int
m = M.fromList $ take 10000 stream
where stream = iterate bumpIt (0, 0)
s :: S.Set Int
s = S.fromList $ take 10000 stream
where stream = iterate (+1) 0
membersMap :: Int -> Bool
membersMap i = M.member i m
membersSet :: Int -> Bool
membersSet i = S.member i s
main :: IO ()
main = defaultMain
[ bench "member check map" $
whnf membersMap 9999
, bench "member check set" $
whnf membersSet 9999
]
stack ghc -- -O2 src/set.hs
./src/set
Not much difference
Sequence appends cheaply on the front and the back, which avoids a common problem with lists where you can only cons to the front cheaply.
The datatype
newtype Seq a = Seq (FingerTree (Elem a))
-- Elem is so elements and nodes can be -- distinguished in the types of the
-- implementation. Don't sweat it.
newtype Elem a = Elem { getElem :: a }
data FingerTree a
= Empty
| Single a
| Deep {-# UNPACK #-} !Int !(Digit a)
(FingerTree (Node a)) !(Digit a)
You won’t want to resort to using Sequence by default though, as the list type is often competitive.
module Main where
import Criterion.Main
import qualified Data.Sequence as S
lists :: [[Int]]
lists = replicate 10 [1..100000]
seqs :: [S.Seq Int]
seqs =
replicate 10 (S.fromList [1..100000])
main :: IO ()
main = defaultMain
[ bench "concatenate lists" $
nf mconcat lists
, bench "concatenate sequences" $
nf mconcat seqs
]
stack ghc -- -O2 src/seq.hs
./src/seq
When you know you need cheap appending to the end of a long list, it’s worth giving Sequence a try, but it’s better to base the nal decision on benchmarking data if performance matters.fd
It sounds like we won't reach for a Sequence
too often
When you know you need cheap appending to the end of a long list, it’s worth giving Sequence a try, but it’s better to base the nal decision on benchmarking data if performance matters.fd
It sounds like we won't reach for a Sequence
too often
Haskell has an Array
type, but
One rarely uses arrays, or more speci cally, Array10 in Haskell. Vector is almost always what you want instead of an array. The default Vector type is implemented as a slice wrapper of Array
The datatype
-- | Boxed vectors, supporting -- efficient slicing.
data Vector a =
Vector {-# UNPACK #-} !Int
{-# UNPACK #-} !Int
{-# UNPACK #-} !(Array a)
deriving ( Typeable )
There are varieties of vectors, but this one is the most common. Others:
- mutable
- boxed: can reference any datatype
- unboxed: references raw values without pointer indirection
Slicing refers to slicing off a portion, or creating a sub-array. The Vector type is designed for making slicing much cheaper than it otherwise would be
module Main where
import Criterion.Main
import qualified Data.Vector as V
slice :: Int -> Int -> [a] -> [a]
slice from len xs =
take len (drop from xs)
l :: [Int]
l = [1..1000]
v :: V.Vector Int
v = V.fromList [1..1000]
main :: IO ()
main = defaultMain
[ bench "slicing list" $
whnf (head . slice 100 900) l
, bench "slicing vector" $
whnf (V.head . V.slice 100 900) v
]
stack ghc -- -O2 src/slice.hs
./src/slice
Speed comes from storing the slice as indexes into the original rather than copying data.
Fusion, or loop fusion, means that as an optimization the compiler can fuse several loops into one megaloop and do it in one pass
module Main where
import Criterion.Main
import qualified Data.Vector as V
testV' :: Int -> V.Vector Int
testV' n =
V.map (+n) $ V.map (+n) $
V.map (+n) $ V.map (+n)
(V.fromList [1..10000])
testV :: Int -> V.Vector Int
testV n =
V.map ( (+n) . (+n)
. (+n) . (+n) )
(V.fromList [1..10000])
main :: IO ()
main = defaultMain
[ bench "vector map prefused" $
whnf testV 9998
, bench "vector map will be fused" $
whnf testV' 9998
]
stack ghc -- -O2 src/fusion.hs
./src/fusion
This is a great sales pitch for types and the compiler. The V.map
calls can be
safely fused together.
[// is] a batch update operator that allows you to modify several elements of the vector at one time:
module Main where
import Criterion.Main
import Data.Vector ((//))
import qualified Data.Vector as V
vec :: V.Vector Int
vec = V.fromList [1..10000]
slow :: Int -> V.Vector Int
slow n = go n vec
where go 0 v = v
go n v = go (n-1) (v // [(n, 0)])
batchList :: Int -> V.Vector Int
batchList n = vec // updates
where updates =
fmap (\n -> (n, 0)) [0..n]
batchVector :: Int -> V.Vector Int
batchVector n = V.unsafeUpdate vec updates
where updates =
fmap (\n -> (n, 0))
(V.fromList [0..n])
main :: IO ()
main = defaultMain
[ bench "slow" $
whnf slow 9998
, bench "batch list" $
whnf batchList 9998
, bench "batch vector" $
whnf batchVector 9998
]
stack ghc -- -O2 src/batch.hs
./src/batch
We have have some mutation rope, but not enough to hang ourselves.
module Main where
import Control.Monad.Primitive
import Control.Monad.ST
import Criterion.Main
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV
import qualified Data.Vector.Generic.Mutable as GM
mutableUpdateIO
:: Int
-> IO (MV.MVector RealWorld Int)
mutableUpdateIO n = do
mvec <- GM.new (n+1)
go n mvec
where go 0 v = return v
go n v =
(MV.write v n 0) >> go (n-1) v
mutableUpdateST :: Int -> V.Vector Int
mutableUpdateST n = runST $ do
mvec <- GM.new (n+1)
go n mvec
where go 0 v = V.freeze v
go n v =
(MV.write v n 0) >> go (n-1) v
main :: IO ()
main = defaultMain
[ bench "mutable IO vector"
$ whnfIO (mutableUpdateIO 9998)
, bench "mutable ST vector"
$ whnf mutableUpdateST 9998
]
Don’t resort to the use of mutation via IO or ST except where you know it’s necessary.
Regarding the ST monad
ST can be thought of as a mutable variant of the strict State monad. From another angle, it could be thought of as IO restricted exclusively to mutation which is guaranteed safe.
It unfreezes your data, mutates it, then refreezes it so that it can’t mutate anymore. Thus it manages to mutate and still maintain referential transparency.
They have been lying to us about String
It’s a type alias for a list of Char, yet underneath it’s not quite as simple as an actual list of Char.
The Text
type is
best when you have plain text, but need to store the data more e ciently — particularly as it concerns memory usage.
Compact and with efficient indexing, however UTF-16 storage isn't what most people expect.
module Main where
import Control.Monad
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import qualified System.IO as SIO
-- replace "/usr/share/dict/words" -- as appropriate
dictWords :: IO String
dictWords =
SIO.readFile "/usr/share/dict/words"
dictWordsT :: IO T.Text
dictWordsT =
TIO.readFile "/usr/share/dict/words"
main :: IO () main = do
replicateM_ 1000 (dictWords >>= print)
replicateM_ 1000
(dictWordsT >>= TIO.putStrLn)
I needed to stack install --library-profiling
stack ghc -- -prof -fprof-auto -rtsopts -O2 src/text.hs
./src/text +RTS -hc -p
stack exec hp2ps text.hp
ByteString is not a string.
ByteStrings are sequences of bytes represented (indirectly) as a vector of Word8 values
[Char8 doesn't] work for Unicode and shouldn’t be used anywhere there’s even a hint of possibility that there could be Unicode data.
So basically never.
Data.Char
has helpers. ord
gets the Int
value of the char byte. chr
gets
the char from an Int
.
Use ByteString
for text when you are, for example, transferring text data but
not manipulating it.
module Main where
import Criterion.Main
newtype DList a = DL { unDL :: [a] -> [a] }
empty :: DList a
empty = DL id
{-# INLINE empty #-}
singleton :: a -> DList a
singleton x = DL ([x] ++)
{-# INLINE singleton #-}
toList :: DList a -> [a]
toList x = (unDL x) []
{-# INLINE toList #-}
-- Prepend a single element to a dlist.
infixr `cons`
cons :: a -> DList a -> DList a
cons x xs = DL ((x:) . unDL xs)
{-# INLINE cons #-}
-- Append a single element to a dlist.
infixl `snoc`
snoc :: DList a -> a -> DList a
snoc xs x = DL (unDL xs . (x:))
{-# INLINE snoc #-}
-- Append dlists.
append :: DList a -> DList a -> DList a
append xs ys = DL (unDL xs . unDL ys)
{-# INLINE append #-}
schlemiel :: Int -> [Int]
schlemiel i = go i []
where go 0 xs = xs
go n xs = go (n-1) ([n] ++ xs)
constructDlist :: Int -> [Int]
constructDlist i = toList $ go i empty
where go 0 xs = xs
go n xs =
go (n-1)
(singleton n `append` xs)
main :: IO ()
main = defaultMain
[ bench "concat list" $
whnf schlemiel 123456
, bench "concat dlist" $
whnf constructDlist 123456
]
Anyone..?