Haskell: Profiling

Profiling in Haskell

Do not get bogged down in micro-optimizations before you have assessed the macro-optimizations that are available. I/O and the choice of algorithm dominate any low-level changes you may make. In the end, you have to think hard about your code!

Before starting to optimize:

  1. Is the -O2 flag on?
  2. Profile: find out which part of the code is slow.
  3. Use the best algorithm for that part.
  4. Optimize: implement it in the most efficient way.


Manual cost centers are usually better, and they avoid profiling your library dependencies. Don't add cost centers to functions that should be inlined, because an SCC pragma prevents inlining.
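A minimal sketch of manual cost centers. The module, the function sumSquares and the labels "sumSquares" and "buildList" are hypothetical, chosen only for illustration. Without -prof the SCC pragmas are ignored, so the same file also builds normally:

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical workload. The SCC pragma attaches a cost center named
-- "sumSquares" to the expression that follows it.
sumSquares :: [Int] -> Int
sumSquares xs = {-# SCC "sumSquares" #-} foldl' (+) 0 (map (\x -> x * x) xs)

main :: IO ()
main = do
  -- A separate cost center for building the list, reported on its own line in the .prof file.
  let xs = {-# SCC "buildList" #-} [1 .. 1000000]
  print (sumSquares xs)
```

Built with `ghc -O2 -prof -fno-prof-auto -rtsopts` and run with `+RTS -p`, the profile should list these two cost centers alongside built-in ones such as MAIN.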

Profiling with GHC

Manual here

# -fprof-auto adds an SCC (cost center) to every binding.
# You will probably want to drop it and add manual cost centers instead: {-# SCC "name" #-} <expression>

ghc -O2 -prof -fprof-auto -rtsopts Example.hs
./Example +RTS -p -RTS   # writes the profile to Example.prof

Profiling with Cabal

Don't forget -O2

Manual cost centers:

# Add {-# SCC <name> #-} manually to the functions you want to profile

cabal build --enable-profiling --ghc-options="-fno-prof-auto"
time cabal exec example -- +RTS -p -s -RTS # -p writes example.prof; -s prints runtime (GC and allocation) statistics

Automatic cost centers (use with care):

cabal build --enable-profiling --ghc-options="-fprof-auto"
time cabal exec example -- +RTS -p -s -RTS

For a multi-threaded program, also build with -threaded and run with -N:

cabal build --enable-profiling --ghc-options="-threaded -fprof-auto"
time cabal exec example -- +RTS -N -p -s -RTS

Profiling with Stack

Manual cost centers:

mkdir -p .stack-bin
stack clean
stack install --local-bin-path .stack-bin --profile --ghc-options="-fno-prof-auto"
time .stack-bin/example +RTS -p

Automatic cost centers:

mkdir -p .stack-bin
stack clean
stack install --local-bin-path .stack-bin --profile --ghc-options="-fprof-auto"
time .stack-bin/example +RTS -p

Profiling with Nix

See example here

Dumping Core and STG

  • Always dump to a file: -ddump-to-file
  • Dump Core after optimizations: -ddump-simpl
  • You can also dump STG: -ddump-stg

In *.cabal:

flag dump
  manual: True
  default: True

executable example
  -- main-is, build-depends, etc.
  ghc-options: -O2

  if flag(dump)
    ghc-options: -ddump-simpl -ddump-stg -ddump-to-file
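For a quick experiment without a .cabal file, the same dump flags can be set per module with an OPTIONS_GHC pragma. A minimal sketch; sumTo is a hypothetical function picked because its optimized Core is short enough to read:

```haskell
{-# LANGUAGE BangPatterns #-}
-- Dump the optimized Core of this module to a file instead of the terminal.
{-# OPTIONS_GHC -ddump-simpl -ddump-to-file #-}
module Main where

-- Hypothetical strict loop: compiled with -O2, its Core should show a tight
-- loop over unboxed Int# values, which is easy to spot in the dump.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc i
      | i > n = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 10000000)
```

Compile it with `ghc -O2` and look for the .dump-simpl file written next to the build output.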

Space leak detection


For example, if I see that a particular pure function working on Text takes a long time relative to the rest of the code, and ARR_WORDS (the heap closure type of the byte arrays backing Text) rises linearly in the heap profile, I probably have a thunk-based memory leak. This is knowledge you build up over time.
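The classic shape of a thunk-based leak is a lazy accumulator. A minimal sketch, not taken from the Text case above; sumLazy and sumStrict are hypothetical names. Compile it with -O0 to observe the leak, because with -O2 GHC's strictness analysis usually makes this particular foldl strict and hides it:

```haskell
module Main where

import Data.List (foldl')

-- Lazy left fold: the accumulator is never forced, so foldl builds a chain
-- of (+) thunks as long as the list before evaluating any of them.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: foldl' forces the accumulator at every step,
-- so it runs in constant space.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

main :: IO ()
main = do
  print (sumStrict [1 .. 10000000])
  print (sumLazy [1 .. 10000000])
```

Running each version on its own with `+RTS -s` should show the maximum residency growing with the list length for sumLazy but staying flat for sumStrict.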


When you need to profile CPU usage: profiteur, ghc-prof-flamegraph

For thread profiling: threadscope, ghc-events-analyze

When you need to profile memory usage: eventlog2html

When you need to benchmark your application: criterion

Getting the tools

To get an environment with all profiling tools:

$ nix-shell --packages \
    'haskellPackages.ghcWithHoogle (pkgs: with pkgs; [ criterion deepseq parallel ])' \
    haskellPackages.profiteur \
    haskellPackages.threadscope \
    haskellPackages.eventlog2html \
    haskellPackages.ghc-prof-flamegraph

Using the tools

All examples are based on this program, saved as hellofib.hs:


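import Control.Parallel.Strategies (rpar, rseq, runEval)
import System.Environment (getArgs)

-- Naive parallel "nfib": each call adds 1, so the result is the number of calls made.
fib :: Int -> Integer
fib 0 = 1
fib 1 = 1
fib n = runEval $ do
  x <- rpar (fib (n - 1)) -- spark the first branch so another core can pick it up
  y <- rseq (fib (n - 2)) -- evaluate the second branch on the current thread
  return (x + y + 1)

-- Usage: hellofib [n]   (n defaults to 20)
main :: IO ()
main = do
  args <- getArgs
  n <- case args of
    []  -> return 20
    [x] -> return (read x)
    _   -> fail "Usage: hellofib [n]"
  print (fib n)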


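CPU profile with profiteur:

$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa        # writes hellofib.prof
$ profiteur hellofib.prof       # writes hellofib.prof.html
$ firefox hellofib.prof.html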


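CPU profile as a flame graph with ghc-prof-flamegraph:

$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa                 # writes hellofib.prof
$ ghc-prof-flamegraph hellofib.prof      # writes hellofib.svg
$ firefox hellofib.svg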


Heap profiling RTS options

$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
# -hc: break the heap down by cost center, i.e. where the data (e.g. a thunk) was created.
# -hd / -hy: break it down by closure description (data constructor) / by type.
# -hr: break it down by retainer set, i.e. why the data is not being garbage collected.
$ ./hellofib +RTS -N -hy -l          # use -l-agu instead of -l to leave thread events out
$ eventlog2html hellofib.eventlog    # writes hellofib.eventlog.html
$ firefox hellofib.eventlog.html

With Cabal:

cabal build --enable-profiling --ghc-options="-fprof-auto"
cabal exec example -- +RTS -hc -l -RTS

For some reason, if you add the cost centers manually and build with -fno-prof-auto, the -hc graph is empty.

GHC 9.2 added the -hi (info table) heap profiling mode, which shows in detail where thunks (unevaluated closures) are accumulating. It does not require a profiling build:

$ ghc -eventlog -rtsopts -O2 -finfo-table-map -fdistinct-constructor-tables LargeThunk
$ ./LargeThunk 100000 100000 30000000 +RTS -l -hi -i0.5 -RTS
$ eventlog2html LargeThunk.eventlog

More on the blog post:


Thread profiling and GC insight.

$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
$ ./hellofib +RTS -N -l -s
$ threadscope hellofib.eventlog


ThreadScope shows activity per CPU core, while ghc-events-analyze shows activity per Haskell thread. ghc-events-analyze also works for single-core concurrent programs, and it lets you instrument regions of your code with named events.
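A minimal sketch of instrumenting a region with named events via Debug.Trace.traceEventIO. The "START label" / "STOP label" message format is an assumption about what ghc-events-analyze recognizes (check its README), and withEvent is a hypothetical helper:

```haskell
module Main where

import Control.Exception (bracket_, evaluate)
import Debug.Trace (traceEventIO)

-- Emit a pair of user events around an action. The "START <label>" /
-- "STOP <label>" format is assumed to be what ghc-events-analyze expects;
-- bracket_ makes sure STOP is emitted even if the action throws.
withEvent :: String -> IO a -> IO a
withEvent label =
  bracket_ (traceEventIO ("START " ++ label)) (traceEventIO ("STOP " ++ label))

main :: IO ()
main = do
  -- Hypothetical region "sum": evaluate forces the result inside the region.
  total <- withEvent "sum" (evaluate (sum [1 .. 10000000 :: Int]))
  print total
```

Build with -eventlog and run with `+RTS -l`; the region then shows up in the eventlog as a pair of user events. Without eventlog tracing enabled, traceEventIO does nothing, so the instrumentation can stay in place.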




Case Study



