Do not get bogged down in micro-optimizations before you have assessed the macro-optimizations available. I/O and the choice of algorithm dominate any low-level changes you may make. In the end, you have to think hard about your code!
Before starting to optimize:
- Is the -O2 flag on?
- Profile: find out which part of the code is slow.
- Use the best algorithm in that part.
- Optimize: implement it in the most efficient way.
Manual cost centers are usually better and avoid profiling library dependencies. Don't add cost centers to functions that should be inlined, because the SCC pragma forces no-inline.
See the GHC manual for details on cost centers.
# This will add an SCC everywhere.
# You will probably want to switch to manual cost centers and use {-# SCC "name" #-} <expression> instead.
ghc -O2 -prof -fprof-auto -rtsopts Example.hs
./Example +RTS -p -RTS
cat Example.prof
Don't forget -O2
Manual cost centers:
# Add {-# SCC <name> #-} manually to the functions you want to profile
cabal build --enable-profiling --ghc-options="-fno-prof-auto"
time cabal exec example -- +RTS -p -s -RTS # Produce project.prof and output rts statistics
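For reference, a minimal sketch of what a manual annotation looks like (the function and label names here are illustrative):

-- Example.hs: annotate a single expression with a named cost center.
module Main where

expensive :: Int -> Int
expensive n = sum [1 .. n]

main :: IO ()
main = print ({-# SCC "expensive_sum" #-} expensive 10000000)

The "expensive_sum" cost center then shows up as its own row in the .prof report.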
Automatic cost centers (use with care):
cabal build --enable-profiling --ghc-options="-fprof-auto"
time cabal exec example -- +RTS -p -s -RTS
Recall that for multi-threading you will need:
cabal build --enable-profiling --ghc-options="-threaded -fprof-auto"
time cabal exec example -- +RTS -N -p -s -RTS
Manual cost centers:
mkdir -p .stack-bin
stack clean
stack install --local-bin-path .stack-bin --profile --ghc-options="-fno-prof-auto"
time .stack-bin/example +RTS -p
Automatic cost centers:
mkdir -p .stack-bin
stack clean
stack install --local-bin-path .stack-bin --profile --ghc-options="-fprof-auto"
time .stack-bin/example +RTS -p
See an example here.
- Always dump to a file:
-ddump-to-file
- Dump Core after optimizations:
-ddump-simpl
- You can also dump STG:
-ddump-stg
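From the command line (reusing the Example.hs stand-in from above), -ddump-to-file writes each dump next to the module:

$ ghc -O2 -ddump-simpl -ddump-to-file Example.hs
$ less Example.dump-simpl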
In *.cabal:
flag dump
  manual: True
  default: True

library
  build-depends:
  ghc-options: -O2

  if flag(dump)
    ghc-options: -ddump-simpl -ddump-stg -ddump-to-file
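With the flag defaulting to True, a regular build writes the dump files under dist-newstyle; assuming the default project layout, something like:

$ cabal build
$ find dist-newstyle -name '*.dump-simpl'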
Read:
For example, if I see that a particular pure function is taking a long time relative to the rest of the code, that it works on Text, and that ARR_WORDS is rising linearly in the heap, I probably have a thunk-based memory leak. This is knowledge you build up over time.
When you need to profile CPU usage: profiteur, ghc-prof-flamegraph
For thread profiling: threadscope, ghc-events-analyze
When you need to profile memory usage: eventlog2html
When you need to benchmark your application: criterion
To get an environment with all profiling tools:
$ nix-shell --packages 'haskellPackages.ghcWithHoogle (pkgs: with pkgs; [ criterion deepseq parallel ])' haskellPackages.profiteur haskellPackages.threadscope haskellPackages.eventlog2html haskellPackages.ghc-prof-flamegraph
All examples are based on this program:
hellofib.hs
import Control.Parallel.Strategies
import System.Environment

fib 0 = 1
fib 1 = 1
fib n = runEval $ do
  x <- rpar (fib (n-1))
  y <- rseq (fib (n-2))
  return (x + y + 1)

main = do
  args <- getArgs
  n <- case args of
    []  -> return 20
    [x] -> return (read x)
    _   -> fail "Usage: hellofib [n]"
  print (fib n)
$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa
$ profiteur hellofib.prof
$ firefox hellofib.prof.html
$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa
$ ghc-prof-flamegraph hellofib.prof > output.svg
$ firefox output.svg
$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
# Use -hc to know where the thunk is being created.
# Use -hd or -hy to know which data constructor/type is creating the thunk.
# Use -hr to know why your data is not being garbage collected (retained).
$ ./hellofib +RTS -N -hy -l # -l-agu to not include thread events
$ eventlog2html hellofib.eventlog
$ firefox hellofib.eventlog.html
cabal build --enable-profiling --ghc-options="-fprof-auto"
cabal exec example -- +RTS -hc -l -RTS
For some reason, if you manually add the cost centers and use -fno-prof-auto, the graph is empty.
There is a new profiling flag, -hi, which gives you detailed information about where thunks (unevaluated closures) are accumulating:
$ ghc -eventlog -rtsopts -O2 -finfo-table-map -fdistinct-constructor-tables LargeThunk
$ ./LargeThunk 100000 100000 30000000 +RTS -l -hi -i0.5 -RTS
$ eventlog2html LargeThunk.eventlog
More on the blog post: https://well-typed.com/blog/2021/01/first-look-at-hi-profiling-mode/
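LargeThunk.hs is the example from the blog post; as a minimal stand-in, a program that builds a long thunk chain via a lazy left fold shows the same effect (the name and sizes here are illustrative):

-- ThunkDemo.hs: a lazy foldl builds a chain of (+) thunks before forcing it.
import System.Environment (getArgs)

main :: IO ()
main = do
  [n] <- map read <$> getArgs
  print (foldl (+) 0 [1 .. n :: Integer])

Compile and run it the same way as above, substituting ThunkDemo for LargeThunk.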
Thread profiling and GC insight.
$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
$ ./hellofib +RTS -N -l -s
$ threadscope hellofib.eventlog
Threadscope shows CPU core activity, while ghc-events-analyze shows Haskell thread activity. ghc-events-analyze works even for concurrent programs on a single core, and it lets you instrument regions of your code with named events, as sketched below.
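ghc-events-analyze picks up user events of the form "START <label>" / "STOP <label>"; a minimal sketch of instrumenting a region this way (the "compute" label and the workload are illustrative):

-- Instrument.hs: emit named START/STOP user events for ghc-events-analyze.
import Debug.Trace (traceEventIO)

main :: IO ()
main = do
  traceEventIO "START compute"
  print (sum [1 .. 10000000 :: Int])  -- stand-in for the real work
  traceEventIO "STOP compute"

Compile with -eventlog, run with +RTS -l, then point ghc-events-analyze at the resulting .eventlog.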
- A First Look at Info Table Profiling
- Detecting Space Leaks
- Flame graphs for GHC time profiles with ghc-prof-flamegraph
- FPComplete: Profiling and Performance
- Haskell wiki: performance
- Locating Performance Bottlenecks
- Memory Fragmentation
- Micro-optimizations
- Performance profiling with ghc-events-analyze
- Profiteur: a visualiser for Haskell GHC .prof files
- Spaceleak Stack-limiting Technique: lots of interesting links about spaceleaks inside.
- Top tips and tools for optimising Haskell
- Stackoverflow: GHC's RTS options for garbage collection - Simon Marlow
Tip
If you ever hit an exception that requires a stack trace to debug, use the -xc RTS option (it requires a profiling build).
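A sketch of how that looks, reusing the Example.hs stand-in from above:

$ ghc -O2 -prof -fprof-auto -rtsopts Example.hs
$ ./Example +RTS -xc -RTS

With -xc, the runtime prints the current cost-center stack every time an exception is raised.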