Do not get bogged down in micro-optimizations before you have assessed the macro-optimizations that are available: IO and the choice of algorithm dominate any low-level changes you may make. In the end, you still have to think hard about your code!
Before starting to optimize:
- Is the -O2 flag on?
- Profile: find out which part of the code is actually slow.
- Use the best algorithm for that part (see the sketch after this list).
- Only then optimize: implement that algorithm in the most efficient way.
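As a toy illustration of how much the algorithm dominates micro-tweaks: deduplicating a list with Data.List.nub is O(n²), while a Set-based variant is O(n log n), and no amount of low-level tuning closes that gap. A minimal sketch (ordNub is our own helper, not a library function):

```haskell
import qualified Data.Set as Set

-- O(n log n) deduplication; Data.List.nub on the same input is O(n^2).
ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs

main :: IO ()
main = print (length (ordNub (concat (replicate 100 [1 .. 10000 :: Int]))))
```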
Manual cost centers are usually better and avoid profiling your library dependencies. Don't add cost centers to functions that should be inlined, because the SCC pragma prevents inlining. See the GHC users guide for the full details.
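A minimal sketch of manual annotation (the cost-centre names are ours, purely illustrative):

```haskell
-- Compile with: ghc -O2 -prof -rtsopts Example.hs; run with +RTS -p -RTS.
main :: IO ()
main = do
  let xs = [1 .. 1000000] :: [Int]
  -- Each SCC pragma annotates the expression that follows it; the chosen
  -- names appear as cost centres in the resulting .prof file.
  print ({-# SCC "sum_xs" #-} sum xs)
  print ({-# SCC "len_xs" #-} length xs)
```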
Automatic cost centers, with plain ghc:

```
# This will add an SCC everywhere.
# You will probably want to change it to manual and use {-# SCC "name" #-} <expression>.
ghc -O2 -prof -fprof-auto -rtsopts Example.hs
./Example +RTS -p -RTS
cat Example.prof
```

Don't forget -O2!
Manual cost centers (with cabal):

```
# Add {-# SCC <name> #-} manually to the functions you want to profile.
cabal build --enable-profiling --ghc-options="-fno-prof-auto"
# Produces the .prof file and prints RTS statistics.
time cabal exec example -- +RTS -p -s -RTS
```

Automatic cost centers (use with care):

```
cabal build --enable-profiling --ghc-options="-fprof-auto"
time cabal exec example -- +RTS -p -s -RTS
```

Recall that for multi-threading you will need:

```
cabal build --enable-profiling --ghc-options="-threaded -fprof-auto"
time cabal exec example -- +RTS -N -p -s -RTS
```

Manual cost centers (with stack):
```
mkdir -p .stack-bin
stack clean
stack install --local-bin-path .stack-bin --profile --ghc-options="-fno-prof-auto"
time .stack-bin/example +RTS -p
```

Automatic cost centers (with stack):

```
mkdir -p .stack-bin
stack clean
stack install --local-bin-path .stack-bin --profile --ghc-options="-fprof-auto"
time .stack-bin/example +RTS -p
```
To inspect what GHC actually produces, dump the intermediate representations:

- Always dump to a file: -ddump-to-file
- Dump Core after optimizations: -ddump-simpl
- You can also dump STG: -ddump-stg
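For a one-off compile, the flags can be passed straight to ghc; with -ddump-to-file the output lands in a file named after the module (Example.hs is a placeholder name here):

```
ghc -O2 -ddump-simpl -ddump-stg -ddump-to-file Example.hs
less Example.dump-simpl
```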
In *.cabal:

```
flag dump
  manual: True
  default: True

library
  build-depends:
  ghc-options: -O2
  if flag(dump)
    ghc-options: -ddump-simpl -ddump-stg -ddump-to-file
```
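Because the flag defaults to True, every build produces the dumps; you can switch it off per build (a sketch, assuming the flag is named dump as above):

```
cabal build            # dumps Core and STG to files
cabal build -f -dump   # skip the dumps for a faster build
```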
Read:

> For example, if I see that a particular pure function is taking a long time relative to the rest of the code, that it operates on Text, and that ARR_WORDS is rising linearly in the heap, I probably have a thunk-based memory leak. This is knowledge you build up over time.
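As a minimal illustration of such a thunk-based leak (a classic textbook case, not tied to Text specifically):

```haskell
import Data.List (foldl')

-- foldl builds a chain of unevaluated (+) thunks that only collapses at the
-- very end; on a large input this shows up as a steadily growing heap profile.
leaky :: [Int] -> Int
leaky = foldl (+) 0

-- foldl' forces the accumulator at each step, keeping the heap flat.
strict :: [Int] -> Int
strict = foldl' (+) 0

main :: IO ()
main = print (strict [1 .. 10000000])
```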
- When you need to profile CPU usage: profiteur, ghc-prof-flamegraph.
- For thread profiling: ThreadScope, ghc-events-analyze.
- When you need to profile memory usage: eventlog2html.
- When you need to benchmark your application: criterion.
To get an environment with all the profiling tools:

```
$ nix-shell --packages 'haskellPackages.ghcWithHoogle (pkgs: with pkgs; [ criterion deepseq parallel ])' haskellPackages.profiteur haskellPackages.threadscope haskellPackages.eventlog2html haskellPackages.ghc-prof-flamegraph
```

All examples are based on this program:
hellofib.hs:

```haskell
import Control.Parallel.Strategies
import System.Environment

fib 0 = 1
fib 1 = 1
fib n = runEval $ do
  x <- rpar (fib (n - 1))
  y <- rseq (fib (n - 2))
  return (x + y + 1)

main = do
  args <- getArgs
  n <- case args of
    []  -> return 20
    [x] -> return (read x)
    _   -> fail "Usage: hellofib [n]"
  print (fib n)
```

Visualise the .prof file with profiteur:

```
$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa
$ profiteur hellofib.prof
$ firefox hellofib.prof.html
```

Render the .prof file as a flame graph with ghc-prof-flamegraph:

```
$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa
$ ghc-prof-flamegraph hellofib.prof > output.svg
$ firefox output.svg
```

Profile the heap and visualise it with eventlog2html:

```
$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
# Use -hc to find out where the thunks are being created.
# Use -hd or -hy to find out which data constructor/type is creating the thunks.
# Use -hr to find out why your data is not being garbage collected (retained).
$ ./hellofib +RTS -N -hy -l  # Use -l-agu to exclude thread events.
$ eventlog2html hellofib.eventlog
$ firefox hellofib.eventlog.html
```

The same with cabal:

```
cabal build --enable-profiling --ghc-options="-fprof-auto"
cabal exec example -- +RTS -hc -l -RTS
```

For some reason, if you add the cost centers manually and use -fno-prof-auto, the graph is empty.

There is a newer profiling flag, -hi, which gives you detailed information about where thunks (unevaluated closures) are accumulating:

```
$ ghc -eventlog -rtsopts -O2 -finfo-table-map -fdistinct-constructor-tables LargeThunk
$ ./LargeThunk 100000 100000 30000000 +RTS -l -hi -i0.5 -RTS
$ eventlog2html LargeThunk.eventlog
```

More on the blog post: https://well-typed.com/blog/2021/01/first-look-at-hi-profiling-mode/

Thread profiling and GC insight:

```
$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
$ ./hellofib +RTS -N -l -s
$ threadscope hellofib.eventlog
```

ThreadScope shows the activity of the CPU cores, while ghc-events-analyze shows the activity of the Haskell threads, which also makes it useful for concurrent programs running on a single core. ghc-events-analyze additionally lets you instrument regions of your code with named events.
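A minimal sketch of such instrumentation, following the "START <label>" / "STOP <label>" user-event convention that ghc-events-analyze recognises (the labelled helper is ours; compile with -eventlog and run with +RTS -l):

```haskell
import Debug.Trace (traceEventIO)

-- Wrap an IO action in START/STOP user events; ghc-events-analyze then
-- reports the time spent in each labelled region of the eventlog.
labelled :: String -> IO a -> IO a
labelled name act = do
  traceEventIO ("START " ++ name)
  r <- act
  traceEventIO ("STOP " ++ name)
  return r

main :: IO ()
main = do
  _ <- labelled "sum" (return $! sum [1 .. 10000000 :: Int])
  return ()
```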
Further reading:

- A First Look at Info Table Profiling
- Detecting Space Leaks
- Flame graphs for GHC time profiles with ghc-prof-flamegraph
- FPComplete: Profiling and Performance
- Haskell wiki: performance
- Locating Performance Bottlenecks
- Memory Fragmentation
- Micro-optimizations
- Performance profiling with ghc-events-analyze
- Profiteur: a visualiser for Haskell GHC .prof files
- Spaceleak Stack-limiting Technique: lots of interesting links about spaceleaks inside.
- Top tips and tools for optimising Haskell
- Stackoverflow: GHC's RTS options for garbage collection - Simon Marlow
Tip

If you ever hit an exception that requires a stack trace to be debugged, use the -xc RTS option (the program must be built with profiling).
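A sketch of the invocation (Example.hs is a placeholder name):

```
$ ghc -O2 -prof -fprof-auto -rtsopts Example.hs
$ ./Example +RTS -xc -RTS   # prints the current cost-centre stack whenever an exception is raised
```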