This gist contains benchmarks for read-only concurrent transaction that involve many shared variables. The test data is a mutable single-linked list where individual list "cells" are linked via mutable shared variables. The goal is to measure the overhead of different transaction types and variables in read-only transactions when there are no concurrent writes.
The input to a benchmark run is a list of a of successive Int values.
The benchmark code runs a number of concurrent threads that each fold over the list and
each compute the sum of all entries. The list isn't mutated during the benchmarks.
Benchmark parameters are
- the length of the list,
- the number of concurrent threads, and
- the type of the variables and transactions.
Variable and transaction types are
MVar,Compose IORef Maybe(which is fine, since the data structure isn't mutated during the benchmark),TMVar(inSTM) evaluated in a singleSTMtransaction,TMVar(hoisted) with eachSTMaction hoisted into a separate singletonSTMtransaction,TMVar(inIO) with eachSTMaction called in a separate singletonSTMtransaction,Compose TVar Maybe(inSTM) evaluated in a singleSTMtransaction,Compose TVar Maybe(hoisted) with eachSTMaction hoisted into a separate singletonSTMtransaction, andCompose TVar Maybe(inIO) access inIOwithout usingSTM.
"TMVar hoisted" and "TMVar in IO" result in equivalent code. However, the overhead of the hoist implementation
for the Stream type that is used in the fold and different inlining by GHC may result in different performance.
Build:
ghc -O2 -threaded -rtsopts -main-is VarList -o main VarList.hsRun:
./main +RTS -NAs expected Compose IORef Maybe performs best in all settings. Compose TVar Maybe with readTVarIO performs on par
with plain IORef. MVar has only little overhead.
TMVar in IO and TMVar hoisted both have acceptable performance with some overhead due to the STM transactions.
The hoisted version is somewhat slower, most likely due to overhead in hoist and different inlining by GHC.
In the context of STM transactions TMVar and Compose TVar Maybe perform similar, which can be explained by the
fact that TMVar a is implemented as TVar (Maybe a) in base.
Benchmarks didn't complete in acceptable time for benchmarks that run the complete fold in a single STM transaction
when there was more than a single thread and a moderately large lists lengths (in the order of thousands).
Focusing on the scenarios where the complete fold is done in a single STM transaction, it seems that the run time
scales linear in the number of threads and, up to some point, quadratic in the length of the list.
However, if there is more than a single thread and the length of the list growth beyond a certain point (~15,000 - 20,000 in my tests) the running time distribution exhibits a growing tail of very long runs and eventually most runs become long running. The behavior seems to indicate a race between concurrent transactions, which is surprising, since there are no updates to the variables during the test run.
It seems that read-only STM transactions without concurrent writes scale quadratic in the number involved
variables and linear in the number of concurrent transactions.
Small (singleton) read transactions come with little overhead, but are still slower than MVar with tryReadMVar.
TVars with readTVarIO have the same performance as plain IORefs. Thus TVars have the benefit that they can be used
either like plain IORef without any transaction overhead or in transactions.
Using STM transactions that involve a large number of variables should be avoided, even if when there are no concurrent
writes. In addition to the quadratic complexity it seems that there can be races between read-only transactions
even in the absence of concurrent updates.
Thus,
IORefs are best when there is no concurrent updates beyond what can be done withatomicModifyIORef.MVars are best when concurrent writes can't be ruled out.TVars are best when variables are used in contexts that don't have concurrent writes (wherereadTVarIOcan be used) as well as in contexts with concurrent writes.
It would be nice if base could include a function tryReadTMVarIO that avoids the overhead of an STM transaction.
