This gist contains benchmarks for read-only concurrent transaction that involve many shared variables. The test data is a mutable single-linked list where individual list "cells" are linked via mutable shared variables. The goal is to measure the overhead of different transaction types and variables in read-only transactions when there are no concurrent writes.
The input to a benchmark run is a list of a of successive Int
values.
The benchmark code runs a number of concurrent threads that each fold over the list and
each compute the sum of all entries. The list isn't mutated during the benchmarks.
Benchmark parameters are
- the length of the list,
- the number of concurrent threads, and
- the type of the variables and transactions.
Variable and transaction types are
MVar
,Compose IORef Maybe
(which is fine, since the data structure isn't mutated during the benchmark),TMVar
(inSTM
) evaluated in a singleSTM
transaction,TMVar
(hoisted) with eachSTM
action hoisted into a separate singletonSTM
transaction,TMVar
(inIO
) with eachSTM
action called in a separate singletonSTM
transaction,Compose TVar Maybe
(inSTM
) evaluated in a singleSTM
transaction,Compose TVar Maybe
(hoisted) with eachSTM
action hoisted into a separate singletonSTM
transaction, andCompose TVar Maybe
(inIO
) access inIO
without usingSTM
.
"TMVar
hoisted" and "TMVar
in IO
" result in equivalent code. However, the overhead of the hoist
implementation
for the Stream
type that is used in the fold and different inlining by GHC may result in different performance.
Build:
ghc -O2 -threaded -rtsopts -main-is VarList -o main VarList.hs
Run:
./main +RTS -N
As expected Compose IORef Maybe
performs best in all settings. Compose TVar Maybe
with readTVarIO
performs on par
with plain IORef
. MVar
has only little overhead.
TMVar
in IO
and TMVar
hoisted both have acceptable performance with some overhead due to the STM
transactions.
The hoisted version is somewhat slower, most likely due to overhead in hoist
and different inlining by GHC.
In the context of STM
transactions TMVar
and Compose TVar Maybe
perform similar, which can be explained by the
fact that TMVar a
is implemented as TVar (Maybe a)
in base
.
Benchmarks didn't complete in acceptable time for benchmarks that run the complete fold in a single STM
transaction
when there was more than a single thread and a moderately large lists lengths (in the order of thousands).
Focusing on the scenarios where the complete fold is done in a single STM
transaction, it seems that the run time
scales linear in the number of threads and, up to some point, quadratic in the length of the list.
However, if there is more than a single thread and the length of the list growth beyond a certain point (~15,000 - 20,000 in my tests) the running time distribution exhibits a growing tail of very long runs and eventually most runs become long running. The behavior seems to indicate a race between concurrent transactions, which is surprising, since there are no updates to the variables during the test run.
It seems that read-only STM
transactions without concurrent writes scale quadratic in the number involved
variables and linear in the number of concurrent transactions.
Small (singleton) read transactions come with little overhead, but are still slower than MVar
with tryReadMVar
.
TVar
s with readTVarIO
have the same performance as plain IORef
s. Thus TVar
s have the benefit that they can be used
either like plain IORef
without any transaction overhead or in transactions.
Using STM
transactions that involve a large number of variables should be avoided, even if when there are no concurrent
writes. In addition to the quadratic complexity it seems that there can be races between read-only transactions
even in the absence of concurrent updates.
Thus,
IORef
s are best when there is no concurrent updates beyond what can be done withatomicModifyIORef
.MVar
s are best when concurrent writes can't be ruled out.TVar
s are best when variables are used in contexts that don't have concurrent writes (wherereadTVarIO
can be used) as well as in contexts with concurrent writes.
It would be nice if base
could include a function tryReadTMVarIO
that avoids the overhead of an STM
transaction.