Replacing imaginary/queens
with the implemenation below in rahulmutt/nofib
:
{-# LANGUAGE BangPatterns #-}
-- solution by Oystein Kolsrud
-- https://www.youtube.com/watch?v=I2tMmsZC1ZU
okToAdd :: Int -> [Int] -> Bool
okToAdd q qs = all (okToAddDirection q qs) [succ, pred, id]
where
okToAddDirection q qs f = and $ zipWith (/=) (tail (iterate f q)) qs
extendSolution n qs = map (\q -> q:qs) $ filter (\q -> okToAdd q qs) [1..n]
allSolutions !n 0 = [[]]
allSolutions !n k = concatMap (extendSolution n) (allSolutions n (k-1))
main = do
putStr "12 Queens, "
putStr (show $ length $ allSolutions 12 12)
putStr " Solutions.\n"
and running this command:
nofib-runner imaginary/queens --run --jmh="-wi 10 -i 10" --way="-O2"
We get this result:
# JMH 1.15 (released 115 days ago)
# VM version: JDK 1.8.0_102, VM 25.102-b14
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 10 iterations, single-shot each
# Measurement: 10 iterations, single-shot each
# Timeout: 10 min per iteration
# Threads: 1 thread
# Benchmark mode: Single shot invocation time
# Benchmark: com.typelead.TestBenchmark.benchmark
# Parameters: (args = 10 +RTS)
# Run progress: 0.00% complete, ETA 00:00:00
# Fork: 1 of 1
# Warmup Iteration 1: 280582.188 ms/op
# Warmup Iteration 2: 1.463 ms/op
# Warmup Iteration 3: 1.613 ms/op
# Warmup Iteration 4: 1.647 ms/op
# Warmup Iteration 5: 1.448 ms/op
# Warmup Iteration 6: 1.539 ms/op
# Warmup Iteration 7: 1.291 ms/op
# Warmup Iteration 8: 1.251 ms/op
# Warmup Iteration 9: 1.435 ms/op
# Warmup Iteration 10: 1.014 ms/op
Iteration 1: 1.344 ms/op
Iteration 2: 1.537 ms/op
Iteration 3: 1.419 ms/op
Iteration 4: 1.478 ms/op
Iteration 5: 1.233 ms/op
Iteration 6: 1.456 ms/op
Iteration 7: 1.124 ms/op
Iteration 8: 1.273 ms/op
Iteration 9: 0.994 ms/op
Iteration 10: 1.116 ms/op
Result "benchmark":
N = 10
mean = 1.297 ±(99.9%) 0.272 ms/op
Histogram, ms/op:
[0.900, 0.950) = 0
[0.950, 1.000) = 1
[1.000, 1.050) = 0
[1.050, 1.100) = 0
[1.100, 1.150) = 2
[1.150, 1.200) = 0
[1.200, 1.250) = 1
[1.250, 1.300) = 1
[1.300, 1.350) = 1
[1.350, 1.400) = 0
[1.400, 1.450) = 1
[1.450, 1.500) = 2
[1.500, 1.550) = 1
Percentiles, ms/op:
p(0.0000) = 0.994 ms/op
p(50.0000) = 1.309 ms/op
p(90.0000) = 1.531 ms/op
p(95.0000) = 1.537 ms/op
p(99.0000) = 1.537 ms/op
p(99.9000) = 1.537 ms/op
p(99.9900) = 1.537 ms/op
p(99.9990) = 1.537 ms/op
p(99.9999) = 1.537 ms/op
p(100.0000) = 1.537 ms/op
# Run complete. Total time: 00:04:41
Benchmark (args) Mode Cnt Score Error Units
TestBenchmark.benchmark 10 +RTS ss 10 1.297 ± 0.272 ms/op
To investigate why it takes so long on the first iteration, we try:
nofib-runner imaginary/queens --run --jmh="-prof cl -wi 0 -i 10" --way="-O2"
which gives:
# JMH 1.15 (released 115 days ago)
# VM version: JDK 1.8.0_102, VM 25.102-b14
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: <none>
# Measurement: 10 iterations, single-shot each
# Timeout: 10 min per iteration
# Threads: 1 thread
# Benchmark mode: Single shot invocation time
# Benchmark: com.typelead.TestBenchmark.benchmark
# Parameters: (args = 10 +RTS)
# Run progress: 0.00% complete, ETA 00:00:00
# Fork: 1 of 1
Iteration 1: 277928.321 ms/op
·class.load: 1311033.096 classes/sec
·class.load.norm: 4717.000 classes/op
·class.unload: ≈ 0 classes/sec
·class.unload.norm: ≈ 0 classes/op
Iteration 2: 1.674 ms/op
·class.load: ≈ 0 classes/sec
·class.load.norm: ≈ 0 classes/op
·class.unload: ≈ 0 classes/sec
·class.unload.norm: ≈ 0 classes/op
Iteration 3: 1.611 ms/op
·class.load: ≈ 0 classes/sec
·class.load.norm: ≈ 0 classes/op
·class.unload: ≈ 0 classes/sec
·class.unload.norm: ≈ 0 classes/op
Iteration 4: 1.685 ms/op
·class.load: ≈ 0 classes/sec
·class.load.norm: ≈ 0 classes/op
·class.unload: ≈ 0 classes/sec
·class.unload.norm: ≈ 0 classes/op
Iteration 5: 1.406 ms/op
·class.load: ≈ 0 classes/sec
·class.load.norm: ≈ 0 classes/op
·class.unload: ≈ 0 classes/sec
·class.unload.norm: ≈ 0 classes/op
Iteration 6: 1.679 ms/op
·class.load: ≈ 0 classes/sec
·class.load.norm: ≈ 0 classes/op
·class.unload: ≈ 0 classes/sec
·class.unload.norm: ≈ 0 classes/op
Iteration 7: 1.341 ms/op
·class.load: ≈ 0 classes/sec
·class.load.norm: ≈ 0 classes/op
·class.unload: ≈ 0 classes/sec
·class.unload.norm: ≈ 0 classes/op
Iteration 8: 1.250 ms/op
·class.load: ≈ 0 classes/sec
·class.load.norm: ≈ 0 classes/op
·class.unload: ≈ 0 classes/sec
·class.unload.norm: ≈ 0 classes/op
Iteration 9: 1.436 ms/op
·class.load: ≈ 0 classes/sec
·class.load.norm: ≈ 0 classes/op
·class.unload: ≈ 0 classes/sec
·class.unload.norm: ≈ 0 classes/op
Iteration 10: 1.538 ms/op
·class.load: ≈ 0 classes/sec
·class.load.norm: ≈ 0 classes/op
·class.unload: ≈ 0 classes/sec
·class.unload.norm: ≈ 0 classes/op
Result "benchmark":
N = 10
mean = 27794.194 ±(99.9%) 132874.377 ms/op
Histogram, ms/op:
[ 0.000, 25000.000) = 9
[ 25000.000, 50000.000) = 0
[ 50000.000, 75000.000) = 0
[ 75000.000, 100000.000) = 0
[100000.000, 125000.000) = 0
[125000.000, 150000.000) = 0
[150000.000, 175000.000) = 0
[175000.000, 200000.000) = 0
[200000.000, 225000.000) = 0
[225000.000, 250000.000) = 0
[250000.000, 275000.000) = 0
Percentiles, ms/op:
p(0.0000) = 1.250 ms/op
p(50.0000) = 1.574 ms/op
p(90.0000) = 250135.657 ms/op
p(95.0000) = 277928.321 ms/op
p(99.0000) = 277928.321 ms/op
p(99.9000) = 277928.321 ms/op
p(99.9900) = 277928.321 ms/op
p(99.9990) = 277928.321 ms/op
p(99.9999) = 277928.321 ms/op
p(100.0000) = 277928.321 ms/op
Secondary result "·class.load":
131103.310 ±(99.9%) 626793.463 classes/sec [Average]
(min, avg, max) = (≈ 0, 131103.310, 1311033.096), stdev = 414585.067
CI (99.9%): [≈ 0, 757896.772] (assumes normal distribution)
Secondary result "·class.load.norm":
471.700 ±(99.9%) 2255.156 classes/op [Average]
(min, avg, max) = (≈ 0, 471.700, 4717.000), stdev = 1491.646
CI (99.9%): [≈ 0, 2726.856] (assumes normal distribution)
Secondary result "·class.unload":
≈ 0 classes/sec
Secondary result "·class.unload.norm":
≈ 0 classes/op
# Run complete. Total time: 00:04:38
Benchmark (args) Mode Cnt Score Error Units
TestBenchmark.benchmark 10 +RTS ss 10 27794.194 ± 132874.377 ms/op
TestBenchmark.benchmark:·class.load 10 +RTS ss 10 131103.310 ± 626793.463 classes/sec
TestBenchmark.benchmark:·class.load.norm 10 +RTS ss 10 471.700 ± 2255.156 classes/op
TestBenchmark.benchmark:·class.unload 10 +RTS ss 10 ≈ 0 classes/sec
TestBenchmark.benchmark:·class.unload.norm 10 +RTS ss 10 ≈ 0 classes/op
It spends 131 seconds in class loading time. And 27 seconds in the actual benchmark.