While benchmarking https://github.com/pelletier/go-toml/tree/v2, I decided to play with CPU frequency scaling, to eliminate some noise in the benchmarks.
Running on the following:
goos: linux
goarch: amd64
cpu: AMD Ryzen 9 5950X 16-Core Processor
kernel: 5.12.8-300.fc34.x86_64
Setting the scaling governor to performance and disabling boost seems to provide a steady 3.4GHz on all cores.
As the results below show, this significantly speeds up this program (roughly 31% to 43% less time per op), so earlier benchmarks need to be re-run with these settings.
# echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
https://www.kernel.org/doc/Documentation/cpu-freq/boost.txt
# echo 0 > /sys/devices/system/cpu/cpufreq/boost
$ grep "cpu MHz" /proc/cpuinfo
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
cpu MHz : 3400.000
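With 32 logical cores the grep output above is long to eyeball. A minimal sketch of a helper that collapses it into one line per distinct frequency is below. The function name `cpu_mhz_summary` and its optional file argument are made up for illustration; it only assumes the `cpu MHz` lines that /proc/cpuinfo shows on this machine.

```shell
#!/bin/sh
# Summarize per-core frequencies: prints one "<MHz> <core count>" line per
# distinct value, so a steady machine yields a single line.
# The optional argument (hypothetical, handy for trying it on a saved copy)
# overrides the default /proc/cpuinfo.
cpu_mhz_summary() {
    awk -F: '/^cpu MHz/ { gsub(/[ \t]/, "", $2); n[$2]++ }
             END { for (f in n) print f, n[f] }' "${1:-/proc/cpuinfo}" | sort -n
}
```

After sourcing the file, calling `cpu_mhz_summary` with no argument reads the live /proc/cpuinfo. More than one output line means some cores are still running at a different frequency.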
name old time/op new time/op delta
UnmarshalDataset/config-32 35.5ms ± 5% 20.4ms ± 1% -42.68% (p=0.000 n=10+9)
UnmarshalDataset/canada-32 111ms ± 6% 72ms ± 1% -35.40% (p=0.000 n=10+10)
UnmarshalDataset/citm_catalog-32 42.5ms ± 2% 25.1ms ± 1% -40.79% (p=0.000 n=9+10)
UnmarshalDataset/twitter-32 15.7ms ± 2% 9.4ms ± 1% -39.97% (p=0.000 n=10+9)
UnmarshalDataset/code-32 129ms ± 4% 89ms ± 1% -31.26% (p=0.000 n=10+10)
UnmarshalDataset/example-32 279µs ± 1% 163µs ± 1% -41.69% (p=0.000 n=9+10)
Unmarshal/SimpleDocument/struct-32 950ns ± 5% 580ns ± 1% -38.95% (p=0.000 n=10+10)
Unmarshal/SimpleDocument/map-32 1.37µs ± 8% 0.83µs ± 1% -39.29% (p=0.000 n=10+10)
Unmarshal/ReferenceFile/struct-32 60.5µs ± 7% 36.0µs ± 1% -40.51% (p=0.000 n=10+10)
Unmarshal/ReferenceFile/map-32 105µs ± 3% 59µs ± 1% -43.40% (p=0.000 n=9+10)
Unmarshal/HugoFrontMatter-32 20.8µs ± 9% 12.4µs ± 0% -40.27% (p=0.000 n=10+9)
name old speed new speed delta
UnmarshalDataset/config-32 29.5MB/s ± 5% 51.5MB/s ± 1% +74.35% (p=0.000 n=10+9)
UnmarshalDataset/canada-32 19.8MB/s ± 6% 30.7MB/s ± 1% +54.69% (p=0.000 n=10+10)
UnmarshalDataset/citm_catalog-32 13.1MB/s ± 2% 22.2MB/s ± 1% +68.87% (p=0.000 n=9+10)
UnmarshalDataset/twitter-32 28.2MB/s ± 2% 47.0MB/s ± 1% +66.57% (p=0.000 n=10+9)
UnmarshalDataset/code-32 20.8MB/s ± 4% 30.3MB/s ± 1% +45.42% (p=0.000 n=10+10)
UnmarshalDataset/example-32 29.1MB/s ± 1% 49.8MB/s ± 1% +71.48% (p=0.000 n=9+10)
Unmarshal/SimpleDocument/struct-32 11.6MB/s ± 5% 19.0MB/s ± 1% +63.70% (p=0.000 n=10+10)
Unmarshal/SimpleDocument/map-32 8.04MB/s ± 9% 13.22MB/s ± 1% +64.38% (p=0.000 n=10+10)
Unmarshal/ReferenceFile/struct-32 86.8MB/s ± 7% 145.7MB/s ± 1% +67.76% (p=0.000 n=10+10)
Unmarshal/ReferenceFile/map-32 50.0MB/s ± 3% 88.4MB/s ± 1% +76.66% (p=0.000 n=9+10)
Unmarshal/HugoFrontMatter-32 26.3MB/s ± 9% 43.9MB/s ± 0% +67.06% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
UnmarshalDataset/config-32 5.91MB ± 0% 5.91MB ± 0% ~ (p=0.504 n=9+9)
UnmarshalDataset/canada-32 84.4MB ± 0% 84.4MB ± 0% ~ (p=0.381 n=10+10)
UnmarshalDataset/citm_catalog-32 35.6MB ± 0% 35.6MB ± 0% ~ (p=0.123 n=10+10)
UnmarshalDataset/twitter-32 13.5MB ± 0% 13.5MB ± 0% ~ (p=0.051 n=10+8)
UnmarshalDataset/code-32 22.2MB ± 0% 22.2MB ± 0% ~ (p=0.306 n=10+10)
UnmarshalDataset/example-32 193kB ± 0% 193kB ± 0% ~ (p=0.323 n=10+9)
Unmarshal/SimpleDocument/struct-32 597B ± 0% 597B ± 0% ~ (all equal)
Unmarshal/SimpleDocument/map-32 973B ± 0% 973B ± 0% ~ (all equal)
Unmarshal/ReferenceFile/struct-32 11.6kB ± 0% 11.6kB ± 0% ~ (all equal)
Unmarshal/ReferenceFile/map-32 28.9kB ± 0% 28.9kB ± 0% ~ (all equal)
Unmarshal/HugoFrontMatter-32 7.39kB ± 0% 7.39kB ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
UnmarshalDataset/config-32 233k ± 0% 233k ± 0% ~ (all equal)
UnmarshalDataset/canada-32 782k ± 0% 782k ± 0% ~ (all equal)
UnmarshalDataset/citm_catalog-32 192k ± 0% 192k ± 0% ~ (p=0.137 n=8+10)
UnmarshalDataset/twitter-32 56.9k ± 0% 56.9k ± 0% ~ (p=0.086 n=9+9)
UnmarshalDataset/code-32 1.06M ± 0% 1.06M ± 0% ~ (all equal)
UnmarshalDataset/example-32 1.36k ± 0% 1.36k ± 0% ~ (all equal)
Unmarshal/SimpleDocument/struct-32 7.00 ± 0% 7.00 ± 0% ~ (all equal)
Unmarshal/SimpleDocument/map-32 12.0 ± 0% 12.0 ± 0% ~ (all equal)
Unmarshal/ReferenceFile/struct-32 182 ± 0% 182 ± 0% ~ (all equal)
Unmarshal/ReferenceFile/map-32 649 ± 0% 649 ± 0% ~ (all equal)
Unmarshal/HugoFrontMatter-32 143 ± 0% 143 ± 0% ~ (all equal)
The old benchmark had turbo enabled and was using the schedutil governor.
Nothing new under the sun, just a reminder for myself to set that up when benchmarking.
Experimentally, this has the least variance on my machine for this project:
export GOMAXPROCS=1; nice -n -19 taskset -c 4 go test -run=Nothing ./benchmark -bench=Unmarshal -count 10
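Since the governor and boost settings stay in effect after the benchmark finishes, it may be convenient to wrap the run so the previous values are restored. The following is only a sketch under a few assumptions: the function names and the `CPU_SYSFS` override are hypothetical, all cores are assumed to share the governor that cpu0 reports, and writing to the real sysfs files requires root, as in the commands above.

```shell
#!/bin/sh
# Sketch: pin the CPU frequency for a benchmark run, then restore it.
# CPU_SYSFS is a hypothetical override so the functions can be tried on a
# scratch directory; by default they touch the real sysfs and need root.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# Write one governor name to every per-core scaling_governor file.
set_governor() {
    for f in "$CPU_SYSFS"/cpu*/cpufreq/scaling_governor; do
        if [ -e "$f" ]; then
            echo "$1" > "$f"
        fi
    done
}

# Write 0 or 1 to the global boost switch.
set_boost() {
    echo "$1" > "$CPU_SYSFS/cpufreq/boost"
}

# Run a command with the "performance" governor and boost disabled,
# then put back the governor read from cpu0 and the previous boost value.
with_steady_cpu() {
    old_gov=$(cat "$CPU_SYSFS/cpu0/cpufreq/scaling_governor")
    old_boost=$(cat "$CPU_SYSFS/cpufreq/boost")
    set_governor performance
    set_boost 0
    status=0
    "$@" || status=$?
    set_governor "$old_gov"
    set_boost "$old_boost"
    return "$status"
}
```

After sourcing the file in a root shell, the benchmark command above could be run as `with_steady_cpu nice -n -19 taskset -c 4 go test -run=Nothing ./benchmark -bench=Unmarshal -count 10`. The wrapper returns the exit status of the wrapped command, and the settings are restored whether or not it succeeded.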