Performance experiment showing overhead of locking versus shared nothing architecture
How the experiment was performed
Rent a packet.com m1.xlarge.x86 box, which has two Intel Xeon E5-2650 v4 CPUs (24 cores total @ 2.2 GHz), for $1.70/hr.
On the box, run the SHF performance test in various combinations using 100 million key values.
The SHF performance test can be compiled with or without spin locks, in order to measure the spin lock overhead (a generic sketch of such a lock is shown below).
Obviously, when compiled without spin locks, concurrent access to a single SHF instance is not possible.
However, when compiled without spin locks, multiple single-CPU, non-concurrent instances can run in parallel, a.k.a. 'shared nothing'.
E.g. comparing 12 CPUs running one locked SHF instance versus 12 CPUs each running its own single-CPU, shared-nothing SHF instance reveals the overhead of locking relative to shared nothing.
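For context on what 'with spin locks' costs, below is a minimal, generic sketch of a test-and-set spin lock in C using GCC atomic builtins; it is illustrative only, and SharedHashFile's actual lock implementation may differ. Every cycle a CPU spends spinning while another CPU holds the lock is a cycle not spent on PUT/UPD/MIX/GET work, and that is the overhead the LOCK=1 runs below include and the LOCK=0 runs avoid.

```c
/* Generic test-and-set spin lock sketch (GCC builtins); illustrative only,
 * not SharedHashFile's actual lock code. */
#include <stdint.h>

typedef struct { volatile uint32_t word; } spin_lock_t; /* 0 = free, 1 = held */

static inline void spin_lock(spin_lock_t *lock) {
    /* Atomically swap in 1; if the previous value was already 1, another
     * CPU holds the lock, so burn cycles until it looks free, then retry. */
    while (__sync_lock_test_and_set(&lock->word, 1)) {
        while (lock->word) { /* spin on the locally cached line */ }
    }
}

static inline void spin_unlock(spin_lock_t *lock) {
    __sync_lock_release(&lock->word); /* release store of 0 */
}
```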
Result: With locking: 1 CPU accessing all keys.
# PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=1 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-1-keys-100m-shared-nothing-01-of-01.txt | egrep "(operations|LOCK)"
PUT 100,000,000 operations in 57.832 elapsed seconds or 1,729,145 operations per second
UPD 100,000,000 operations in 38.964 elapsed seconds or 2,566,444 operations per second
MIX 100,000,000 operations in 39.876 elapsed seconds or 2,507,761 operations per second
GET 100,000,000 operations in 38.819 elapsed seconds or 2,576,090 operations per second
* MIX is 2% (2000000) del/put, 98% (12100654) get, LOCK is 1, FIXED is 0, DEBUG is 0
Result: With locking: 3 CPUs accessing all keys.
# PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=03 SHF_PERFORMANCE_TEST_LOCK=1 test.f.shf.t 2>&1 | tee shf-cpus-03-lock-1-keys-100m-shared-nothing-01-of-01.txt | egrep "(operations|LOCK)"
PUT 100,000,000 operations in 36.552 elapsed seconds or 2,735,825 operations per second
UPD 100,000,000 operations in 19.548 elapsed seconds or 5,115,621 operations per second
MIX 100,000,000 operations in 19.455 elapsed seconds or 5,139,991 operations per second
GET 100,000,000 operations in 19.344 elapsed seconds or 5,169,514 operations per second
* MIX is 2% (2000000) del/put, 98% (12100654) get, LOCK is 1, FIXED is 0, DEBUG is 0
Result: With locking: 6 CPUs accessing all keys.
# PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=06 SHF_PERFORMANCE_TEST_LOCK=1 test.f.shf.t 2>&1 | tee shf-cpus-06-lock-1-keys-100m-shared-nothing-01-of-01.txt | egrep "(operations|LOCK)"
PUT 100,000,000 operations in 26.174 elapsed seconds or 3,820,543 operations per second
UPD 100,000,000 operations in 13.657 elapsed seconds or 7,322,277 operations per second
MIX 100,000,000 operations in 13.585 elapsed seconds or 7,360,951 operations per second
GET 100,000,000 operations in 15.394 elapsed seconds or 6,495,833 operations per second
* MIX is 2% (2000000) del/put, 98% (12100654) get, LOCK is 1, FIXED is 0, DEBUG is 0
Result: With locking: 12 CPUs accessing all keys.
# PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=12 SHF_PERFORMANCE_TEST_LOCK=1 test.f.shf.t 2>&1 | tee shf-cpus-12-lock-1-keys-100m-shared-nothing-01-of-01.txt | egrep "(operations|LOCK)"
PUT 100,000,000 operations in 20.285 elapsed seconds or 4,929,672 operations per second
UPD 100,000,000 operations in 10.619 elapsed seconds or 9,417,255 operations per second
MIX 100,000,000 operations in 14.139 elapsed seconds or 7,072,759 operations per second
GET 100,000,000 operations in 10.548 elapsed seconds or 9,480,408 operations per second
* MIX is 2% (2000000) del/put, 98% (12100654) get, LOCK is 1, FIXED is 0, DEBUG is 0
Result: Without locking: 1 CPU accessing own shard of keys.
# (PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-100m-shared-nothing-01-of-01.txt | egrep "(operations|LOCK)")
PUT 100,000,000 operations in 55.103 elapsed seconds or 1,814,782 operations per second
UPD 100,000,000 operations in 38.280 elapsed seconds or 2,612,306 operations per second
MIX 100,000,000 operations in 39.358 elapsed seconds or 2,540,752 operations per second
GET 100,000,000 operations in 38.397 elapsed seconds or 2,604,397 operations per second
* MIX is 2% (2000000) del/put, 98% (12100654) get, LOCK is 0, FIXED is 0, DEBUG is 0
Result: Without locking: 3 CPUs accessing own shard of keys.
# (PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=33333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-033m-shared-nothing-01-of-03.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=33333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-033m-shared-nothing-02-of-03.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=33333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-033m-shared-nothing-03-of-03.txt | egrep "(operations|LOCK)") &
PUT 33,333,333 operations in 21.427 elapsed seconds or 1,555,686 operations per second
PUT 33,333,333 operations in 22.927 elapsed seconds or 1,453,865 operations per second
PUT 33,333,333 operations in 23.744 elapsed seconds or 1,403,885 operations per second
UPD 33,333,333 operations in 12.697 elapsed seconds or 2,625,198 operations per second
UPD 33,333,333 operations in 12.863 elapsed seconds or 2,591,444 operations per second
UPD 33,333,333 operations in 15.825 elapsed seconds or 2,106,402 operations per second
MIX 33,333,333 operations in 12.834 elapsed seconds or 2,597,349 operations per second
MIX 33,333,333 operations in 15.331 elapsed seconds or 2,174,219 operations per second
MIX 33,333,333 operations in 15.187 elapsed seconds or 2,194,882 operations per second
GET 33,333,333 operations in 12.991 elapsed seconds or 2,565,822 operations per second
GET 33,333,333 operations in 15.336 elapsed seconds or 2,173,522 operations per second
GET 33,333,333 operations in 13.591 elapsed seconds or 2,452,672 operations per second
* MIX is 2% (666666) del/put, 98% (32666666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (666666) del/put, 98% (32666666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (666666) del/put, 98% (32666666) get, LOCK is 0, FIXED is 0, DEBUG is 0
Result: Without locking: 6 CPUs accessing own shard of keys.
# (PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=16666666 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-016m-shared-nothing-01-of-06.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=16666666 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-016m-shared-nothing-02-of-06.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=16666666 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-016m-shared-nothing-03-of-06.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=16666666 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-016m-shared-nothing-04-of-06.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=16666666 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-016m-shared-nothing-05-of-06.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=16666666 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-016m-shared-nothing-06-of-06.txt | egrep "(operations|LOCK)") &
PUT 16,666,666 operations in 10.844 elapsed seconds or 1,536,978 operations per second
PUT 16,666,666 operations in 10.920 elapsed seconds or 1,526,253 operations per second
PUT 16,666,666 operations in 11.278 elapsed seconds or 1,477,747 operations per second
PUT 16,666,666 operations in 11.087 elapsed seconds or 1,503,322 operations per second
PUT 16,666,666 operations in 11.283 elapsed seconds or 1,477,088 operations per second
PUT 16,666,666 operations in 11.259 elapsed seconds or 1,480,303 operations per second
UPD 16,666,666 operations in 6.389 elapsed seconds or 2,608,677 operations per second
UPD 16,666,666 operations in 6.535 elapsed seconds or 2,550,466 operations per second
UPD 16,666,666 operations in 6.628 elapsed seconds or 2,514,509 operations per second
UPD 16,666,666 operations in 6.709 elapsed seconds or 2,484,127 operations per second
UPD 16,666,666 operations in 6.801 elapsed seconds or 2,450,719 operations per second
UPD 16,666,666 operations in 6.824 elapsed seconds or 2,442,356 operations per second
MIX 16,666,666 operations in 6.480 elapsed seconds or 2,571,831 operations per second
MIX 16,666,666 operations in 6.549 elapsed seconds or 2,544,877 operations per second
MIX 16,666,666 operations in 6.681 elapsed seconds or 2,494,541 operations per second
MIX 16,666,666 operations in 6.865 elapsed seconds or 2,427,769 operations per second
MIX 16,666,666 operations in 6.959 elapsed seconds or 2,395,092 operations per second
MIX 16,666,666 operations in 7.029 elapsed seconds or 2,371,219 operations per second
GET 16,666,666 operations in 6.358 elapsed seconds or 2,621,300 operations per second
GET 16,666,666 operations in 6.429 elapsed seconds or 2,592,596 operations per second
GET 16,666,666 operations in 6.509 elapsed seconds or 2,560,603 operations per second
GET 16,666,666 operations in 6.694 elapsed seconds or 2,489,965 operations per second
GET 16,666,666 operations in 6.758 elapsed seconds or 2,466,064 operations per second
GET 16,666,666 operations in 7.008 elapsed seconds or 2,378,105 operations per second
* MIX is 2% (333333) del/put, 98% (16333332) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (333333) del/put, 98% (16333332) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (333333) del/put, 98% (16333332) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (333333) del/put, 98% (16333332) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (333333) del/put, 98% (16333332) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (333333) del/put, 98% (16333332) get, LOCK is 0, FIXED is 0, DEBUG is 0
Result: Without locking: 12 CPUs accessing own shard of keys.
# (PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=08333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-012m-shared-nothing-01-of-12.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=08333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-012m-shared-nothing-02-of-12.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=08333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-012m-shared-nothing-03-of-12.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=08333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-012m-shared-nothing-04-of-12.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=08333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-012m-shared-nothing-05-of-12.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=08333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-012m-shared-nothing-06-of-12.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=08333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-012m-shared-nothing-07-of-12.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=08333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-012m-shared-nothing-08-of-12.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=08333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-012m-shared-nothing-09-of-12.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=08333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-012m-shared-nothing-10-of-12.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=08333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-012m-shared-nothing-11-of-12.txt | egrep "(operations|LOCK)") &
(PATH=release-gcc:$PATH SHF_PERFORMANCE_TEST_ENABLE=1 SHF_PERFORMANCE_TEST_CPUS=01 SHF_PERFORMANCE_TEST_LOCK=0 SHF_PERFORMANCE_TEST_KEYS=08333333 test.f.shf.t 2>&1 | tee shf-cpus-01-lock-0-keys-012m-shared-nothing-12-of-12.txt | egrep "(operations|LOCK)") &
PUT 8,333,333 operations in 5.788 elapsed seconds or 1,439,772 operations per second
PUT 8,333,333 operations in 5.804 elapsed seconds or 1,435,708 operations per second
PUT 8,333,333 operations in 5.508 elapsed seconds or 1,513,083 operations per second
PUT 8,333,333 operations in 5.447 elapsed seconds or 1,529,860 operations per second
PUT 8,333,333 operations in 5.815 elapsed seconds or 1,433,030 operations per second
PUT 8,333,333 operations in 5.741 elapsed seconds or 1,451,469 operations per second
PUT 8,333,333 operations in 5.793 elapsed seconds or 1,438,611 operations per second
PUT 8,333,333 operations in 5.563 elapsed seconds or 1,498,097 operations per second
PUT 8,333,333 operations in 5.973 elapsed seconds or 1,395,236 operations per second
PUT 8,333,333 operations in 5.847 elapsed seconds or 1,425,271 operations per second
PUT 8,333,333 operations in 5.641 elapsed seconds or 1,477,371 operations per second
PUT 8,333,333 operations in 5.776 elapsed seconds or 1,442,771 operations per second
UPD 8,333,333 operations in 3.613 elapsed seconds or 2,306,316 operations per second
UPD 8,333,333 operations in 3.454 elapsed seconds or 2,412,900 operations per second
UPD 8,333,333 operations in 3.539 elapsed seconds or 2,354,819 operations per second
UPD 8,333,333 operations in 3.478 elapsed seconds or 2,396,151 operations per second
UPD 8,333,333 operations in 3.444 elapsed seconds or 2,419,876 operations per second
UPD 8,333,333 operations in 3.428 elapsed seconds or 2,431,244 operations per second
UPD 8,333,333 operations in 3.539 elapsed seconds or 2,354,785 operations per second
UPD 8,333,333 operations in 3.410 elapsed seconds or 2,443,882 operations per second
UPD 8,333,333 operations in 3.544 elapsed seconds or 2,351,674 operations per second
UPD 8,333,333 operations in 3.542 elapsed seconds or 2,352,457 operations per second
UPD 8,333,333 operations in 3.569 elapsed seconds or 2,335,071 operations per second
UPD 8,333,333 operations in 4.180 elapsed seconds or 1,993,503 operations per second
MIX 8,333,333 operations in 3.552 elapsed seconds or 2,345,787 operations per second
MIX 8,333,333 operations in 3.539 elapsed seconds or 2,355,036 operations per second
MIX 8,333,333 operations in 3.510 elapsed seconds or 2,374,051 operations per second
MIX 8,333,333 operations in 3.520 elapsed seconds or 2,367,176 operations per second
MIX 8,333,333 operations in 3.634 elapsed seconds or 2,293,374 operations per second
MIX 8,333,333 operations in 3.722 elapsed seconds or 2,238,810 operations per second
MIX 8,333,333 operations in 3.486 elapsed seconds or 2,390,713 operations per second
MIX 8,333,333 operations in 3.630 elapsed seconds or 2,295,456 operations per second
MIX 8,333,333 operations in 3.594 elapsed seconds or 2,318,361 operations per second
MIX 8,333,333 operations in 3.605 elapsed seconds or 2,311,595 operations per second
MIX 8,333,333 operations in 3.629 elapsed seconds or 2,296,313 operations per second
MIX 8,333,333 operations in 4.251 elapsed seconds or 1,960,316 operations per second
GET 8,333,333 operations in 3.475 elapsed seconds or 2,397,995 operations per second
GET 8,333,333 operations in 3.443 elapsed seconds or 2,420,046 operations per second
GET 8,333,333 operations in 3.474 elapsed seconds or 2,398,469 operations per second
GET 8,333,333 operations in 3.451 elapsed seconds or 2,414,794 operations per second
GET 8,333,333 operations in 3.415 elapsed seconds or 2,440,190 operations per second
GET 8,333,333 operations in 3.576 elapsed seconds or 2,330,535 operations per second
GET 8,333,333 operations in 3.634 elapsed seconds or 2,292,982 operations per second
GET 8,333,333 operations in 3.542 elapsed seconds or 2,352,944 operations per second
GET 8,333,333 operations in 3.493 elapsed seconds or 2,385,770 operations per second
GET 8,333,333 operations in 3.505 elapsed seconds or 2,377,244 operations per second
GET 8,333,333 operations in 3.519 elapsed seconds or 2,368,147 operations per second
GET 8,333,333 operations in 3.863 elapsed seconds or 2,157,043 operations per second
* MIX is 2% (166666) del/put, 98% (8166666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (166666) del/put, 98% (8166666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (166666) del/put, 98% (8166666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (166666) del/put, 98% (8166666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (166666) del/put, 98% (8166666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (166666) del/put, 98% (8166666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (166666) del/put, 98% (8166666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (166666) del/put, 98% (8166666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (166666) del/put, 98% (8166666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (166666) del/put, 98% (8166666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (166666) del/put, 98% (8166666) get, LOCK is 0, FIXED is 0, DEBUG is 0
* MIX is 2% (166666) del/put, 98% (8166666) get, LOCK is 0, FIXED is 0, DEBUG is 0
Result summary
Observations from the performance experiments above:
With spin locks, operations per second do not double when the number of CPUs doubles; e.g. GET goes from ~2.6M ops/s with 1 CPU to only ~9.5M ops/s with 12 CPUs.
Without spin locks, aggregate operations per second do roughly double when the number of CPUs doubles; each shared-nothing instance sustains roughly 2.2M to 2.6M GET ops/s however many run in parallel, so 12 instances total roughly 28M ops/s.
With or without spin locks, the operations per second are similar with only one CPU.
As the number of CPUs increases, the ratio of shared-nothing throughput to spin-locked throughput grows.
Performance on the packet.com m1.xlarge.x86 box is significantly lower than on my Dell Xeon laptop with fast RAM.
The m1.xlarge.x86 manages 9.5M ops/s with 12 CPUs, whereas my Dell laptop manages about the same with only 4 CPUs + 4 hyperthreads.
This suggests that performance has a lot to do with the individual CPU, and likely RAM speed too.
Thoughts
Possible reasons why per-CPU performance worsens as more CPUs are added:
(1) More CPUs mean more potential contention on the spin locks, causing more spins and therefore fewer operations per second?
(2) More CPUs mean more, and slower, cache line synchronization between CPUs, and therefore fewer operations per second? (A sketch illustrating this effect follows below.)
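To make reason (2) concrete, here is a minimal, self-contained C sketch, not SHF code and with illustrative sizes (64-byte cache lines, as on this class of Xeon), showing that two threads writing counters on the same cache line run slower than the same two threads writing counters on different cache lines, because the shared line must ping-pong between the CPUs' caches.

```c
/* Build: gcc -O2 -pthread false_sharing.c -o false_sharing
 * Two threads each do N increments. When their counters share a 64-byte
 * cache line, the line ping-pongs between the CPUs' caches; when the
 * counters sit on different lines, the same work finishes faster. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define N 100000000UL

/* 64-byte aligned so slot indices map predictably onto cache lines:
 * slots 0 and 1 share a line; slots 0 and 16 are two lines apart. */
static volatile uint64_t counters[32] __attribute__((aligned(64)));

static void *bump(void *arg) {
    volatile uint64_t *c = arg;
    for (uint64_t i = 0; i < N; i++) (*c)++;
    return NULL;
}

static double run_pair(int slot_a, int slot_b) {
    struct timespec t0, t1;
    pthread_t a, b;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&a, NULL, bump, (void *)&counters[slot_a]);
    pthread_create(&b, NULL, bump, (void *)&counters[slot_b]);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    printf("same cache line     : %.3f seconds\n", run_pair(0, 1));
    printf("different cache line: %.3f seconds\n", run_pair(0, 16));
    return 0;
}
```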
Disadvantages of these performance tests
Both sets of performance tests manipulate x key values via y CPUs.
However, the comparison is not 'apples to apples', because the two tests do not provide the same functionality.
In the unlocked / shared-nothing tests, each CPU can only access its own x/y shard of the total x key values (a sketch of such sharding follows below).
Whereas in the spin-locked tests, each CPU has access to all x key values.
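For illustration, a hypothetical caller in the shared-nothing setup could route each key to the one instance that owns it, for example by hashing; the hash function and routing below are assumptions for the sketch, not SHF's internal scheme.

```c
/* Sketch of how a caller could route keys across the 12 shared-nothing,
 * single-CPU instances used above. The hash function (FNV-1a) and the
 * modulo routing are illustrative assumptions, not SHF's internal scheme. */
#include <stddef.h>
#include <stdint.h>

#define NUM_INSTANCES 12 /* one shared-nothing SHF instance per CPU */

static uint32_t hash_key(const void *key, size_t len) {
    const uint8_t *p = key;
    uint32_t h = 2166136261u;            /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                  /* FNV-1a prime */
    }
    return h;
}

/* Each key belongs to exactly one instance, so that instance never needs
 * a lock; the trade-off is that no other CPU can touch that key directly. */
static uint32_t shard_for_key(const void *key, size_t len) {
    return hash_key(key, len) % NUM_INSTANCES;
}
```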
However, the point of the tests is to discover how much performance is being left on the table by the spin lock IPC method compared with a theoretically better IPC mechanism.
For example, is it possible to come up with an alternative IPC mechanism which is faster than the current spin lock mechanism? One speculative possibility is sketched below.
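Purely as a speculative sketch of one such alternative, and not anything SHF currently implements: keep the shared-nothing sharding for its lock-free fast path, and let a non-owning CPU reach another CPU's shard by posting requests into a lock-free single-producer/single-consumer ring that the owning CPU drains, so no spin lock is ever taken on the hash table itself.

```c
/* Speculative sketch only: a lock-free single-producer/single-consumer ring
 * (one ring per requesting-CPU/owning-CPU pair) that a non-owning CPU could
 * use to hand operations to the CPU owning a shard, avoiding any spin lock
 * on the table itself. Nothing here is existing SHF code. */
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 1024 /* must be a power of two */

typedef struct { uint64_t key_hash; uint32_t op; uint32_t value; } request_t;

typedef struct {
    _Atomic uint32_t head;      /* advanced only by the consumer (shard owner)    */
    _Atomic uint32_t tail;      /* advanced only by the producer (requesting CPU) */
    request_t slots[RING_SIZE];
} spsc_ring_t;

/* Producer: returns 0 if the ring is full, 1 on success. */
static int ring_push(spsc_ring_t *r, request_t req) {
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SIZE) return 0;
    r->slots[tail & (RING_SIZE - 1)] = req;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

/* Consumer: returns 0 if the ring is empty, 1 if a request was taken. */
static int ring_pop(spsc_ring_t *r, request_t *out) {
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail) return 0;
    *out = r->slots[head & (RING_SIZE - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}
```

Whether the extra hop through such a ring actually beats spinning would itself need measuring on the same hardware as above.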