I was curious about the results reported here, which reports that Scala's mutable maps are slower than Java's: http://www.infoq.com/news/2011/11/yammer-scala
In my tests, Scala's OpenHashMap equals or beats java's HashMap:
Insertion 100k elements (String keys) time in ms:
- scala HashMap: 92.75
- scala OpenHashMap: 14.03125
- java HashMap: 15.78125
Insertion 100k elements (Integer keys) time in ms:
- scala HashMap: 98.625
- scala OpenHashMap: 13.71875
- java HashMap: 23.5625
Hastily thrown together timing code below. I at least run enough trials to get a half second sample.
Scala's mutable HashMap implementation was known to be slower than HashMap, which motivated David MacIver to write OpenHashMap way back in 2008. Unfortunately, not many people know it.
def time(warmup: Int)(block: => Any) = {
for (i <- 1 to warmup) block // possibly trigger JIT on freq used methods in block
var goodSample = false
var start = 0L; var stop = 0L;
var N = 2
while (!goodSample) {
start = System.currentTimeMillis
for (i <- 1 to N) block
stop = System.currentTimeMillis
if (stop - start < 500) N = N*4 // require at least half a second for a decent sample
else goodSample = true
}
val perOp = (stop.toDouble - start.toDouble) / N
perOp
}
val vs = (0 to 100000).map(_ toString).toArray
val warmup = 10
{
println (time (warmup) {
val m = new scala.collection.mutable.HashMap[String,Int];
var i = 0;
while(i<100000) { m.put(vs(i),i);i=i+1;};
})
println (time (warmup) {
val m = new scala.collection.mutable.OpenHashMap[String,Int];
var i = 0;
var start = System.currentTimeMillis();
while(i<100000) { m.put(vs(i),i);i=i+1;};
})
println ( time (warmup) {
val m = new java.util.HashMap[String,Int];
var i = 0;
while(i<100000) { m.put(vs(i),i);i=i+1;};
})
}
// With Integer keys
{
println ("scala HashMap: " + time (warmup) {
val m = new scala.collection.mutable.HashMap[Int,Int];
var i = 0;
while(i<100000) { m.put(i,i);i=i+1;};
})
println ("scala OpenHashMap: " + time (warmup) {
val m = new scala.collection.mutable.OpenHashMap[Int,Int];
var i = 0;
var start = System.currentTimeMillis();
while(i<100000) { m.put(i,i);i=i+1;};
})
println ("java HashMap: " + time (warmup) {
val m = new java.util.HashMap[Int,Int];
var i = 0;
while(i<100000) { m.put(i,i);i=i+1;};
})
}
This might not be the most scientific benchmark because of potential GCs and hash collisions at low range. I tried increase the number from 100k to 1m, and then Java HashMap was faster. I also tested my own implementation of OpenHashMap that uses a bitmap for nulls and specialized the get/put.
The OpenHashMap implementation for Spark can be found at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
My test code is at https://gist.github.com/rxin/44657c3f40d4e4294857