# Scala .hashCode vs. MurmurHash3 for Spark's MLlib
This is a simple test of two hashing functions:
- Scala's native implementation (`obj.##`), used in `HashingTF`
- MurmurHash3, included in Scala, used by Vowpal Wabbit and many others
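As a rough illustration of what is being compared, the sketch below maps a term to a feature index with each of the two hashes and counts how many terms land in an already occupied bucket. The object name `HashIndexSketch`, the helper names, and the non-negative modulo reduction are assumptions made for this example; they are not taken from the test code or from Spark itself.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical helper object, for illustration only.
object HashIndexSketch {
  // Non-negative modulo, so negative hash codes still map into [0, mod).
  // Assumption: the hash is reduced to an index this way.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Index derived from Scala's native hash (obj.##).
  def nativeIndex(term: String, numFeatures: Int): Int =
    nonNegativeMod(term.##, numFeatures)

  // Index derived from the MurmurHash3 implementation shipped with Scala,
  // using its default string seed.
  def murmurIndex(term: String, numFeatures: Int): Int =
    nonNegativeMod(MurmurHash3.stringHash(term), numFeatures)

  // Number of terms that fall into a bucket already taken by another term.
  // Assumes the input terms are distinct.
  def collisions(terms: Seq[String], index: String => Int): Int =
    terms.size - terms.map(index).distinct.size
}
```

For example, `HashIndexSketch.collisions(words, HashIndexSketch.murmurIndex(_, 1 << 18))` would give the collision count for a hypothetical `words` sequence at a feature vector size of 2^18.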
The test uses the aspell dictionary generated with the "insane" setting (download), which produces 676,547 entries, and explores the following grid:
- Feature vector sizes: 2^18..22