@coderfi
Created May 10, 2018 00:45
pyspark 2.3.0 Locality-Sensitive Hashing
https://databricks.com/session/locality-sensitive-hashing-by-spark
https://databricks.com/blog/2017/05/09/detecting-abuse-scale-locality-sensitive-hashing-uber-engineering.html
https://github.com/apache/spark/blob/v2.3.0/examples/src/main/python/ml/min_hash_lsh_example.py
pip install -U pyspark
git clone https://github.com/apache/spark
cd spark
git checkout v2.3.0
spark-submit --master='local[*]' --driver-memory=4g examples/src/main/python/ml/min_hash_lsh_example.py
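The example script submitted above demonstrates MinHash LSH, which approximates Jaccard similarity between sets. As a rough illustration of the idea only, here is a minimal pure-Python sketch. It is not Spark's implementation: the helper names (`make_hash_params`, `minhash_signature`, `estimated_jaccard`, `exact_jaccard`), the modulus, and the seed are all assumptions chosen for this sketch.

```python
import random

# Assumption for this sketch: a Mersenne prime (2**31 - 1) as the hash modulus.
PRIME = 2_147_483_647


def make_hash_params(num_hashes, seed=42):
    """Draw (a, b) coefficients for hash functions h(x) = (a * (x + 1) + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, params):
    """Return the MinHash signature of a non-empty set of small non-negative ints.

    Each signature entry is the minimum hash value over the set
    for one of the hash functions in `params`.
    """
    return [min((a * (x + 1) + b) % PRIME for x in items) for a, b in params]


def estimated_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature entries."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| for comparison."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    params = make_hash_params(num_hashes=128)
    set_a = {0, 1, 2}
    set_b = {1, 2, 3}
    sig_a = minhash_signature(set_a, params)
    sig_b = minhash_signature(set_b, params)
    print("exact:    ", exact_jaccard(set_a, set_b))
    print("estimated:", estimated_jaccard(sig_a, sig_b))
```

More hash functions generally give a tighter estimate at the cost of longer signatures. In Spark the corresponding knob is the `numHashTables` parameter of `MinHashLSH`.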