@avnerbarr
avnerbarr / murmur64.scala
Created January 23, 2018 08:10
64-bit "murmur" hash in Scala. The built-in MurmurHash3 only outputs 32 bits, so when hashing millions of items you will get quite a few collisions. This method widens the hash to 64 bits, which makes collisions among millions of items very unlikely.
// Simple function to generate 64-bit hashes using the built-in MurmurHash3 functionality, which only outputs 32 bits.
// We had cases when hashing 15M+ items where we got around 0.08% collisions;
// using this method we were able to hash 15M items with 0 collisions.
object ExtendedMurmurHash {
  val seed = 0xf7ca7fd2
  def hash(u: String): Long = {
    val a = scala.util.hashing.MurmurHash3.stringHash(u, seed)
    val b = scala.util.hashing.MurmurHash3.stringHash(u.reverse, seed)
    // shift a 32 bits to the left, leaving 32 lower bits zeroed
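As a rough sanity check on the figures in the comments above, the birthday approximation estimates the expected number of colliding pairs when $n$ items are hashed uniformly into $2^k$ values. This is only a sketch: it assumes that both the 32-bit hash and the combined 64-bit value behave like uniform random functions, which the gist does not establish.

```latex
E[\text{colliding pairs}] \approx \frac{n(n-1)}{2 \cdot 2^{k}}

n = 1.5 \times 10^{7},\ k = 32: \quad \frac{1.125 \times 10^{14}}{4.29 \times 10^{9}} \approx 2.6 \times 10^{4}

n = 1.5 \times 10^{7},\ k = 64: \quad \frac{1.125 \times 10^{14}}{1.84 \times 10^{19}} \approx 6 \times 10^{-6}
```

Under that assumption, tens of thousands of colliding pairs are expected at 32 bits, while at 64 bits a collision among 15M items is very unlikely, which is consistent with the absence of collisions reported in the comments.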
avnerbarr / sbtpretty.rb
Last active December 19, 2017 08:02
hacky script to make a pretty printer for sbt test
#!/usr/bin/env ruby
require 'colorize'
# prerequisites:
#   gem install colorize
# if you are running on macOS:
#   brew install gnu-sed
# usage:
#   sbt test 2> /dev/null | gsed -r "s/[[:cntrl:]]\[[0-9]{1,3}m//g" | sbtpretty
brew cask install osxfuse
brew install s3fs
mkdir -p ~/.s3/folder_to_mount_bucket_to
chmod 600 ~/.s3/folder_to_mount_bucket_to
# create a token in the AWS console https://console.aws.amazon.com/iam/home?#home
# replace the following with your values
echo MYIDENTITY:MYCREDENTIAL >~/.s3/passwd
sku name price categories group_id image_url in_stock post_type publish_time title updated url
sku-0-43-vNIC1LCuEz name-0-62-gAAVKQtgf8 660 categories-0-81-W5sOoHPf0L group-88 image_url-0-69-xC7bC3ha8R false post_type-0-12-naolIIn9eV publish_time-0-94-96V0X6YOh4 title-0-25-tkts7pQVTJ updated-0-96-drGU3I9UUs url-0-2-K4LkKt8C6v
sku-1-11-Ul0VNJd410 name-1-78-vGNGUmbCF8 318 categories-1-66-QRj0uPNOjk group-36 image_url-1-32-L8PNZdj1EB false post_type-1-21-y7VZACkuTB a|publish_time-1-35-4iIIxohmcn title-1-59-00jyJ73YYj updated-1-20-UIPS6NvZqF url-1-45-3n2JbDSp6k
sku-2-77-7JVkDeTehw name-2-8-qYQY2zObgo 307 categories-2-90-IByYa6cuit group-4 image_url-2-79-XtFBOP6V55 false post_type-2-86-91YKqwn7rW publish_time-2-7-5mD8rFf9vV title-2-49-ri7CYPqMMV updated-2-78-75AoMNPxYC url-2-9-y8FOB5chtW
sku-3-18-C0WGfuMaZX name-3-20-Vf9XYc11j9 654 categories-3-6-ewWS50h6Zu group-15 image_url-3-6-yypD0DCJXT true post_type-3-35-ZXhQ4fhHfa publish_time-3-77-EjVNoWFd3T title-3-80-jlECnL44g1 updated-3-48-cJOkhTkaxf url-3-80-h4BYKDzhp5
sku,name,price,categories,categories:blah,group_id,image_url,in_stock,post_type,price:blah,publish_time:blah:1:blah:foo:ha,title,updated,url
sku-0-0-WzcsbzXTPT,name-0-27-uYTzhpRQQ4,27,categories-0-45-FQ6lor0BQg,categories:blah-0-4-khfIt1YYSB,group_id-0-20-bRYrsnfEgc,image_url-0-75-Fig8wrCgQ2,true,post_type-0-76-XGYMGC0iUi,price:blah-0-34-mRa4Q7aHpU,publish_time:blah:1:blah:foo:ha-0-30-IStPiThVuM,title-0-11-mh2QnrGq7q,updated-0-51-dDpifJcPhD,url-0-27-U8viCIdzCU
sku-1-19-GLYF2eT0NN,name-1-31-w7LVpIEw8P,3,categories-1-52-S0kkT5EpGJ,categories:blah-1-75-xfrA8x73Dn,group_id-1-47-qwVp9DnbZi,image_url-1-45-NZ57S2JuU0,false,post_type-1-35-CDs58kbjS2,price:blah-1-58-I26eAB4v4w,publish_time:blah:1:blah:foo:ha-1-70-PGHqmX3Mh9,title-1-59-xwHj05WuyK,updated-1-17-x19VS0wJfr,url-1-55-ZCeKW4K5ze
sku-2-38-3dbCesQ7JH,name-2-49-r1vfp9LWe8,516,categories-2-38-ZgFV2GYDTL,categories:blah-2-99-bIrMcCu8iy,group_id-2-69-dL7B3xULUo,image_url-2-46-Re4L8KmD1s,true,post_type-2-94-J38trhpXOQ,price:blah-2-39-ForxuLBUTL,publish_time:blah:1:
sku,categories:blah,group_id,image_url,in_stock,post_type,price:blah,publish_time:blah:1:blah:foo:ha,title,updated,url
sku-0-50-rpDxvXLYji,categories:blah-0-33-yIrYX6Rosb-0-19,group_id-0-59-y1lEevwHLm-0-93,image_url-0-31-kxVMZKapB7-0-29,in_stock-0-59-H4uoKlTZLP-0-70,post_type-0-3-AmDM9GWQ0W-0-27,price:blah-0-34-ypVStJYQb2-0-97,publish_time:blah:1:blah:foo:ha-0-69-v5LzZBC2ue-0-49,title-0-84-Z1BSyxb2Uz-0-84,updated-0-25-D8AWawf3G4-0-27,url-0-87-NQIqIrZBAs-0-23
sku-1-63-Ki3xOqPRz2,categories:blah-1-68-ZIcKbiq4lE,group_id-1-94-Nbe7fFIpED,image_url-1-75-a7kXRPSwMy,in_stock-1-36-emmhFD3mXh,post_type-1-61-EVbMu5qOzZ,price:blah-1-44-nJeq9Hrg2u,publish_time:blah:1:blah:foo:ha-1-1-fJqXe8oJPk,title-1-36-LpeEaZ3Xoi,updated-1-68-aqhLSOyFkZ,url-1-38-ZHmAhpSIEp
sku-2-30-e5Io7EvVlb,categories:blah-2-36-Es1pnn1F3g,group_id-2-82-HtACnC9gLu,image_url-2-34-ExVRkzNkNe,in_stock-2-4-0YcEynHEzn,post_type-2-61-1DP1A7J0So,price:blah-2-23-lSW3SL1QVp,publish_time:blah:1:blah:foo:ha-2-39-D0hdKOmf6b,title-2-57-hv1tiQXgQO,updated-2-28-Z2Li
sku,categories:blah,group_id,image_url,in_stock,post_type,price:blah,publish_time:blah:1:blah:foo:ha,title,updated,url
sku-0-50-rpDxvXLYji,categories:blah-0-33-yIrYX6Rosb,group_id-0-59-y1lEevwHLm,image_url-0-31-kxVMZKapB7,in_stock-0-59-H4uoKlTZLP,post_type-0-3-AmDM9GWQ0W,price:blah-0-34-ypVStJYQb2,publish_time:blah:1:blah:foo:ha-0-69-v5LzZBC2ue,title-0-84-Z1BSyxb2Uz,updated-0-25-D8AWawf3G4,url-0-87-NQIqIrZBAs
sku-1-63-Ki3xOqPRz2,categories:blah-1-68-ZIcKbiq4lE,group_id-1-94-Nbe7fFIpED,image_url-1-75-a7kXRPSwMy,in_stock-1-36-emmhFD3mXh,post_type-1-61-EVbMu5qOzZ,price:blah-1-44-nJeq9Hrg2u,publish_time:blah:1:blah:foo:ha-1-1-fJqXe8oJPk,title-1-36-LpeEaZ3Xoi,updated-1-68-aqhLSOyFkZ,url-1-38-ZHmAhpSIEp
sku-2-30-e5Io7EvVlb,categories:blah-2-36-Es1pnn1F3g,group_id-2-82-HtACnC9gLu,image_url-2-34-ExVRkzNkNe,in_stock-2-4-0YcEynHEzn,post_type-2-61-1DP1A7J0So,price:blah-2-23-lSW3SL1QVp,publish_time:blah:1:blah:foo:ha-2-39-D0hdKOmf6b,title-2-57-hv1tiQXgQO,updated-2-28-Z2Lio1bzBD,url-2-58-NEOcGhSJ5O
sku-3-6-3fIXxQcibO,cate
avnerbarr / learningScala.md
Last active October 8, 2017 10:33
Learning Scala
avnerbarr / toolexists
Created November 20, 2016 11:35
Checks if a tool exists
set -e
# command -v is the POSIX way to test for a command; which is not available everywhere
if ! command -v <your tool> > /dev/null; then
  echo "error: <your tool> is missing" >&2
  exit 1
fi
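The check above can also be wrapped in a small reusable function. This is a minimal sketch, assuming a POSIX shell; the name `require_tool` is hypothetical and does not come from the original gist. It relies on `command -v`, the POSIX built-in for locating a command.

```shell
#!/bin/sh
# require_tool is a hypothetical helper name, not part of the original gist.
# Returns 0 if the named command can be found, 1 (with a message on stderr) otherwise.
require_tool() {
  if ! command -v "$1" > /dev/null 2>&1; then
    echo "error: $1 is missing" >&2
    return 1
  fi
}
```

For example, placing `require_tool git || exit 1` near the top of a script stops it early when `git` is unavailable.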
avnerbarr / chisel.sh
Created February 9, 2016 14:39
Install chisel
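#!/bin/sh
# Install Homebrew using the installer command from the time this gist was written
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
# Refresh the formula index, then install chisel
brew update
brew install chisel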