@robinkraft
Created November 9, 2012 21:25
recipe for running fossa on EMR
# launch 10-instance cluster - $6-7/hr w/spot
lein emr -s 10 -t high-memory -b 0.75 -bs bsaconfig.xml
# login to cluster
ssh -i ~/.ssh/MoL-hosts.pem hadoop@<insert public DNS>
# get lein
cd bin
wget https://raw.github.com/technomancy/leiningen/stable/bin/lein
mv lein lein1
wget https://raw.github.com/technomancy/leiningen/preview/bin/lein
chmod u+x lein lein1
# Bootstrap lein
lein
cd ..
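# Optional sanity check, not part of the original recipe: a minimal sketch
# that confirms both launcher scripts are present and executable before
# moving on. check_launchers is a hypothetical helper name; the directory
# and script names are passed in as arguments.

```shell
# Hypothetical helper: report whether each named launcher in a directory
# is present and executable. Returns non-zero if any of them is not.
check_launchers() {
  dir="$1"; shift
  status=0
  for name in "$@"; do
    if [ -x "$dir/$name" ]; then
      echo "ok: $name"
    else
      echo "missing or not executable: $name"
      status=1
    fi
  done
  return "$status"
}
```

# Usage, assuming the scripts were downloaded into ~/bin as above:
#   check_launchers "$HOME/bin" lein lein1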
# Get fossa repo - easier to edit, test code
git clone git@github.com:MapofLife/fossa.git
cd fossa
git checkout develop
lein do deps, compile :all, uberjar
# add this to conf/hadoop-site.xml
<property><name>mapred.child.java.opts</name><value>-Djava.library.path=/home/hadoop/native -Xms1024m -Xmx1048m</value></property>
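# The JVM will not start if the initial heap (-Xms) is larger than the
# maximum heap (-Xmx), so it may be worth checking the opts string before
# pasting it into the config. A minimal sketch, not part of the original
# recipe: heap_ok is a hypothetical helper and only understands sizes
# written with an "m" (megabyte) suffix.

```shell
# Hypothetical helper: pull -Xms and -Xmx (megabytes, "m" suffix only)
# out of a java opts string and succeed only if both are present and
# the initial heap does not exceed the maximum heap.
heap_ok() {
  opts="$1"
  xms=$(printf '%s\n' "$opts" | sed -n 's/.*-Xms\([0-9][0-9]*\)m.*/\1/p')
  xmx=$(printf '%s\n' "$opts" | sed -n 's/.*-Xmx\([0-9][0-9]*\)m.*/\1/p')
  [ -n "$xms" ] && [ -n "$xmx" ] && [ "$xms" -le "$xmx" ]
}
```

# Usage with the value from the property above:
#   heap_ok "-Djava.library.path=/home/hadoop/native -Xms1024m -Xmx1048m" && echo "heap settings consistent"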
# launch repl
screen -Lm hadoop jar /home/hadoop/fossa/target/fossa-0.1.0-SNAPSHOT-standalone.jar clojure.main
# from the repl
(use 'fossa.core)
(in-ns 'fossa.core)
(?- (hfs-textline "s3n://gbifsource/insert-stmts")
    (parse-occurrence-data :path "s3n://gbifsource/occurrence-text"))