@jhpoelen
jhpoelen / create-checklist-cluster.sh
Last active August 5, 2019 20:01
idigbio-spark scripts
#!/bin/bash
#
#
WKT_STRING="POLYGON ((-72.77293810620904 -33.196074154826235, -72.77293810620904 6.59516197881252, -28.12450060620904 6.59516197881252, -28.12450060620904 -33.196074154826235, -72.77293810620904 -33.196074154826235))"
spark-submit \
--master mesos://zk://mesos01:2181,mesos02:2181,mesos03:2181/mesos \
--driver-memory 4G \
--conf spark.sql.caseSensitive=true \
@jhpoelen
jhpoelen / downloadIds.txt
Last active May 15, 2019 23:24
Retrieve Occurrence Downloads Cited in Literature
0000036-150827100048397
0000037-150827100048397
0000039-150827100048397
0000040-150827100048397
0000048-150306150734599
0000061-150827100048397
0000062-150827100048397
0000067-150827100048397
0000068-150827100048397
0000069-150827100048397
@jhpoelen
jhpoelen / calculateKingdomToKingdomInteractions.scala
Last active April 5, 2019 20:29
IVMOOC 2019 GloBI Kingdom To Kingdom Interactions
val taxa = spark.read.option("delimiter","""\t""").option("header","true").csv("taxonCache.tsv.bz2")
taxa.printSchema
import spark.implicits._
val taxonCache = spark.read.option("delimiter","""\t""").option("header","true").csv("taxonCache.tsv.bz2")
val taxonIdsPaths = taxonCache.select("id", "pathNames", "path").as[(String, String, String)].filter(_._2 != null).filter( _._3 != null).filter(_._1 != null)
val taxaIdToKingdom = taxonIdsPaths.map( r=> (r._1, r._2.split("\\|").map(_.trim), r._3.split("\\|").map(_.trim))).map(r => (r._1, r._2.zip(r._3))).map(r => (r._1, r._2.filter(_._1 == "kingdom").map(_._2).mkString)).filter(_._2.nonEmpty).filter(r => List("GBIF", "ITIS","WORMS", "INAT_TAXON").contains(r._1.split(":").head)).filter(_._2 != "incertae sedis")
taxaIdToKingdom.write.option("delimiter","""\t""").csv("taxaIdToKingdom.tsv")
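The core of the Scala snippet above is pairing each entry of pathNames with the entry at the same position in path and keeping the one labelled "kingdom". A minimal sketch of that step in awk, assuming a three-column tab-separated input (id, pathNames, path) that has already been selected from taxonCache.tsv; the function name kingdom_of is hypothetical, and the id-prefix and "incertae sedis" filters of the original are left out:

```shell
#!/bin/bash
# Hypothetical helper: reads "id <TAB> pathNames <TAB> path" rows on stdin
# and prints "id <TAB> kingdom" for rows whose pathNames contain "kingdom".
kingdom_of() {
  awk -F'\t' '{
    # Split both pipe-delimited columns, dropping the spaces around each "|".
    n = split($2, names, / *[|] */)
    split($3, values, / *[|] */)
    # Emit the path entry that sits at the same position as the "kingdom" label.
    for (i = 1; i <= n; i++)
      if (names[i] == "kingdom") print $1 FS values[i]
  }'
}

# Example row (made up for illustration):
printf 'GBIF:1\tkingdom | phylum\tAnimalia | Chordata\n' | kingdom_of
```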
@jhpoelen
jhpoelen / resp.json
Last active November 14, 2018 01:07
bash script for uploading to Zenodo
{
"conceptrecid": "1486278",
"created": "2018-11-14T00:29:34.856766+00:00",
"files": [],
"id": 1486279,
"links": {
"bucket": "https://zenodo.org/api/files/35cfca90-d31f-4b36-b91a-8def579ca410",
"discard": "https://zenodo.org/api/deposit/depositions/1486279/actions/discard",
"edit": "https://zenodo.org/api/deposit/depositions/1486279/actions/edit",
"files": "https://zenodo.org/api/deposit/depositions/1486279/files",
@jhpoelen
jhpoelen / register_hashes_with_hash_archive.sh
Last active September 18, 2018 16:09
register preston hashes and urls with hash archive
#!/bin/bash
# Register all preston urls with hash-archive.org
#
# Please replace the "deeplinker\.bio" instances below with the escaped hostname of your own Preston instance.
# see https://preston.guoda.bio on how to install preston
#
preston ls -l tsv | grep Version | cut -f1,3 | tr '\t' '\n' | grep -v "deeplinker\.bio/\.well-known/genid" | sort | uniq | sed -e 's/hash:\/\/sha256/https:\/\/deeplinker.bio/g' | sed -e 's/^/https:\/\/hash-archive.org\/api\/enqueue\//g' | xargs -L1 curl
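Before sending anything to hash-archive.org, it can help to preview the URLs that the two sed expressions produce. The sketch below isolates only that rewriting step and leaves out the call to curl; the function name to_enqueue_url and the short hash value are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: rewrites hash://sha256/... identifiers into enqueue URLs,
# using the same two sed expressions as the pipeline above (no network access).
to_enqueue_url() {
  sed -e 's/hash:\/\/sha256/https:\/\/deeplinker.bio/g' \
      -e 's/^/https:\/\/hash-archive.org\/api\/enqueue\//g'
}

# Dry run on a placeholder hash (not a real sha256 value):
printf 'hash://sha256/abc123\n' | to_enqueue_url
# prints https://hash-archive.org/api/enqueue/https://deeplinker.bio/abc123
```

Once the output looks right, piping it into xargs -L1 curl as in the original script submits one request per line.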
interactions.tsv.gz
Contains pairwise interactions generated by elton 0.5.0 on 2018-06-29.
Generated by dietmatrix.sh.
fbPredPreyOrder.tsv.gz
Contains prey/diet items of species known to FishBase. For each prey/diet item, the linked order(s) are included along with the resolved prey/diet item terms. Calculated by dietmatrix.sh.
majorityOrders.tsv
Majority orders were calculated by selecting the most frequently occurring order associated with a specific prey id/name.
If different order assignments for a particular prey item have the same frequency, the orders are sorted alphabetically and the first is selected.
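The majority rule described above can be sketched as a short pipeline. This is not the code from dietmatrix.sh; it assumes a two-column tab-separated input (prey name, order) with one row per observed association, and the function name majority_orders is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: reads "prey <TAB> order" rows on stdin and prints
# "prey <TAB> majority order", breaking ties alphabetically by order name.
majority_orders() {
  # 1. Count how often each (prey, order) pair occurs.
  awk -F'\t' '{ n[$1 FS $2]++ } END { for (k in n) print k FS n[k] }' \
    | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k3,3nr -k2,2 \
    | awk -F'\t' '!seen[$1]++ { print $1 FS $2 }'
  # 2. The sort orders rows by prey, then count (descending), then order name.
  # 3. The final awk keeps only the first row for each prey.
}

# Example with made-up rows: "crab" is tied, so the alphabetically first order wins.
printf 'shrimp\tDecapoda\nshrimp\tDecapoda\nshrimp\tAmphipoda\ncrab\tDecapoda\ncrab\tAmphipoda\n' \
  | majority_orders
```

Sorting on the count before the order name is what makes the tie-break fall out of the sort itself, so no separate comparison step is needed.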
@jhpoelen
jhpoelen / exportNanoPubs.sh
Last active December 15, 2017 23:06
Elton nanopubs
#!/bin/bash
#
# Example of how to create trusty nanopubs from species interaction data using elton and nanopub-java
#
echo download elton tool...
curl -L "https://github.com/globalbioticinteractions/elton/releases/download/0.4.1/elton.jar" > elton.jar
echo download elton tool done.
# you can also use https://github.com/globalbioticinteractions/elton-archive to retrieve archived datasets from the internet archive
@jhpoelen
jhpoelen / dietMatrix.R
Created June 1, 2017 17:15
Diet matrix using GloBI archives and R
# add code
@jhpoelen
jhpoelen / .gitignore
Last active June 6, 2017 19:47
Building Diet Matrix From GloBI interaction.tsv and taxonCache.tsv
.idea
@jhpoelen
jhpoelen / gist:39d866721bb35a63d0e9b99073c5e8b2
Last active February 4, 2017 04:37
lookup GBIF occurrence counts for species name
#install.packages('rgbif')
fresh <- read.csv('Fresh.species.csv')
#fresh <- data.frame(predator.taxon.name = c('Arius felis', 'Ariopsis felis', 'Gadus morhua'))
# appends columns gbifSpeciesKey, gbifOccCount when gbif knows about species and has occurrences
appendOccCount <- function(df) {
names <- df$predator.taxon.name