David Shorthouse dshorthouse

@dshorthouse
dshorthouse / bloodhound.md
Last active February 20, 2020 16:40 — forked from timrobertson100/bloodhound.md
A quick test to explore a bloodhound process

This is a quick test of a modified version of the Bloodhound Spark script to check that it runs on the GBIF Cloudera cluster (CDH 5.16.2).

From the gateway, copy the download archive directly from HDFS (faster than HTTP), unzip it (15-20 minutes), and upload the extracted files back to HDFS:

hdfs dfs -getmerge /occurrence-download/prod-downloads/0002504-181003121212138.zip /mnt/auto/misc/bloodhound/data.zip
unzip /mnt/auto/misc/bloodhound/data.zip -d /mnt/auto/misc/bloodhound/data

hdfs dfs -rm /tmp/verbatim.txt
hdfs dfs -rm /tmp/occurrence.txt
dshorthouse / Bloodhound_Lost_Attributions
Created November 23, 2019 17:54
Users whose attributions were lost in Bloodhound due to "over-ingested" specimen records in the GBIF index just prior to November 12, 2019
NULL,"0000-0002-7101-9767","Roderic","Page"
NULL,"0000-0001-9008-0611","Stylianos","Chatzimanolis"
NULL,"0000-0002-6752-9721","Tod","Robbins"
NULL,"0000-0002-7053-8557","Paul","Sokoloff"
NULL,"0000-0001-7618-5230","David Peter","Shorthouse"
NULL,"0000-0003-1366-145X","Timothy","Dickinson"
NULL,"0000-0003-0768-1286","Richard","Pyle"
NULL,"0000-0002-4124-2175","Peter","Hovenkamp"
NULL,"0000-0001-6065-0812","Frank-Thorsten","Krell"
NULL,"0000-0002-1314-755X","Neal","Evenhuis"
910
1984
 DZRJ
-
()
(UB 19881)
(UB 19882)
(UFG 13985)
(UFG 13986)
*
dshorthouse / data.csv
Last active March 30, 2023 17:58
Basic R Script to use SimpleMappr API with csv file
species,latitude,longitude
Pardosa moesta,45.755,-110.12
Pardosa fuscula,47.9,-112
Pardosa moesta,55.6,-101
Pardosa xerampelina,48.9,-103.55
Pardosa xerampelina,43.02,-105.9
Trochosa terricola,45.5,-103.8
Trochosa terricola,46,-100
Trochosa terricola,47.7,-110.9
Pardosa moesta,48,-109
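The R script itself is not shown in this preview. As a rough illustration of the kind of preparation a mapping request usually needs, the sketch below groups the sample coordinates by species so that each species could become its own map layer. It is written in Python rather than R, it assumes the file is comma-delimited with the three column names shown above, and `group_points` is a hypothetical helper that is not part of SimpleMappr or the original gist.

```python
import csv
import io
from collections import defaultdict

# Sample rows from data.csv, assumed here to be comma-delimited.
SAMPLE = """species,latitude,longitude
Pardosa moesta,45.755,-110.12
Pardosa fuscula,47.9,-112
Pardosa moesta,55.6,-101
Pardosa xerampelina,48.9,-103.55
Pardosa xerampelina,43.02,-105.9
Trochosa terricola,45.5,-103.8
Trochosa terricola,46,-100
Trochosa terricola,47.7,-110.9
Pardosa moesta,48,-109
"""


def group_points(csv_text):
    """Return {species: [(latitude, longitude), ...]} from CSV text.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row["latitude"])
        lon = float(row["longitude"])
        # Reject coordinates outside the valid decimal-degree ranges.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"coordinate out of range: {lat}, {lon}")
        groups[row["species"]].append((lat, lon))
    return dict(groups)


groups = group_points(SAMPLE)
for species, points in groups.items():
    print(species, len(points))
```

Grouping before building a request keeps one legend entry per species and makes it easy to spot rows with swapped or out-of-range coordinates early.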
dshorthouse / ruby_ocr.rb
Last active April 2, 2025 07:39
OCR Image-based PDF in ruby
require 'parallel'
require 'rtesseract'
require 'mini_magick'
require 'tempfile' # Tempfile is used below; require it explicitly

source = "/MyDirectory/my.pdf"
doc = {}

pdf = MiniMagick::Image.open(source)
# Process pages in up to 8 threads
Parallel.map(pdf.pages.each_with_index, in_threads: 8) do |page, idx|
  tmpfile = Tempfile.new(['', '.tif'])
  MiniMagick::Tool::Convert.new do |convert|
dshorthouse / mapscript-extent.php
Last active March 10, 2017 19:37
PHP MapScript reprojection/extent issues
<?php
//MapServer version 7.0.4
$map = ms_newMapObjFromString("MAP END");
$map->set("units", MS_DD);
$map->setProjection("proj=longlat,ellps=WGS84,datum=WGS84,no_defs", true);
$map->setExtent(-180, -90, 180, 90);
$map->setSize(100,100);
$map->setProjection("proj=robin,lon_0=0,x_0=0,y_0=0,ellps=WGS84,datum=WGS84,units=m,over,no_defs", true);
dshorthouse / wkt.html
Last active August 29, 2019 10:32
GMap WKT Drawing
<html>
<head>
<style type="text/css">
#map{width:600px;height:400px;}
#freehand {
margin-left:-5px;
margin-top:5px;
}
#freehand .button{
direction: ltr;
dshorthouse / gist:c2bc1896b12e1648e2bc
Created August 7, 2014 03:16
Convert DDMMSS coordinates to array of DD latitude/longitude in PHP
/**
 * Split DDMMSS or DD coordinate pair string into an array
 *
 * @param string $point A string purported to be a coordinate
 * @return array(latitude, longitude) in DD
 */
function make_coordinates($point)
{
    $loc = preg_replace(array('/[\p{Z}\s]/u', '/[^\d\s,;.\-NSEWO°ºdms\'"]/i'), array(' ', ''), $point);
    if (preg_match('/[NSEWO]/', $loc) != 0) {
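The PHP preview above is cut off before the conversion itself. The arithmetic behind a DDMMSS to DD conversion is DD = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The Python sketch below illustrates that formula only and is not a port of the gist: `dms_to_dd`, `parse_coordinate`, and `parse_pair` are hypothetical names, the input is assumed to hold latitude first and longitude second separated by a comma or semicolon, and only the uppercase letters N, S, E, and W are interpreted (the `O` accepted by the PHP pattern is not handled here).

```python
import re


def dms_to_dd(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees, minutes, seconds to signed decimal degrees."""
    dd = degrees + minutes / 60.0 + seconds / 3600.0
    # South and West are negative by convention.
    return -dd if hemisphere in ("S", "W") else dd


def parse_coordinate(text):
    """Parse one coordinate, e.g. 45°30'15"N or -110.12 (hypothetical helper)."""
    # Match uppercase only, so a lowercase seconds marker "s" is not
    # mistaken for the southern hemisphere.
    match = re.search(r"[NSEW]", text)
    if match is None:
        # No hemisphere letter: treat the text as plain signed decimal degrees.
        return float(text)
    parts = [float(p) for p in re.findall(r"\d+(?:\.\d+)?", text)][:3]
    parts += [0.0] * (3 - len(parts))  # missing minutes/seconds default to 0
    return dms_to_dd(parts[0], parts[1], parts[2], match.group(0))


def parse_pair(point):
    """Split 'latitude, longitude' and return [latitude, longitude] in DD."""
    lat_text, lon_text = re.split(r"[,;]", point, maxsplit=1)
    return [parse_coordinate(lat_text), parse_coordinate(lon_text)]


lat, lon = parse_pair("45°30'15\"N, 110°07'12\"W")
print(lat, lon)
```

A string without a comma or semicolon raises `ValueError` when unpacked in `parse_pair`, which is a deliberate simplification; a fuller version would validate ranges and report malformed input more gracefully.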