@mdoering
mdoering / backbone-2021.md
Last active March 11, 2021 14:23
backbone-2021

In this 2021-03-03 edition of the GBIF Backbone we have addressed various issues, both in content and in development work, that led to a better overall taxonomy. Further details can be seen in the closed GitHub project board. Here we want to point out some major achievements:

Data source changes

We have advanced the Backbone building so that we can block certain groups or subtrees of a source by configuration only. This new feature has been used to block the prokaryotic kingdoms Bacteria and Archaea as well as the plant family Fabaceae from the Catalogue of Life.

Instead, we decided to follow Kew's World Checklist of Vascular Plants for Fabaceae, which we made available in Darwin Core to index into GBIF.
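
To illustrate the idea of blocking subtrees by configuration, here is a minimal Python sketch. It is not the actual Backbone build code: the `BLOCKED` mapping, the `filter_blocked` function, and the `(taxon_id, parent_id, name)` record layout are all assumptions made for this example, and the sample identifiers are made up.

```python
from collections import defaultdict

# Hypothetical configuration: per source, the names of subtree roots to block.
BLOCKED = {"Catalogue of Life": {"Bacteria", "Archaea", "Fabaceae"}}


def filter_blocked(taxa, blocked_names):
    """Return the taxa that are not inside a blocked subtree.

    taxa is a list of (taxon_id, parent_id, name) tuples; parent_id is
    None for root taxa. This layout is an assumption for illustration.
    """
    # Index children by parent so subtrees can be walked top-down.
    children = defaultdict(list)
    for taxon_id, parent_id, _ in taxa:
        children[parent_id].append(taxon_id)

    # Start from every taxon whose name is blocked and collect all descendants.
    blocked_ids = set()
    stack = [taxon_id for taxon_id, _, name in taxa if name in blocked_names]
    while stack:
        taxon_id = stack.pop()
        if taxon_id in blocked_ids:
            continue
        blocked_ids.add(taxon_id)
        stack.extend(children[taxon_id])

    return [t for t in taxa if t[0] not in blocked_ids]


# Made-up sample records, only to show the effect of the filter.
taxa = [
    ("1", None, "Plantae"),
    ("2", "1", "Fabaceae"),
    ("3", "2", "Acacia"),
    ("4", "1", "Pinaceae"),
    ("5", None, "Bacteria"),
    ("6", "5", "Acidobacteria"),
]
kept = filter_blocked(taxa, BLOCKED["Catalogue of Life"])
print([name for _, _, name in kept])
```

Blocking a name removes the named taxon together with everything below it, so the rest of the source is still used while another source can supply the blocked group.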

mdoering / idmap.tsv
Created December 2, 2020 09:49
COL ID mapping file example
c526234e655cec65618f58161d24932a 5QQJ9 Picea abies var. abies
b1029cb6b79338209e3d60f5974d33fd 8KA3 Abies alpestris Brügger.
4251802078a0d844bda6c62ae5a145aa 8K9W Abies abies (L.) Rusby
b9f99a1f126b6b9d3d8c560386dea188 4HPZF Picea abies (L.) H. Karst.
12dca9c49741815f82400bb7bff50553 8K9Y Abies alba Mill.
b385e5b1bd137efbb9a5387821127481 EKB Pinaceae
mdoering / birds.json
Created November 10, 2020 13:21
Aves COL export as simple JSON
[
{"id":"5E7FB","name":"Aves","rank":"class","code":"zoological","status":"accepted","children":[
{"id":"5CY9X","name":"Accipitriformes","rank":"order","code":"zoological","status":"accepted","children":[
{"id":"5D73B","name":"Accipitridae","authorship":"Vigors, 1824","rank":"family","code":"zoological","status":"accepted","children":[
{"id":"5REGP","name":"Accipitrinae","rank":"subfamily","code":"zoological","status":"synonym","children":[]},
{"id":"5DXFV","name":"Accipiter","authorship":"Brisson, 1760","rank":"genus","code":"zoological","status":"accepted","children":[
{"id":"5E9DM","name":"Accipiter albogularis","authorship":"G. R. Gray, 1870","rank":"species","code":"zoological","status":"accepted","children":[
{"id":"5ELB7","name":"Accipiter albogularis albogularis","authorship":"G. R. Gray, 1870","rank":"subspecies","code":"zoological","status":"accepted","children":[]},
{"id":"5ELB6","name":"Accipiter albogularis eichhorni","authorship":"Har
mdoering / citations.txt
Last active November 5, 2020 03:48
col11.20 metadata
COL RELEASE
2230: Roskov Y., Ower G., Orrell T., Nicolson D., Bailly N., Kirk P. M., Bourgoin T., DeWalt R. E., Decock W., van Nieukerken E. J., Penev L. (eds.) (2020). Species 2000 & ITIS Catalogue of Life, 3rd November 2020. Digital resource at www.catalogueoflife.org. Species 2000: Naturalis, Leiden, the Netherlands. ISSN 2405-8858.
SOURCES:
1005: Oosterbroek P. (2020). CCW: Catalogue of Craneflies of the World (version Jul 2013). In: Roskov Y., Ower G., Orrell T., Nicolson D., Bailly N., Kirk P. M., Bourgoin T., DeWalt R. E., Decock W., van Nieukerken E. J., Penev L. (eds.) (2020). Species 2000 & ITIS Catalogue of Life, 3rd November 2020. Digital resource at www.catalogueoflife.org. Species 2000: Naturalis, Leiden, the Netherlands. ISSN 2405-8858.
1006: Vignes-Lebbe R., Gallut C. (2020). CIPA: Computer Aided Identification of Phlebotomine sandflies of Americas (version 3, Mar 2011). In: Roskov Y., Ower G., Orrell T., Nicolson D., Bailly N., Kirk P. M., Bourgoin T., DeWalt R. E., Decock W., van Nieukerken
mdoering / ncbi.tsv
Last active August 17, 2020 10:43
NCBI Test DwC-A
taxonID parentID taxonRank scientificName scientificNameAuthorship
root root
cell root Cellular organisms
bact cell SUPERKINGDOM Bacteria
acidobact bact phylum Acidobacteria
acido acidobact class Acidobacteriia
bryobact acido order Bryobacterales
1 bryobact genus Genus 1
2 bryobact genus Genus 2
3 bryobact genus Genus 3
mdoering / hierarchy.tree
Last active March 3, 2020 20:21
CoL classification hierarchy down to families as text trees. Changes since AC19
Animalia [kingdom]
  Acanthocephala [phylum]
    Archiacanthocephala [class]
      Apororhynchida [order]
        Apororhynchidae [family]
      Gigantorhynchida [order]
        Giganthorhynchidae [family]
      Moniliformida [order]
        Moniliformidae [family]
      Oligacanthorhynchida [order]
mdoering / int-sizes.txt
Created November 15, 2019 12:02
Disk use of int, smallint and enum (10 values) in Postgres11 on OSX and Linux
MAC
table_name | row_estimate | total_bytes | index_bytes | toast_bytes | table_bytes | total | index | toast | table
-------------+--------------+-------------+-------------+-------------+-------------+--------+---------+------------+--------
test-int4 | 5e+06 | 323534848 | 142221312 | 8192 | 181305344 | 309 MB | 136 MB | 8192 bytes | 173 MB
test-int2 | 5e+06 | 323993600 | 142680064 | 8192 | 181305344 | 309 MB | 136 MB | 8192 bytes | 173 MB
test-enum | 5e+06 | 293650432 | 112336896 | 8192 | 181305344 | 280 MB | 107 MB | 8192 bytes | 173 MB
test-int4[] | 4.99999e+06 | 306806784 | 5505024 | 8192 | 301293568 | 293 MB | 5376 kB | 8192 bytes | 287 MB
test-int2[] | 4.99999e+06 | 266493952 | 5505024 | 8192 | 260980736 | 254 MB | 5376 kB | 8192 bytes | 249 MB
test-enum[] | 4.99998e+06 | 346955776 | 5505024 | 8192 | 341442560 | 331 MB | 5376 kB | 8192 bytes | 326 MB
mdoering / GENUS_HOMONYMS.sql
Created September 18, 2019 10:55
Backbone Homonym SQL snippets
-- ALL GENUS HOMONYMS
WITH homs AS (
SELECT u.rank, n.canonical_name
FROM name_usage u JOIN name n ON u.name_fk = n.id
WHERE u.dataset_key = nubkey() AND u.deleted IS NULL AND NOT u.is_synonym AND u.rank='GENUS'::rank
GROUP BY u.rank, n.canonical_name
HAVING count(*) > 1
)
SELECT u.id, n.canonical_name, u.rank, n.scientific_name, u.is_synonym, u.status,
u.kingdom_fk, u.phylum_fk, u.class_fk, u.order_fk, u.family_fk, u.genus_fk
mdoering / doubtful-genus-homonyms.txt
Last active September 18, 2019 12:07
nub genus homonyms
-- GENUS HOMONYMS WITH THE SAME CLASSIFICATION, ALL DOUBTFUL AND AT LEAST ONE MISSING AUTHORSHIP
-- BUT RETURNING ALL USAGES WITHIN THAT KINGDOM WITH THAT NAME REGARDLESS THEIR CLASSIFICATION
WITH homs AS (
SELECT u.rank, n.canonical_name, u.kingdom_fk, u.phylum_fk, u.class_fk, u.order_fk, u.family_fk
FROM name_usage u JOIN name n ON u.name_fk = n.id
WHERE u.dataset_key = nubkey() AND u.deleted IS NULL AND NOT u.is_synonym AND u.rank='GENUS'::rank
GROUP BY u.rank, n.canonical_name, u.kingdom_fk, u.phylum_fk, u.class_fk, u.order_fk, u.family_fk
HAVING count(*) > 1 AND count(distinct status)=1 AND bool_or(n.canonical_name=n.scientific_name)
)
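
The grouping logic of the `homs` CTE above can be sketched in Python on in-memory rows. This is only an illustration: the dictionary keys, the `classification` tuple (standing in for the `*_fk` columns), and the function name `doubtful_homonym_groups` are assumptions, the sample names are made up, and the `dataset_key` and `deleted` filters are left out for brevity. Like the SQL, the sketch checks that all usages in a group share a single status rather than testing for a specific status value.

```python
from collections import defaultdict


def doubtful_homonym_groups(usages):
    """Return (canonical_name, classification) keys that mirror the HAVING clause.

    A group qualifies when it has more than one non-synonym genus usage,
    all usages share one status, and at least one usage has no authorship
    (canonical_name equals scientific_name).
    """
    groups = defaultdict(list)
    for u in usages:
        # WHERE clause: genus rank and not a synonym.
        if u["rank"] == "GENUS" and not u["is_synonym"]:
            groups[(u["canonical_name"], u["classification"])].append(u)

    result = []
    for key, members in groups.items():
        one_status = len({m["status"] for m in members}) == 1
        missing_authorship = any(
            m["canonical_name"] == m["scientific_name"] for m in members
        )
        if len(members) > 1 and one_status and missing_authorship:
            result.append(key)
    return result


# Made-up usages, only to exercise the grouping.
usages = [
    {"canonical_name": "Aus", "scientific_name": "Aus", "rank": "GENUS",
     "is_synonym": False, "status": "DOUBTFUL",
     "classification": ("Animalia", "Chordata")},
    {"canonical_name": "Aus", "scientific_name": "Aus Smith, 1900", "rank": "GENUS",
     "is_synonym": False, "status": "DOUBTFUL",
     "classification": ("Animalia", "Chordata")},
    {"canonical_name": "Bus", "scientific_name": "Bus Jones, 1950", "rank": "GENUS",
     "is_synonym": False, "status": "ACCEPTED",
     "classification": ("Animalia", "Chordata")},
]
print(doubtful_homonym_groups(usages))
```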
mdoering / parents.json
Created September 3, 2019 10:29
col tree api parents with placeholders
// 20190903122905
// http://localhost:8080/dataset/3/tree/f55812e8-5422-402e-b071-b67a9cdf481f--incertae-sedis--FAMILY
[
{
"datasetKey": 3,
"id": "f55812e8-5422-402e-b071-b67a9cdf481f",
"parentId": "0",
"name": "<i>Viruses</i>",
"rank": "kingdom",