@rdmpage
Last active March 31, 2018 05:03
GenBank and GBIF

Cluster multiple occurrence records in GBIF

In this example the same sequence, GQ247641, appears in two sequence datasets ("European Molecular Biology Laboratory Australian Mirror" and "Geographically tagged INSDC sequences"), and the voucher specimen ("AM W.35546.001" or "AMS:W.35546") also occurs in GBIF (provided by "Australian Museum provider for OZCAM"). Linking the two sequence occurrences is trivial: we just link by the accession "GQ247641". Linking the sequence to the museum specimen requires matching the slightly different strings "AM W.35546.001" and "AMS:W.35546".
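
One way to match the two voucher strings is to reduce each to a comparable key. The sketch below is only an illustration: the pattern (institution code, a space or colon, a collection letter, a dot, then the specimen number) is an assumption inferred from the two strings in this example, and `specimen_key` and `same_specimen` are hypothetical helper names, not part of any GBIF or GenBank tooling.

```python
import re

# Assumed layout: institution code, separator (space or colon),
# collection letter(s), a dot, then the specimen number.
# Anything after the number (such as a ".001" part suffix) is ignored.
CODE_PATTERN = re.compile(
    r"^\s*(?P<institution>[A-Za-z]+)[\s:]+"
    r"(?P<collection>[A-Za-z]+)\.(?P<number>\d+)"
)


def specimen_key(code):
    """Return a normalized key such as 'W.35546', or None if the code does not fit."""
    m = CODE_PATTERN.match(code)
    if m is None:
        return None
    return f"{m.group('collection').upper()}.{int(m.group('number'))}"


def same_specimen(a, b):
    """True when both codes parse and reduce to the same key."""
    key_a, key_b = specimen_key(a), specimen_key(b)
    return key_a is not None and key_a == key_b


print(same_specimen("AM W.35546.001", "AMS:W.35546"))
```

The key deliberately drops the institution code, because "AM" and "AMS" differ between the two records. That makes the comparison safe only among records already known to come from the same institution.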

The graph links three records in GBIF that all refer to the same thing.

CREATE (occurrence487122480:Occurrence { name: "gbif487122480", catalogNumber: "AM W.35546.001" }),
(sample1:Sample { name: "AM W.35546.001" }),
(occurrence487122480)-[:HASCODE]->(sample1),
(occurrence488829630:Occurrence { name: "gbif488829630", catalogNumber: "GQ247641" }),
(GQ247641:Sequence { accession:"GQ247641", catalogueNumber: "AMS:W.35546" }),
(GQ247641)-[:SAMEAS]->(occurrence488829630),
(occurrence1006303667:Occurrence { name: "gbif1006303667", catalogNumber: "GQ247641" }),
(GQ247641)-[:SAMEAS]->(occurrence1006303667),

(dataset1:Dataset { name: "European Molecular Biology Laboratory Australian Mirror"})<-[:SOURCE]-(occurrence488829630),
(dataset2:Dataset { name: "Geographically tagged INSDC sequences"})<-[:SOURCE]-(occurrence1006303667),
(dataset3:Dataset { name: "Australian Museum provider for OZCAM"})<-[:SOURCE]-(occurrence487122480),


(GQ247641)-[:HASCODE]->(sample2:Sample { name: "AMS:W.35546" }),
(occurrence487122480)-[:HASCODE]->(sample2)
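
The following query starts from an occurrence that has a sample code and collects the sequence-derived occurrences that share that sample.

MATCH (o1:Occurrence)-[:HASCODE]-(:Sample)-[:HASCODE]-(:Sequence)-[:SAMEAS]-(o2:Occurrence)
WITH o1, collect(o2.name) AS os
RETURN o1.name, os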
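
The same clustering can be sketched outside the graph database as a connected-components problem. This is a minimal illustration and not the gist's method: the edge list is copied by hand from the HASCODE and SAMEAS relationships above with direction ignored, and `clusters` and `occurrence_clusters` are hypothetical names.

```python
from collections import defaultdict

# HASCODE and SAMEAS relationships from the graph, treated as undirected edges.
EDGES = [
    ("gbif487122480", "AM W.35546.001"),
    ("gbif487122480", "AMS:W.35546"),
    ("GQ247641", "AMS:W.35546"),
    ("GQ247641", "gbif488829630"),
    ("GQ247641", "gbif1006303667"),
]


def clusters(edges):
    """Group nodes into connected components using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def occurrence_clusters(edges):
    """Keep only the GBIF occurrence names (prefixed 'gbif') in each cluster."""
    return [sorted(n for n in group if n.startswith("gbif")) for group in clusters(edges)]


print(occurrence_clusters(EDGES))
```

With the edges above, all three occurrences fall into one cluster, which is the grouping the graph is meant to express.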