Last active
September 25, 2015 08:18
Graph Gist of Taxonomic Name Units
= Generating Global Checklists from Taxonomic Name Units
:neo4j-version: 2.2.0
:author: Matthew Blissett, Donald Hobern
:description: Example Taxonomic Name Units that result from subsequent taxonomic acts
:tags: domain:life-science, use-case:taxonomy
== Notation

* Single letters represent Linnaean names at various ranks
* Capital letters mean that the name is accepted within a treatment as the name for a taxon
* Lowercase letters mean that the name is considered a synonym
* An accepted name may be followed by its lowercase synonyms as a “word”
* The same letter, regardless of case, always represents the same Linnaean name wherever it is used
* Pn (_n_ is an integer) represents an authority (publication) with nomenclatural acts and/or taxonomic treatments

In other words, _A_, _Bc_, _Def_ represent the following:

* Species _A_
* Species _B_ with synonym _C_
* Species _D_ with synonyms _E_ and _F_
== Problem

Imagine that the entire world's taxonomic literature had never proceeded beyond the following
(each bullet represents one publication, in chronological order):

* P1 – _A, B, C, D, E_
* P2 – _Cd, E, F, G_
* P3 – _Af, Ceg, D, H, I_
* P4 – _Befg, I, J, K_
* P5 – _A, B, Cd, E, Gj, H, I, K_

What possible global checklists of names might exist?

(The initial set-up of the data is hidden; reveal it with the `+`.)
//setup
//hide
[source,cypher]
----
// Create names (nodes) from P1
CREATE (`A`:name {name:'A',author:'P1'}),
       (`B`:name {name:'B',author:'P1'}),
       (`C`:name {name:'C',author:'P1'}),
       (`D`:name {name:'D',author:'P1'}),
       (`E`:name {name:'E',author:'P1'})
// Create accepted-name relationships from P1
CREATE (`A`)-[:ACCEPTED_AS {author:'P1'}]->(`A`),
       (`B`)-[:ACCEPTED_AS {author:'P1'}]->(`B`),
       (`C`)-[:ACCEPTED_AS {author:'P1'}]->(`C`),
       (`D`)-[:ACCEPTED_AS {author:'P1'}]->(`D`),
       (`E`)-[:ACCEPTED_AS {author:'P1'}]->(`E`)
// Create names (nodes) from P2
CREATE (`F`:name {name:'F',author:'P2'}),
       (`G`:name {name:'G',author:'P2'})
// Create accepted-name relationships from P2
CREATE (`C`)-[:ACCEPTED_AS {author:'P2'}]->(`C`),
       (`D`)-[:ACCEPTED_AS {author:'P2'}]->(`C`),
       (`E`)-[:ACCEPTED_AS {author:'P2'}]->(`E`),
       (`F`)-[:ACCEPTED_AS {author:'P2'}]->(`F`),
       (`G`)-[:ACCEPTED_AS {author:'P2'}]->(`G`)
// Create names (nodes) from P3
CREATE (`H`:name {name:'H',author:'P3'}),
       (`I`:name {name:'I',author:'P3'})
// Create accepted-name relationships from P3
CREATE (`A`)-[:ACCEPTED_AS {author:'P3'}]->(`A`),
       (`F`)-[:ACCEPTED_AS {author:'P3'}]->(`A`),
       (`C`)-[:ACCEPTED_AS {author:'P3'}]->(`C`),
       (`E`)-[:ACCEPTED_AS {author:'P3'}]->(`C`),
       (`G`)-[:ACCEPTED_AS {author:'P3'}]->(`C`),
       (`D`)-[:ACCEPTED_AS {author:'P3'}]->(`D`),
       (`H`)-[:ACCEPTED_AS {author:'P3'}]->(`H`),
       (`I`)-[:ACCEPTED_AS {author:'P3'}]->(`I`)
// Create names (nodes) from P4
CREATE (`J`:name {name:'J',author:'P4'}),
       (`K`:name {name:'K',author:'P4'})
// Create accepted-name relationships from P4
CREATE (`B`)-[:ACCEPTED_AS {author:'P4'}]->(`B`),
       (`E`)-[:ACCEPTED_AS {author:'P4'}]->(`B`),
       (`F`)-[:ACCEPTED_AS {author:'P4'}]->(`B`),
       (`G`)-[:ACCEPTED_AS {author:'P4'}]->(`B`),
       (`I`)-[:ACCEPTED_AS {author:'P4'}]->(`I`),
       (`J`)-[:ACCEPTED_AS {author:'P4'}]->(`J`),
       (`K`)-[:ACCEPTED_AS {author:'P4'}]->(`K`)
// No new names (nodes) from P5
// Create accepted-name relationships from P5
CREATE (`A`)-[:ACCEPTED_AS {author:'P5'}]->(`A`),
       (`B`)-[:ACCEPTED_AS {author:'P5'}]->(`B`),
       (`C`)-[:ACCEPTED_AS {author:'P5'}]->(`C`),
       (`D`)-[:ACCEPTED_AS {author:'P5'}]->(`C`),
       (`E`)-[:ACCEPTED_AS {author:'P5'}]->(`E`),
       (`G`)-[:ACCEPTED_AS {author:'P5'}]->(`G`),
       (`J`)-[:ACCEPTED_AS {author:'P5'}]->(`G`),
       (`H`)-[:ACCEPTED_AS {author:'P5'}]->(`H`),
       (`I`)-[:ACCEPTED_AS {author:'P5'}]->(`I`),
       (`K`)-[:ACCEPTED_AS {author:'P5'}]->(`K`)
RETURN *
----
_Unfortunately, multiple relationships between the same nodes are not visible._

//graph_result
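
Because the rendered graph collapses parallel relationships, one way to inspect them is to tabulate every pair of names together with the authorities that assert the relationship. This is only a sketch; the column aliases `Name`, `Accepted_as` and `Authorities` are illustrative choices and do not come from the original data model.

[source,cypher]
----
// List each (name, accepted name) pair once, with every authority asserting it
MATCH (x:name)-[a:ACCEPTED_AS]->(y:name)
RETURN x.name AS Name, y.name AS Accepted_as, COLLECT(a.author) AS Authorities
ORDER BY Name, Accepted_as
----

//table

Rows where `Name` and `Accepted_as` are equal are accepted names; all other rows are synonymies.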
== Results

=== How many authorities are there?

[source,cypher]
----
MATCH (x:name)-[a:ACCEPTED_AS]->(y:name)
RETURN COUNT(DISTINCT a.author) AS Count_of_Authorities
----

//table
=== How many names are there?

[source,cypher]
----
MATCH (x:name)
RETURN COUNT(x) AS Count_of_Names
----

//table
=== How many _taxonomic name units_ (TNUs) are there?

[source,cypher]
----
MATCH (x:name)-[a:ACCEPTED_AS]->(y:name)
RETURN COUNT(a) AS Count_of_TNUs
----

//table
=== How many taxon concepts _sensu_ authority?

[source,cypher]
----
MATCH (x:name)-[a:ACCEPTED_AS]->(x:name)
RETURN COUNT(a) AS Count_of_Taxon_Concepts_sensu_Authority
----

//table
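
The total above can also be broken down per authority. The following is a minimal sketch that reuses the same self-referencing pattern to pick out accepted names; the aliases `Authority`, `Accepted_names` and `Names` are illustrative.

[source,cypher]
----
// One row per authority: how many names it accepts, and which ones
MATCH (x:name)-[a:ACCEPTED_AS]->(x)
WITH a.author AS Authority, x.name AS Name
ORDER BY Authority, Name
RETURN Authority, COUNT(Name) AS Accepted_names, COLLECT(Name) AS Names
ORDER BY Authority
----

//table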
=== How many taxon concepts by synonymy?

Explanation of query: Start by finding accepted taxa, and all synonyms (if any). Group those synonyms into “taxon concepts” according to authority, then group authorities by distinct sets of taxon concepts.

[source,cypher]
----
MATCH (acc:name)-[accrel:ACCEPTED_AS]->(acc:name)
OPTIONAL MATCH (syn:name)-[synrel:ACCEPTED_AS]->(acc:name)
WITH COLLECT(DISTINCT syn.name) AS Taxon_concept, synrel.author AS author, acc.name AS Accepted_name
RETURN Taxon_concept, COLLECT(author) AS Authorities, Accepted_name
ORDER BY Accepted_name, LENGTH(Taxon_concept)
----

//table
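
The query above does not tie the synonym relationship to the same authority as the accepted-name relationship, and it compares collected lists whose element order is not fixed. A possible variant, offered as a sketch rather than as the authors' method, makes both explicit: it requires the two relationships to share an authority, and it sorts each concept's member names before collecting them so that identical sets compare equal. The alias `member` is introduced here for illustration.

[source,cypher]
----
// Accepted names, per authority (self-referencing relationship)
MATCH (acc:name)-[accrel:ACCEPTED_AS]->(acc)
// All names placed in that taxon by the same authority (includes acc itself)
MATCH (syn:name)-[synrel:ACCEPTED_AS]->(acc)
WHERE synrel.author = accrel.author
WITH acc.name AS Accepted_name, accrel.author AS author, syn.name AS member
ORDER BY member
// Collect the sorted members into one taxon concept per authority
WITH Accepted_name, author, COLLECT(DISTINCT member) AS Taxon_concept
// Group authorities that share exactly the same concept
RETURN Taxon_concept, COLLECT(author) AS Authorities, Accepted_name
ORDER BY Accepted_name, LENGTH(Taxon_concept)
----

//table

`LENGTH` on a collection matches the usage elsewhere in this gist for Neo4j 2.2; later Neo4j versions may require a different function.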
=== What are our possible “catalogues of life”?

Assuming we rely heavily on P5, what are the possible complete taxonomies?

[source,cypher]
----
MATCH (syn:name)-[synrel:ACCEPTED_AS { author:"P5" }]->(acc:name)
WITH COLLECT(DISTINCT syn.name) AS Taxon_concept, acc.name AS Accepted_name
MATCH (syn2:name)-[synrel2:ACCEPTED_AS { author:"P5" }]->(acc2:name)
WITH COLLECT(DISTINCT syn2) AS allNames, COLLECT(DISTINCT Taxon_concept) AS tc, COLLECT(Accepted_name) AS an
MATCH (x:name)-[r]->(y:name)
WHERE NONE (a IN allNames
  WHERE x = a)
RETURN DISTINCT x.name AS Unplaced_name, y.name AS is_a_synonym_of, r.author AS according_to, tc AS Taxa_according_to_P5
----

//table
What if, instead, we start with P4?

[source,cypher]
----
MATCH (syn:name)-[synrel:ACCEPTED_AS { author:"P4" }]->(acc:name)
WITH COLLECT(DISTINCT syn.name) AS Taxon_concept, acc.name AS Accepted_name
MATCH (syn2:name)-[synrel2:ACCEPTED_AS { author:"P4" }]->(acc2:name)
WITH COLLECT(DISTINCT syn2) AS allNames, COLLECT(DISTINCT Taxon_concept) AS tc, COLLECT(Accepted_name) AS an
MATCH (x:name)-[r]->(y:name)
WHERE NONE (a IN allNames
  WHERE x = a)
WITH x.name AS Unplaced_name, y.name AS is_a_synonym_of, COLLECT(DISTINCT r.author) AS according_tos, tc AS Taxa_according_to_P4
RETURN DISTINCT Unplaced_name, is_a_synonym_of, according_tos, Taxa_according_to_P4
ORDER BY Unplaced_name
----

//table
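
The two queries above differ only in the authority chosen as the starting point. As a rough sketch, the names each authority leaves unplaced can be listed for all authorities in one pass, which may help when deciding which publication to start from. The aliases `authority`, `Names_treated` and `Unplaced_names` are illustrative, and an authority that treats every name would produce no row here.

[source,cypher]
----
// For each authority, collect the names it treats (as accepted or synonym)
MATCH (s:name)-[r:ACCEPTED_AS]->()
WITH r.author AS authority, COLLECT(DISTINCT s.name) AS placed
// Then find every name that authority does not mention
MATCH (n:name)
WHERE NOT (n.name IN placed)
RETURN authority, placed AS Names_treated, COLLECT(n.name) AS Unplaced_names
ORDER BY authority
----

//table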