MattBlissett · September 25, 2015 08:18
diff --git a/tnu-gist.txt b/tnu-gist.txt
 = Generating Global Checklists from Taxonomic Name Units
 :neo4j-version: 2.2.0
 :author: Matthew Blissett, Donald Hobern
 :description: Example Taxonomic Name Units that result from subsequent taxonomic acts
 :tags: domain:life-science, use-case:taxonomy

 == Notation

 * Single letters represent Linnaean names at various ranks
 * Capital letters mean that the name is accepted within a treatment as the name for a taxon
 * Lowercase letters mean that the name is considered a synonym
 * An accepted name may be followed by its lowercase synonyms as a “word”
 * The same letter, regardless of case, always represents the same Linnaean name wherever it is used
 * Pn (_n_ is an integer) represents an authority (publication) with nomenclatural acts and/or taxonomic treatments

 For example, _A_, _Bc_, _Def_ represents the following:

 * Species _A_
 * Species _B_ with synonym _C_
 * Species _D_ with synonyms _E_ and _F_

 == Problem

 Imagine that the world's entire taxonomic literature had never proceeded beyond the following
 (each bullet represents one publication, in chronological order):

 * P1 – _A, B, C, D, E_
 * P2 – _Cd, E, F, G_
 * P3 – _Af, Ceg, D, H, I_
 * P4 – _Befg, I, J, K_
 * P5 – _A, B, Cd, E, Gj, H, I, K_

 What possible global checklists of names might exist?

 (The initial set-up of data is hidden; reveal it with the `+`.)

 //setup
 //hide
 [source,cypher]
 ----
 // Create names (nodes) from P1
 CREATE (`A`:name {name:'A',author:'P1'}),
 	(`B`:name {name:'B',author:'P1'}),
 	(`C`:name {name:'C',author:'P1'}),
 	(`D`:name {name:'D',author:'P1'}),
 	(`E`:name {name:'E',author:'P1'})

 // Create accepted-name relationships from P1
 CREATE (`A`)-[:ACCEPTED_AS {author:'P1'}]->(`A`),
 	(`B`)-[:ACCEPTED_AS {author:'P1'}]->(`B`), 
 	(`C`)-[:ACCEPTED_AS {author:'P1'}]->(`C`), 
 	(`D`)-[:ACCEPTED_AS {author:'P1'}]->(`D`), 
 	(`E`)-[:ACCEPTED_AS {author:'P1'}]->(`E`)

 // Create names (nodes) from P2
 CREATE (`F`:name {name:'F',author:'P2'}),
 	(`G`:name {name:'G',author:'P2'})

 // Create accepted-name relationships from P2
 CREATE (`C`)-[:ACCEPTED_AS {author:'P2'}]->(`C`),
 	(`D`)-[:ACCEPTED_AS {author:'P2'}]->(`C`), 
 	(`E`)-[:ACCEPTED_AS {author:'P2'}]->(`E`), 
 	(`F`)-[:ACCEPTED_AS {author:'P2'}]->(`F`), 
 	(`G`)-[:ACCEPTED_AS {author:'P2'}]->(`G`)

 // Create names (nodes) from P3
 CREATE (`H`:name {name:'H',author:'P3'}),
 	(`I`:name {name:'I',author:'P3'})

 // Create accepted-name relationships from P3
 CREATE (`A`)-[:ACCEPTED_AS {author:'P3'}]->(`A`),
 	(`F`)-[:ACCEPTED_AS {author:'P3'}]->(`A`), 
 	(`C`)-[:ACCEPTED_AS {author:'P3'}]->(`C`), 
 	(`E`)-[:ACCEPTED_AS {author:'P3'}]->(`C`), 
 	(`G`)-[:ACCEPTED_AS {author:'P3'}]->(`C`), 
 	(`D`)-[:ACCEPTED_AS {author:'P3'}]->(`D`), 
 	(`H`)-[:ACCEPTED_AS {author:'P3'}]->(`H`), 
 	(`I`)-[:ACCEPTED_AS {author:'P3'}]->(`I`)

 // Create names (nodes) from P4
 CREATE (`J`:name {name:'J',author:'P4'}),
 	(`K`:name {name:'K',author:'P4'})

 // Create accepted-name relationships from P4
 CREATE (`B`)-[:ACCEPTED_AS {author:'P4'}]->(`B`),
 	(`E`)-[:ACCEPTED_AS {author:'P4'}]->(`B`), 
 	(`F`)-[:ACCEPTED_AS {author:'P4'}]->(`B`), 
 	(`G`)-[:ACCEPTED_AS {author:'P4'}]->(`B`), 
 	(`I`)-[:ACCEPTED_AS {author:'P4'}]->(`I`), 
 	(`J`)-[:ACCEPTED_AS {author:'P4'}]->(`J`), 
 	(`K`)-[:ACCEPTED_AS {author:'P4'}]->(`K`)

 // No new names (nodes) from P5

 // Create accepted-name relationships from P5
 CREATE (`A`)-[:ACCEPTED_AS {author:'P5'}]->(`A`),
 	(`B`)-[:ACCEPTED_AS {author:'P5'}]->(`B`), 
 	(`C`)-[:ACCEPTED_AS {author:'P5'}]->(`C`), 
 	(`D`)-[:ACCEPTED_AS {author:'P5'}]->(`C`), 
 	(`E`)-[:ACCEPTED_AS {author:'P5'}]->(`E`), 
 	(`G`)-[:ACCEPTED_AS {author:'P5'}]->(`G`), 
 	(`J`)-[:ACCEPTED_AS {author:'P5'}]->(`G`), 
 	(`H`)-[:ACCEPTED_AS {author:'P5'}]->(`H`), 
 	(`I`)-[:ACCEPTED_AS {author:'P5'}]->(`I`),
 	(`K`)-[:ACCEPTED_AS {author:'P5'}]->(`K`)

 RETURN *
 ----

 _Unfortunately, the graph visualisation does not show multiple relationships between the same pair of nodes._

 //graph_result
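
 As a check on the hidden set-up, the treatments can be read back out of the graph. The query below is a minimal sketch that assumes only the `name` label and the `ACCEPTED_AS` relationship created above; the column aliases are illustrative. Each row should correspond to one “word” in the Problem list: an accepted name plus whatever synonyms that authority places under it.

 [source,cypher]
 ----
 // One row per accepted name per authority.
 // CASE yields null for the self-relationship, and COLLECT skips nulls,
 // so Synonyms holds only the names placed under the accepted name.
 MATCH (syn:name)-[r:ACCEPTED_AS]->(acc:name)
 RETURN r.author AS Authority,
        acc.name AS Accepted_name,
        COLLECT(CASE WHEN syn <> acc THEN syn.name END) AS Synonyms
 ORDER BY Authority, Accepted_name
 ----
 //table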

 == Results

 === How many authorities are there?

 [source,cypher]
 ----
 MATCH (x:name)-[a:ACCEPTED_AS]->(y:name)
 RETURN COUNT(DISTINCT a.author) AS Count_of_Authorities
 ----
 //table

 === How many names are there?

 [source,cypher]
 ----
 MATCH (x:name)
 RETURN COUNT(x) AS Count_of_Names
 ----
 //table
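
 The set-up also stores an `author` property on each node, which appears to record the publication that first introduced the name. Assuming that reading is right, a sketch like the following breaks the total down by publication (the aliases are illustrative, and the order of names within each list is not guaranteed).

 [source,cypher]
 ----
 // Group names by the author property set when each node was created.
 MATCH (x:name)
 RETURN x.author AS Introduced_in,
        COLLECT(x.name) AS Names,
        COUNT(x) AS Count_of_Names
 ORDER BY Introduced_in
 ----
 //table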

 === How many _taxonomic name units_ (TNUs) are there?

 [source,cypher]
 ----
 MATCH (x:name)-[a:ACCEPTED_AS]->(y:name)
 RETURN COUNT(a) AS Count_of_TNUs
 ----
 //table
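
 Since each `ACCEPTED_AS` relationship is one use of a name by one authority, the total can be split per authority and by whether the name is used as accepted or as a synonym. This is a sketch with illustrative aliases. For the data above, the totals for P1 to P5 should come out as 5, 5, 8, 7 and 10.

 [source,cypher]
 ----
 // x = y identifies a name accepted as itself; anything else is a synonym usage.
 MATCH (x:name)-[a:ACCEPTED_AS]->(y:name)
 RETURN a.author AS Authority,
        SUM(CASE WHEN x = y THEN 1 ELSE 0 END) AS Accepted_TNUs,
        SUM(CASE WHEN x = y THEN 0 ELSE 1 END) AS Synonym_TNUs,
        COUNT(a) AS Total_TNUs
 ORDER BY Authority
 ----
 //table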

 === How many taxon concepts _sensu_ authority?

 [source,cypher]
 ----
 MATCH (x:name)-[a:ACCEPTED_AS]->(x:name)
 RETURN COUNT(a) AS Count_of_Taxon_Concepts_sensu_Authority
 ----
 //table

 === How many taxon concepts by synonymy?

 Explanation of query: start by finding the accepted taxa and all of their synonyms (if any). Group those names into “taxon concepts” per authority, then group together the authorities that share an identical taxon concept.

 [source,cypher]
 ----
 MATCH (acc:name)-[accrel:ACCEPTED_AS]->(acc:name)
 OPTIONAL MATCH (syn:name)-[synrel:ACCEPTED_AS]->(acc:name)
 WITH COLLECT(DISTINCT syn.name) AS Taxon_concept, synrel.author AS author, acc.name AS Accepted_name
 RETURN Taxon_concept, COLLECT(author) AS Authorities, Accepted_name
 ORDER BY Accepted_name, LENGTH(Taxon_concept)
 ----
 //table
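
 The same data can be viewed from the side of the individual name: which names do the authorities disagree about? The sketch below is one possible formulation, not part of the original analysis. It keeps names that are placed under more than one accepted name across all publications, then lists each competing placement with the authorities that support it.

 [source,cypher]
 ----
 // Keep only names that point at more than one accepted name overall.
 MATCH (x:name)-[:ACCEPTED_AS]->(y:name)
 WITH x, COUNT(DISTINCT y) AS Placements
 WHERE Placements > 1
 // List each competing placement with the authorities behind it.
 MATCH (x)-[r:ACCEPTED_AS]->(y:name)
 RETURN x.name AS Name,
        y.name AS Placed_under,
        COLLECT(r.author) AS Authorities
 ORDER BY Name, Placed_under
 ----
 //table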

 == What are our possible “catalogues of life”?

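 Assuming we rely heavily on P5, what are the possible complete taxonomies? P5 does not treat every name, so the query lists each name missing from P5 together with the placements other authorities give it; each choice among those placements yields one possible complete checklist.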

 [source,cypher]
 ----
 MATCH (syn:name)-[synrel:ACCEPTED_AS { author:"P5" }]->(acc:name)
 WITH COLLECT(DISTINCT syn.name) AS Taxon_concept, acc.name AS Accepted_name
 MATCH (syn2:name)-[synrel2:ACCEPTED_AS { author:"P5" }]->(acc2:name)
 WITH COLLECT(DISTINCT syn2) AS allNames, COLLECT(DISTINCT Taxon_concept) AS tc, COLLECT(Accepted_name) AS an
 MATCH (x:name)-[r]->(y:name)
 WHERE NONE (a IN allNames 
            WHERE x = a)
 RETURN DISTINCT x.name AS Unplaced_name, y.name AS is_a_synonym_of, r.author AS according_to, tc AS Taxa_according_to_P5
 ----
 //table
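
 To turn each alternative placement into a concrete slot in the P5 checklist, the placement can be followed one step further: where does P5 itself put the name that the other authority treats as accepted? The query below is a hedged sketch of that idea; the `P5_taxon` column is an illustrative addition. A null `P5_taxon` means P5 does not treat that accepted name either, so the unplaced name would have to stand as its own taxon.

 [source,cypher]
 ----
 // Names with no P5 relationship at all.
 MATCH (x:name)
 OPTIONAL MATCH (x)-[p:ACCEPTED_AS {author:'P5'}]->(:name)
 WITH x, COUNT(p) AS Placed_by_P5
 WHERE Placed_by_P5 = 0
 // Where other authorities put each such name
 MATCH (x)-[r:ACCEPTED_AS]->(y:name)
 // and where P5 in turn puts that accepted name (null if P5 omits it too).
 OPTIONAL MATCH (y)-[:ACCEPTED_AS {author:'P5'}]->(z:name)
 RETURN x.name AS Unplaced_name,
        r.author AS according_to,
        y.name AS is_a_synonym_of,
        z.name AS P5_taxon
 ORDER BY Unplaced_name, according_to
 ----
 //table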

 What if, instead, we start with P4?

 [source,cypher]
 ----
 MATCH (syn:name)-[synrel:ACCEPTED_AS { author:"P4" }]->(acc:name)
 WITH COLLECT(DISTINCT syn.name) AS Taxon_concept, acc.name AS Accepted_name
 MATCH (syn2:name)-[synrel2:ACCEPTED_AS { author:"P4" }]->(acc2:name)
 WITH COLLECT(DISTINCT syn2) AS allNames, COLLECT(DISTINCT Taxon_concept) AS tc, COLLECT(Accepted_name) AS an
 MATCH (x:name)-[r]->(y:name)
 WHERE NONE (a IN allNames 
            WHERE x = a)
 WITH x.name AS Unplaced_name, y.name AS is_a_synonym_of, COLLECT(DISTINCT r.author) AS according_tos, tc AS Taxa_according_to_P4
 RETURN DISTINCT Unplaced_name, is_a_synonym_of, according_tos, Taxa_according_to_P4
 ORDER BY Unplaced_name
 ----
 //table
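
 How much work a starting point leaves depends on how many names it omits. As a rough comparison, the sketch below lists, for every authority, the names it never uses; the aliases are illustrative. An authority that uses every name would not appear in the result. For the data above, P5 should come first, with only _F_ unplaced.

 [source,cypher]
 ----
 // Every authority paired with every name
 MATCH (:name)-[r:ACCEPTED_AS]->(:name)
 WITH DISTINCT r.author AS Authority
 MATCH (x:name)
 // keeping the pairs where that authority never uses the name.
 OPTIONAL MATCH (x)-[p:ACCEPTED_AS]->(:name)
 WHERE p.author = Authority
 WITH Authority, x, COUNT(p) AS Uses
 WHERE Uses = 0
 RETURN Authority,
        COLLECT(x.name) AS Unplaced_names,
        COUNT(x) AS Count_of_Unplaced
 ORDER BY Count_of_Unplaced, Authority
 ----
 //table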