https://github.com/ranking-agent/reasoner-transpiler/
The reasoner-transpiler Python library provides tools for converting a TRAPI query graph into a Cypher (Neo4j) query that generates the corresponding knowledge graph and results - it performs the "lookup" operation.
- generates a TRAPI-compliant knowledge graph and results, directly from Neo4j!
- handles semantic operations with the biolink-model
- sub-categories
- sub-predicates
- symmetric and inverse predicates
- handles qnodes with "is_set": true
- handles arbitrary query graphs: n-hops, branches, loops, etc.
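Sub-category handling amounts to expanding each query term to include its biolink-model descendants before matching. Here is a toy sketch of that expansion (the hierarchy fragment is hand-written for illustration, not pulled from the actual biolink model):

```python
# Tiny hand-written fragment of a class hierarchy (illustrative only).
SUBCLASSES = {
    "biolink:DiseaseOrPhenotypicFeature": [
        "biolink:Disease",
        "biolink:PhenotypicFeature",
    ],
}

def expand(category):
    """Return a category plus all of its (transitive) sub-categories."""
    result = [category]
    for child in SUBCLASSES.get(category, []):
        result.extend(expand(child))
    return result

# A query for DiseaseOrPhenotypicFeature should also match Disease nodes.
expanded = expand("biolink:DiseaseOrPhenotypicFeature")
```

Sub-predicate expansion works the same way over the predicate hierarchy.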
Let's say we want to find all phenotypes associated with type 2 diabetes.
from reasoner_transpiler.cypher import get_query

qgraph = {
    "nodes": {
        "diabetes": {
            "ids": ["MONDO:0005148"],
        },
        "phenotype": {
            "categories": ["biolink:PhenotypicFeature"],
        },
    },
    "edges": {
        "has phenotype": {
            "subject": "diabetes",
            "predicates": ["biolink:has_phenotype"],
            "object": "phenotype",
        },
    },
}
print(get_query(qgraph))
The reasoner transpiler generates the following Cypher query.
MATCH (`diabetes` {`id`: "MONDO:0005148"})-[`has phenotype`:`biolink:has_phenotype`|`biolink:phenotype_of`]-(`phenotype`:`biolink:PhenotypicFeature`)
WHERE (
    (
        type(`has phenotype`) IN ["biolink:has_phenotype"]
        AND startNode(`has phenotype`) = `diabetes`
    )
    OR (
        type(`has phenotype`) IN ["biolink:phenotype_of"]
        AND startNode(`has phenotype`) = `phenotype`
    )
)
WITH
    {
        node_bindings: {
            `diabetes`: (CASE WHEN `diabetes` IS NOT NULL THEN [{id: `diabetes`.id}] ELSE [] END),
            `phenotype`: (CASE WHEN `phenotype` IS NOT NULL THEN [{id: `phenotype`.id}] ELSE [] END)
        },
        edge_bindings: {
            `has phenotype`: [ei IN collect(DISTINCT `has phenotype`.id) WHERE ei IS NOT null | {id: ei}]
        }
    } AS result,
    {
        nodes: collect(DISTINCT `diabetes`) + collect(DISTINCT `phenotype`),
        edges: collect(DISTINCT `has phenotype`)
    } AS knowledge_graph
UNWIND knowledge_graph.nodes AS knode UNWIND knowledge_graph.edges AS kedge
WITH
    collect(DISTINCT result) AS results,
    {
        nodes: apoc.map.fromLists(
            [n IN collect(DISTINCT knode) | n.id],
            [n IN collect(DISTINCT knode) | {
                categories: labels(n),
                name: n.name,
                attributes: [key IN apoc.coll.subtract(keys(n), ["id", "category"]) | {
                    original_attribute_name: key,
                    attribute_type_id: coalesce({publications: "EDAM:data_0971"}[key], "NA"),
                    value: n[key]
                }]
            }]
        ),
        edges: apoc.map.fromLists(
            [e IN collect(DISTINCT kedge) | e.id],
            [e IN collect(DISTINCT kedge) | {
                predicate: type(e),
                subject: startNode(e).id,
                object: endNode(e).id,
                attributes: [key IN apoc.coll.subtract(keys(e), ["id", "predicate"]) | {
                    original_attribute_name: key,
                    attribute_type_id: coalesce({publications: "EDAM:data_0971"}[key], "NA"),
                    value: e[key]
                }]
            }]
        )
    } AS knowledge_graph
RETURN results, knowledge_graph
There's a lot going on. Let's break this down.
MATCH (`diabetes` {`id`: "MONDO:0005148"})-[`has phenotype`:`biolink:has_phenotype`|`biolink:phenotype_of`]-(`phenotype`:`biolink:PhenotypicFeature`)
The MATCH clause finds all node-relationship-node triples connecting MONDO:0005148 to a phenotype via a "has phenotype" or "phenotype of" relationship. It does not enforce a relationship direction.
WHERE (
    (
        type(`has phenotype`) IN ["biolink:has_phenotype"]
        AND startNode(`has phenotype`) = `diabetes`
    )
    OR (
        type(`has phenotype`) IN ["biolink:phenotype_of"]
        AND startNode(`has phenotype`) = `phenotype`
    )
)
The WHERE clause retains only the triples of the form "diabetes has phenotype..." and "...phenotype of diabetes". Thus we properly capture both the requested predicate and its inverse.
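The direction check can be sketched in plain Python. This is illustrative only, not library code; the keep helper, the third (wrong-direction) triple, and the HP: identifier are made up for the example:

```python
# Keep a triple if it uses the requested predicate in the forward direction,
# or the inverse predicate in the reversed direction.
PREDICATE = "biolink:has_phenotype"
INVERSE = "biolink:phenotype_of"

def keep(triple, diabetes_id):
    subject, predicate, obj = triple
    if predicate == PREDICATE:
        return subject == diabetes_id  # "diabetes has phenotype ..."
    if predicate == INVERSE:
        return obj == diabetes_id      # "... phenotype of diabetes"
    return False

triples = [
    ("MONDO:0005148", "biolink:has_phenotype", "HP:0003074"),
    ("HP:0003074", "biolink:phenotype_of", "MONDO:0005148"),
    ("HP:0003074", "biolink:has_phenotype", "MONDO:0005148"),  # wrong direction
]
kept = [t for t in triples if keep(t, "MONDO:0005148")]
# kept retains only the first two triples
```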
We have the right stuff now. Everything from here on is reshaping it into the TRAPI format.
WITH
    {
        node_bindings: {
            `diabetes`: (CASE WHEN `diabetes` IS NOT NULL THEN [{id: `diabetes`.id}] ELSE [] END),
            `phenotype`: (CASE WHEN `phenotype` IS NOT NULL THEN [{id: `phenotype`.id}] ELSE [] END)
        },
        edge_bindings: {
            `has phenotype`: [ei IN collect(DISTINCT `has phenotype`.id) WHERE ei IS NOT null | {id: ei}]
        }
    } AS result,
    {
        nodes: collect(DISTINCT `diabetes`) + collect(DISTINCT `phenotype`),
        edges: collect(DISTINCT `has phenotype`)
    } AS knowledge_graph
The first WITH clause builds the individual results and compiles nodes and relationships into little per-result proto-knowledge graphs. The knowledge graph nodes and edges still need a lot of reformatting.
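In plain Python, one of those per-result binding objects looks like this (a sketch; the make_result helper, the edge id "e001", and the HP: identifier are hypothetical):

```python
def make_result(diabetes_id, phenotype_id, edge_id):
    """Build one TRAPI-style result for a matched triple."""
    return {
        "node_bindings": {
            "diabetes": [{"id": diabetes_id}],
            "phenotype": [{"id": phenotype_id}],
        },
        "edge_bindings": {
            "has phenotype": [{"id": edge_id}],
        },
    }

result = make_result("MONDO:0005148", "HP:0003074", "e001")
```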
UNWIND knowledge_graph.nodes AS knode UNWIND knowledge_graph.edges AS kedge
The UNWIND clauses break all of the individual knodes and kedges out of each result so that we can later combine them into a single big knowledge graph. If our little collect-unwind-collect procedure seems overcomplicated, that's because it is. The complexity is necessary in cases where we get "is_set": true, because we can bind multiple knowledge-graph nodes to a single query-graph node in the same result.
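A plain-Python sketch (not library code; the phenotype ids are made up) of the difference "is_set" makes to the binding shape:

```python
# With "is_set": true, one result binds a *list* of knowledge-graph nodes
# to a single query-graph node, instead of one result per node.
matched_phenotypes = ["HP:0003074", "HP:0000819"]  # hypothetical matches

# Without is_set: one result per matched node.
results_without = [
    {"node_bindings": {"phenotype": [{"id": pid}]}} for pid in matched_phenotypes
]

# With is_set: a single result whose binding list holds every matched node.
result_with = {
    "node_bindings": {"phenotype": [{"id": pid} for pid in matched_phenotypes]}
}
```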
WITH
    collect(DISTINCT result) AS results,
    {
        nodes: apoc.map.fromLists(
            [n IN collect(DISTINCT knode) | n.id],
            [n IN collect(DISTINCT knode) | {
                categories: labels(n),
                name: n.name,
                attributes: [key IN apoc.coll.subtract(keys(n), ["id", "category"]) | {
                    original_attribute_name: key,
                    attribute_type_id: coalesce({publications: "EDAM:data_0971"}[key], "NA"),
                    value: n[key]
                }]
            }]
        ),
        edges: apoc.map.fromLists(
            [e IN collect(DISTINCT kedge) | e.id],
            [e IN collect(DISTINCT kedge) | {
                predicate: type(e),
                subject: startNode(e).id,
                object: endNode(e).id,
                attributes: [key IN apoc.coll.subtract(keys(e), ["id", "predicate"]) | {
                    original_attribute_name: key,
                    attribute_type_id: coalesce({publications: "EDAM:data_0971"}[key], "NA"),
                    value: e[key]
                }]
            }]
        )
    } AS knowledge_graph
The final WITH clause collects everything into a single results list and knowledge graph, while reformatting the nodes/edges as maps and constructing proper attributes. This is one place where APOC is used to help with list/map manipulations; some more complicated situations use it in other ways.
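The apoc.map.fromLists call just zips a list of ids with a list of reformatted objects into one map. A simplified Python equivalent for the node side (illustrative; the format_node helper, the sample node, and the reserved-key handling are simplifications, not the real query's logic):

```python
# Special-cased attribute types, as in the generated query above.
ATTRIBUTE_TYPES = {"publications": "EDAM:data_0971"}

def format_node(node):
    """Reshape a raw node dict into TRAPI knowledge-graph form."""
    reserved = {"id", "category", "name", "categories"}  # simplified
    return {
        "categories": node.get("categories", []),
        "name": node.get("name"),
        "attributes": [
            {
                "original_attribute_name": key,
                "attribute_type_id": ATTRIBUTE_TYPES.get(key, "NA"),
                "value": value,
            }
            for key, value in node.items()
            if key not in reserved
        ],
    }

raw_nodes = [
    {"id": "MONDO:0005148", "name": "type 2 diabetes mellitus",
     "categories": ["biolink:Disease"], "publications": ["PMID:123"]},
]
# Equivalent of apoc.map.fromLists: map each node id to its formatted node.
kg_nodes = {n["id"]: format_node(n) for n in raw_nodes}
```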
What comes out of Neo4j is a TRAPI-compliant knowledge graph and results - there is no need to reformat the output in Python.
The reasoner transpiler makes assumptions about how the data are represented in Neo4j. Its assumptions are largely consistent with KGX standards for Neo4j, but are tuned specifically for the data structure used by Plater KPs.