Last active
December 12, 2015 03:58
GraphTO presentation notes and code snippets
Using Pacer

Darrick Wiebe
[email protected]

The most fundamental concept in Pacer

out / in

  [ out ]
     v --(e)-> v
             [ in ]

v.out_e.in_v

  v
    -> e
         -> v

v.in_e.out_v

  v
    <- e
         <- v

v.out

  v
    -(e)-> v   (e is skipped over)

v.out_e.out_v

  v
    -> e
  v ->         (reverse)

v.both

  v
    -(e)-> v
    <-(e)- v
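
The out / in diagrams above can be tried out on a toy in-memory graph. This is a plain-Ruby sketch, not Pacer code: the `ToyGraph` module, its `Vertex` and `Edge` structs, and the helper names (`out_vertices`, `in_vertices`, `both_vertices`) are all made up for illustration. The only thing taken from the notes is the convention that an edge runs from its out vertex to its in vertex.

```ruby
# Toy stand-in for a graph; none of these names come from Pacer.
module ToyGraph
  Vertex = Struct.new(:name)
  # out_v is the vertex the edge leaves, in_v is the vertex it arrives at.
  Edge = Struct.new(:label, :out_v, :in_v)

  A = Vertex.new('a')
  B = Vertex.new('b')
  C = Vertex.new('c')

  # a --(knows)-> b   and   c --(knows)-> a
  EDGES = [Edge.new(:knows, A, B), Edge.new(:knows, C, A)].freeze

  module_function

  # Edges leaving v (like v.out_e).
  def out_e(v)
    EDGES.select { |e| e.out_v.equal?(v) }
  end

  # Edges arriving at v (like v.in_e).
  def in_e(v)
    EDGES.select { |e| e.in_v.equal?(v) }
  end

  # Like v.out_e.in_v, or v.out with the edge skipped over.
  def out_vertices(v)
    out_e(v).map(&:in_v)
  end

  # Like v.in_e.out_v.
  def in_vertices(v)
    in_e(v).map(&:out_v)
  end

  # Like v.both: neighbours in either direction.
  def both_vertices(v)
    out_vertices(v) + in_vertices(v)
  end
end

puts ToyGraph.out_vertices(ToyGraph::A).map(&:name).inspect   # ["b"]
puts ToyGraph.in_vertices(ToyGraph::A).map(&:name).inspect    # ["c"]
puts ToyGraph.both_vertices(ToyGraph::A).map(&:name).inspect  # ["b", "c"]
# out_e followed by out_v leads straight back to the starting vertex.
puts ToyGraph.out_e(ToyGraph::A).map { |e| e.out_v.name }.inspect  # ["a"]
```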

Pacer Routes

A Route lets you define a traversal through the graph one step at a time,
in an intuitive way. The power this gives you is incredible.

http://www.youtube.com/watch?v=7kI1d7DMbco

Routes are lazy and chainable:

g.v.out_e.in_v.limit(1000).properties.keys.flatten.frequencies

There are a few methods that execute the route immediately:

.first
.each
.to_a
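
As a loose analogy for "lazy and chainable", here is a sketch using Ruby's built-in `Enumerator::Lazy` rather than Pacer. It makes no claim about how Pacer implements routes; `build_route` and the `visited` array are hypothetical names used only to show that chaining steps does no work until something like `first` asks for results.

```ruby
# Plain-Ruby analogy only; Enumerator::Lazy stands in for a route.
# Every element pulled through the chain is recorded in `visited`.
def build_route(visited)
  (1..Float::INFINITY).lazy
    .map    { |n| visited << n; n * 2 }
    .select { |n| (n % 3).zero? }
end

visited = []
route = build_route(visited)

# Building the chain has not touched a single element yet.
puts visited.inspect          # []

# Asking for results runs the chain, and only as far as needed.
puts route.first(3).inspect   # [6, 12, 18]
puts visited.inspect          # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The source here is infinite, so an eager `map` would never return. Laziness is what makes a bounded request such as `first(3)` possible.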

Steps in a Route

  g [ .v ] [ .out_e ] [ .in_v ] [ .properties ] [ .keys ] [ .flatten ] .frequencies

Paths

  [ v, e, v, { .. }, [ .. ], ".." ]

g.v.out_e.in_v.limit(1000).properties.keys.flatten.paths.first

Exploring further: enter PacerXml

g = Pacer.neo4j '/tmp/graphto'
g.v.delete!
PacerXml::Sample.load_100_software g

# >> g.v.count
# 22678
# >> g.e.count
# 24181

Extending Pacer

g.v(type: 'examiner')

module Examiner
  module Vertex
    def display_name
      "examiner #{ self['last-name'] } #{ self.in_edges.count }"
    end
  end
end

g.v(Examiner, type: 'examiner')

module Examiner
  def self.route_conditions
    { type: 'examiner' }
  end
end

g.v(Examiner)

module Patent; end

module Examiner
  module Route
    def patents
      self.in(:examiners, Patent)
    end

    def departments
      out(:department)
    end
  end
end

module Patent; end

module Examiner
  def self.route_conditions
    { type: 'examiner' }
  end

  module Vertex
    def display_name
      "examiner #{ self['last-name'] } #{ self.in_edges.count }"
    end
  end

  module Route
    def patents
      self.in(:examiners, Patent)
    end

    def departments
      out(:department)
    end
  end
end

module Patent
  def self.route_conditions
    { type: 'patent' }
  end

  module Route
    def examiners(n = nil)
      if n
        lookahead(min: n) { |p| p.out_e(:examiners) }
      else
        out(:examiners, Examiner)
      end
    end
  end
end
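
To make the shape of these extension modules concrete, here is a toy sketch in plain Ruby of one way a module exposing `route_conditions` and a nested `Vertex` module could be applied to data. Pacer's actual mechanism may well differ. `ToyExaminer`, `toy_v`, and the hash-based "vertices" are hypothetical and exist only for this illustration.

```ruby
# Hypothetical extension, mirroring the layout used in the notes.
module ToyExaminer
  def self.route_conditions
    { type: 'examiner' }
  end

  module Vertex
    def display_name
      "examiner #{self[:last_name]}"
    end
  end
end

# Hypothetical helper (not a Pacer method): keep the elements that match the
# extension's conditions, then mix the extension's Vertex methods into a copy
# of each match so the originals stay untouched.
def toy_v(elements, extension)
  conditions = extension.route_conditions
  vertex_methods = extension.const_get(:Vertex)
  elements
    .select { |el| conditions.all? { |key, value| el[key] == value } }
    .map    { |el| el.dup.extend(vertex_methods) }
end

data = [
  { type: 'examiner', last_name: 'Fujihara' },
  { type: 'patent' }
]

puts toy_v(data, ToyExaminer).map(&:display_name).inspect
# ["examiner Fujihara"]
```

In this sketch the conditions decide which elements are selected, and the nested module decides what those elements can do once selected.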

Pacer has no 'global graph', but you can easily set up your own:

module MyApp
  Graph = Pacer.neo4j "my/app/graph"
end

def Patent.all(*args, &block)
  MyApp::Graph.v(Patent, *args, &block)
end

Without an assumed global graph, working with multiple graphs is easy.

Examining the schema

PacerXml::Sample.structure! g #=> file 'patent-structure.graphml'

Manipulating Data

We can see in the visualization that the examiner relationship is not quite right:

g.v(type: 'examiner').in_e.labels.frequencies

We can fix it by creating a new relationship and deleting the old one:

g.v(Patent).bulk_job do |p|
  p.add_edges_to :examiners, p.out(type: 'examiners').out(Examiner)
end
g.v(Patent).out(type: 'examiners').delete!

The Patent could also be cleaned up a little.

g.v(Patent).properties.first

What are the possible values of number-of-claims?

g.v(Patent).frequencies 'number-of-claims'

Set and remove a property:

g.v(Patent).bulk_job do |p|
  p[:claim_count] = p['number-of-claims'].to_i
  p['number-of-claims'] = nil
end

Did it work?

g.v(Patent).frequencies 'number-of-claims'
g.v(Patent).frequencies :claim_count
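
One Ruby detail to keep in mind about the `.to_i` conversion: `nil.to_i` is `0`, so a patent with no number-of-claims value at all would end up with a claim_count of 0 rather than no value. The sketch below uses plain hashes as stand-ins for vertex properties. `migrate_claims` is a hypothetical helper, not part of Pacer, and skipping missing values is just one possible policy.

```ruby
# Hypothetical helper operating on a plain Hash of properties.
# Returns a new hash with 'number-of-claims' replaced by an integer
# :claim_count, leaving elements without the property untouched.
def migrate_claims(props)
  migrated = props.dup
  raw = migrated.delete('number-of-claims')
  # Integer() raises on malformed input instead of silently returning 0.
  migrated[:claim_count] = Integer(raw.to_s, 10) unless raw.nil?
  migrated
end

puts nil.to_i                                                # 0
puts migrate_claims('number-of-claims' => '12').inspect
puts migrate_claims('title' => 'untitled').inspect
```

The first call yields a hash holding only `claim_count` with the integer 12. The second comes back unchanged, with no `claim_count` key added.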

Some Examiners have a department number property that would make more
sense as a relationship.

g.v(Examiner).properties.limit 10
g.v(Examiner).property?(:department).uniq

g.v(Examiner).property?(:department).uniq.bulk_job do |n|
  g.create_vertex type: 'department', department: n
end

Look up examiners from each department and associate them:

g.create_key_index :department, :vertex
g.v(type: 'department').bulk_job do |d|
  g.v(Examiner, department: d[:department].to_s).add_edges_to :department, d
end

g.v(Patent).examiners.departments

Neo4j Integration

Cypher!

START v=node:node_auto_index(type = 'patent')
MATCH v-[:examiners]->examiner
RETURN v, examiner

query = <<CYPHER
START v=node:node_auto_index(type = 'patent')
MATCH v-[:examiners]->examiner
RETURN v, examiner
CYPHER

Simple Cypher query returning a pair of vertices:

g.cypher(query).limit(10)

This time just grab the Examiner vertex and wrap it:

r = g.cypher(query).limit(10).tails.v(Examiner)

Pacer is about streaming though...

Streaming Cypher queries??

patent_examiners = <<CYPHER
MATCH v-[:examiners]->examiner
RETURN v, examiner
CYPHER

r = g.v(Patent).cypher(patent_examiners).tails.v(Examiner).limit(10)

How does that work?

r.back...
r.paths.first

Chaining Cypher queries!

other_examiners = <<CYPHER
MATCH v<-[:examiners]-()-[:examiners]->other
RETURN v, other
CYPHER

g.v(Patent).cypher(patent_examiners).tails.cypher(other_examiners).limit(100)

Neo4j Path Finding Algorithms

all_patents = g.v(Patent)
all_patents.first.paths_to(all_patents)

Cypher returns paths; we can expand them:

g.cypher(query).limit(10).expand

Neo4j Lucene Indices

Boolean logic

g.lucene('type:patent OR type:examiner')

Fuzzy matching

g.create_key_index 'last-name', :vertex
g.lucene('last-name:Fujihara').properties
g.lucene('last-name:Fujihara~').properties
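
The trailing `~` asks Lucene for a fuzzy match, which is based on edit distance between terms. For intuition only, here is a small plain-Ruby Levenshtein distance. It is not Lucene's implementation, which uses its own scoring, thresholds and optimized automata. The `levenshtein` function and the comparison name 'Fujiwara' are illustrative assumptions, not taken from the data set.

```ruby
# Classic dynamic-programming Levenshtein distance: the minimum number of
# single-character insertions, deletions and substitutions turning a into b.
def levenshtein(a, b)
  # prev[j] holds the distance between the processed prefix of a and b[0, j].
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    curr = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      curr << [
        prev[j] + 1,        # delete char_a
        curr[j - 1] + 1,    # insert char_b
        prev[j - 1] + cost  # substitute, or keep when the characters match
      ].min
    end
    prev = curr
  end
  prev.last
end

puts levenshtein('Fujihara', 'Fujihara')  # 0
puts levenshtein('Fujihara', 'Fujiwara')  # 1
puts levenshtein('kitten', 'sitting')     # 3
```

A term one edit away, like the second pair, is the kind of near miss a fuzzy query is meant to catch, whereas the exact query only matches distance 0.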

Questions?