Suppose we have a business with only 4 customers and 2 products. The customer can either view or buy a product. We can represent our customers and products as a graph like so.
CREATE
(bob { name:'Bob' }),
(tom { name:'Tom' }),
(emily { name:'Emily' }),
(lily { name:'Lily'}),
(iphone { product: 'iPhone' }),
(ae { product: 'American Express' }),
(iphone)<-[:BUY]-(bob),
(iphone)<-[:VIEW]-(bob),
(iphone)<-[:VIEW]-(tom),
(iphone)<-[:BUY]-(tom),
(iphone)<-[:VIEW]-(emily),
(iphone)<-[:BUY]-(emily),
(ae)<-[:BUY]-(bob),
(ae)<-[:VIEW]-(bob),
(ae)<-[:VIEW]-(emily),
(ae)<-[:BUY]-(lily),
(ae)<-[:VIEW]-(lily)
Most of the time you don’t know who your customers are. But you probably have a good idea of what products (or types of products) you have in your store. So let’s start at a product node and see what we can find out.
Say we pick our iPhone node to begin with. We can walk through the :BUY
edges to find who bought iPhones in our store.
START x=node:node_auto_index(product='iPhone')
MATCH (x)<-[:BUY]-(person)
RETURN person
So far so good? Let’s extend this further.
What else did people who bought an iPhone also purchase?
START x=node:node_auto_index(product='iPhone')
MATCH (x)<-[:BUY]-(person),
(y)<-[:BUY]-(person)
RETURN y
With only 2 products, this trivial example might seem silly. With a larger catalogue, however, we can aggregate with a count instead to see which items these buyers purchased most often.
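The aggregation idea could look something like the following. This is only a sketch: it assumes the same Neo4j version and auto-index configuration as the other queries here, the aliases `product` and `buyers` are illustrative names, and the `LIMIT` of 10 is an arbitrary choice.

```cypher
// Sketch: rank the other products bought by people who bought an iPhone.
// Reuses the legacy auto-index lookup from the surrounding queries.
// `product` and `buyers` are illustrative aliases, not part of the data model.
START x=node:node_auto_index(product='iPhone')
MATCH (x)<-[:BUY]-(person),
      (y)<-[:BUY]-(person)
// Count each buyer once per product, then list the most popular products first.
RETURN y.product AS product, count(DISTINCT person) AS buyers
ORDER BY buyers DESC
LIMIT 10
```

Counting `DISTINCT person` rather than rows keeps the number meaningful even if a customer has several `:BUY` edges to the same product.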
Suppose you think people who buy an iPhone are likely to sign up for American Express. You want to make a product recommendation based on this hypothesis. Who should you target?
In this query, we find people that:
- bought an iPhone
- viewed American Express (thus showing some initial interest at least)
- haven’t already signed up for American Express
START x=node:node_auto_index(product='iPhone'),
y=node:node_auto_index(product='American Express')
MATCH (x)<-[:BUY]-(p),
(y)<-[:VIEW]-(p)
WHERE NOT (y)<-[:BUY]-(p)
RETURN p
With the sample data above, this leaves only Emily: Bob and Emily both bought an iPhone and viewed American Express, but Bob has already bought it.
This was part of an introduction to a talk at Strata London 2013. The rest of the slides are on SlideShare.
Nice! Any chance you could put some more explanations into the gist and put it on the graphgist wiki so others can find it? Thanks!