Skip to content

Instantly share code, notes, and snippets.

@Shahzad6077
Forked from jeremysears/gremlin-cheat-sheet.md
Last active March 17, 2021 07:59
Show Gist options
  • Save Shahzad6077/c43ff127b0b1fd746f8515f25793e493 to your computer and use it in GitHub Desktop.
Save Shahzad6077/c43ff127b0b1fd746f8515f25793e493 to your computer and use it in GitHub Desktop.
Gremlin Cheat Sheet in Groovy

https://livebook.manning.com/book/graph-databases-in-action/a-apache-tinkerpop-installation-and-overview/v-4/

Gremlin Cheat Sheet in Groovy

Gremin traversal examples taken from the excellent DS330: DataStax Enterprise Graph course.

Creating Vertices and Vertex Properties

Add a Vertex

Vertex u = graph.addVertex("user");
       u.property("userId","u2016");
       u.property("age",36);
       u.property("gender","M");

Delete Properties, Edges, and Vertices

Vertex u = graph.addVertex(...);
Vertex m = graph.addVertex(...);
Edge   e = u.addEdge("rated", m);

u.property("age").remove();
u.property("gender").remove();
e.remove();
m.remove();

Add an Edge

Vertex u1 = graph.addVertex(label,"user","userId","u2017");
Vertex u2 = g.V().has("user","userId","u2016").next();
u1.addEdge("knows", u2);

// Alternative, more compact syntax
graph.addVertex(label,"user","userId","u2017")
     .addEdge("knows",g.V().has("user","userId","u2016").next());

Gremlin I/O

Export/Import a graph to GraphML, a lossy XML graph representation (no meta-properties or multi-properties)

graph.io(IoCore.graphml())
     .writeGraph("KillrVideo.xml")

graph.io(IoCore.graphml())
     .readGraph("KillrVideo.xml")

Export/Import a graph to GraphSON, a lossless JSON format native to TinkerPop.

graph.io(IoCore.graphson())
     .writeGraph("KillrVideo.json")

graph.io(IoCore.graphson())
     .readGraph("KillrVideo.json")

Export/Import a graph to Gryo, a lossless binary format, based on Kryo.

graph.io(IoCore.gryo())
     .writeGraph("KillrVideo.kryo")

graph.io(IoCore.gryo())
     .readGraph("KillrVideo.kryo")

Simple Gremlin Graph Traversals

A linear traversal

g
  .V()
  .has("title","Alice in Wonderland")
  .has("year",2010)
  .out("director")
  .values("name")

// Sample output:
// Tim Burton

A nested traversal, creating a tree of Steps

g
  .V()
  .has("title","Alice in Wonderland")
  .has("year",2010)
  .union(__.out("director"),
         __.out("screenwriter"))
  .values("name")

  // Sample output:
  // Tim Burton
  // Linda Woolverton

Lambda Steps

Using the map() Step. map(Function<Traverser<S>, E>) Map the traverser to some object of type E for the next step to process.

g.V()
  .has("movieId","m267")
  .map{it.get().value("title")}

// Sample output:
// Alice in Wonderland

Using the flatMap() Step. flatMap(Function<Traverser<S>, Iterator<E>>) Map the traverser to an iterator of E objects that are streamed to the next step.

g.V()
  .has("movieId","m267")
  .flatMap{it.get().vertices(OUT,"actor")}
  .map{it.get().value("name")}

// Sample output:
// Johnny Depp
// Anne Hathaway
// ...

Using the filter() Step. filter(Predicate<Traverser<S>>) Map the traverser to either true or false, where false will not pass the traverser to the next step.

g.V()
  .filter{it.get().keys().contains("movieId") && it.get().value("movieId") == "m267"}
  .flatMap{it.get().vertices(OUT,"actor")}
  .map{it.get().value("name")}

// Sample output:
// Johnny Depp
// ...

Using the sideEffect() Step. sideEffect(Consumer<Traverser<S>>) Perform some operation on the traverser and pass it to the next step.

g.V()
  .filter{it.get().keys().contains("movieId") && it.get().value("movieId") == "m267"}
  .sideEffect{System.out.println("The actors of " + it.get().value("title"))}
  .flatMap{it.get().vertices(OUT,"actor")}
  .map{it.get().value("name")}

// Sample output:
// The actors of Alice in Wonderland
// Johnny Depp
// ...

Using the branch() Step. branch(Function<Traverser<S>,M>) Split the traverser to all the traversals indexed by the M token.

// If the actor vertices have a "name" property that is 
// equal to "Johnny Depp", output the "personId" property, 
// otherwise, output the "name" property.
g.V()
  .filter{it.get().keys().contains("movieId") && it.get().value("movieId") == "m267"}
  .flatMap{it.get().vertices(OUT,"actor")}
  .branch(values("name")).option("Johnny Depp",values("personId")).option(none,values("name"))

// Sample output:
// p4361
// Anne Hathaway
// ...

Derrived Steps

That last example can be rewritten using derrived steps, which are both more expressive and more performant.

g.V()
  .has("movieId","m267")
  .out("actor")
  .choose(has("name","Johnny Depp"), 
          values("personId"),
          values("name"))

Branching Traversals

Branching using the coalesce() Step. coalesce(traversal, …) Evaluate the provided traversals in order and return the first traversal that emits at least one element.

g.V()
  .has("person","name","Johnny Depp").in()
  .coalesce(out("belongsTo").has("name","Documentary"),
            has("year",2010) )
  .values("year","title")

// Sample output:
// 2010
// Alice in Wonderland

The example above is a bit contrived. A real-world usage of this can be found in the Gremlin DSLs in Java with DSE Graph blog article. In that example, the first argument to coalesce is a traversal to a vertex. If that vertex doesn't exist, it executes a second traversal which creates and returns the vertex. This can be used to ensure that we only create a single copy of a piece of data. There are mulitple other uses that can be found in Gremlin Recipes and The Gremlin Compendium

Branching using the choose() Step and the option() Step modulator. choose(traversal) Evaluate the internal traversal and route the traverser to a matching traversal branch option. option is a step modulator.

g.V()
  .has("movie","movieId","m267").both()
  .choose(label())
  .option("user",   valueMap())
  .option("person", values("name"))
  .option("genre",  values("name"))
  .option(none,     label())
  .sample(3)

// Sample output:
// Johnny Depp
// Adventure
// [gender:[M], userId:[u826], age:[36]]

Branching using the choose() Step with a predicate and true/false branches. choose(predicate, traversalOnTrue, traversalOnFalse) Evaluate the predicate and route the traverser to one of the two traversal branches.

g.V()
  .hasLabel("user")
  .choose(values("age").is(gte(13)),
          valueMap(),
          constant("no children under 13"))
  .sample(3)

// Sample output:
// no children under 13
// [gender:[F], userId:[u614], age:[30]]
// [gender:[M], userId:[u616], age:[46]]

Branching using the union() Step to merge the results of several traversers. union(traversal, …) Evaluate all internal traversals and merge their results.

g.V()
  .has("movie","movieId","m267")
  .union(out("director","composer","screenwriter","cinematographer").values("name"),
         values("production"))

// Sample output:
// Tim Burton
// Danny Elfman
// Linda Woolverton
// Dariusz Wolski
// Tim Burton Animation Co.
// Walt Disney Productions

Branching using the local() Step. local(traversal) Evaluate the object-local internal traversal.

// Find an average rating for each movie.
g.V()
  .hasLabel("movie")
  .local(inE("rated").values("rating").mean())

// Sample output:
// 6.901639344262295
// 6.75
// ...

// Find the number of production companies involved in each movie with Sylvester Stallone
g.V()
  .has("person","name","Sylvester Stallone")
  .in("actor")
  .local(values("production").count())

More information on computing aggregates can be found in the Computing Aggregates section.

Branching using the repeat() Step. repeat(traversal) Repeatedly evaluate the internal traversal a certain number of times or until a condition is satisfied. This uses several step modulators to control repetition conditions.

Branching using the repeat() Step with the times() modulator. times(number) Define a number of repetitions.

// A simple, linear traversal with no repetition.
g.V().has("user","userId","u1")
  .out("knows").count()                  //4

// The same traversal, written with a repeat step.
// In this case, it is repeated 0 times, which is the
// equivalent of no recursion, so we yield the same result.
g.V().has("user","userId","u1")
  .repeat(out("knows")).times(0).count() //4
  
// In this case, the times() modulator preceds the repeat
// step.  This is the equivalent of a do-while loop semantics.
// In this case, we get one repetition, before we hit the 
// 0 repetition condition, so we yield a result of 1.  To be
// clear, this is just counting the "u1" user, not anyone 
// he/she knows.
g.V().has("user","userId","u1")
  .times(0).repeat(out("knows")).count() //1

// Putting the times modifier after the repeat step gives
// us while-do semantics.  In this case, we execute the 
// repetition once, counting the results.
g.V().has("user","userId","u1")
  .repeat(out("knows")).times(1).count() //4
  
// Finally, if we increase the do-while version's times
// modifier to 1, we see that we execution the repetition
// once, yielding 4 results.
g.V().has("user","userId","u1")
  .times(1).repeat(out("knows")).count() //4
  
// Here we've simply increased the recusion depth for both styles
g.V().has("user","userId","u1")
  .repeat(out("knows")).times(5).count() //715
g.V().has("user","userId","u1")
  .times(5).repeat(out("knows")).count() //715

// Here we introduce a dedup() step, which will 
// retain unique results.
g.V().has("user","userId","u1")
  .repeat(out("knows")).times(5).dedup().count() //397
g.V().has("user","userId","u1")
  .times(5).repeat(out("knows")).dedup().count() //397

Branching using the repeat() Step with the until() modulator. until(traversal) Define an internal traversal that must yield some results to break repetition.

// Find 17 y.o. connections of a particular user

// This traversal will run infinitely, because the until() modifier follows 
// the repeat modifier.  Consequently, we introduce a limit() step to stop
// the recursion.
g.V()
  .has("user","userId","u1")
  .repeat(both("knows"))
  .until(has("age",17))
  .limit(4)
  .valueMap()

// [gender:[M], userId:[u151], age:[17]]
// [gender:[M], userId:[u58], age:[17]]
// [gender:[F], userId:[u75], age:[17]]
// [gender:[M], userId:[u1], age:[17]]

// In this example, user "u1" happens to be 17.  Consequently, the
// recursion in the repeat step never happens, and the traversal stops
// with a single result (the "u1" user that started the traversal).
g.V()
  .has("user","userId","u1")
  .until(has("age",17))
  .repeat(both("knows"))
  .limit(4)
  .valueMap()

// [gender:[M], userId:[u1], age:[17]]

Branching using the repeat() Step with the until() and timeLimit() modulators. timeLimit(number) Set a maximum number of milliseconds to spend on traversal evaluation. The clock starts when the first traverser executes a timeLimit step.

// Starting with a particular user, find direct or indirect friends with 20+ connections

g.V()
  .has("user","userId","u1")
  .repeat(timeLimit(200).both("knows"))
  .until(both("knows").count().is(gt(20)))
  .dedup()
  .valueMap()

// Sample output:
// [gender:[M], userId:[u874], age:[38]]

In addition to limit() and timeLimit() modulators, there are additional ways to limit search results, as described in Limiting Traversals

Branching using the repeat() Step with the emit() modulator. emit() Send a copy of the current traverser past repetition.

// Find friends and friends of friends of a particular user

// How many people does user "u1" directly know?
g.V()
  .has("user","userId","u1")
  .repeat(out("knows"))
  .times(1)
  .count()        //4

// How many people do user "u1"'s direct connections know (how 
// many friends-of-friends does user "u1" have)?
// Note that this does not include user "u1"'s direct connections (friends)
// unless they know each other.  The repeat step only "emits" the 
// final traversal's result set to the count step.
g.V()
  .has("user","userId","u1")
  .repeat(out("knows"))
  .times(2)
  .count()        //13

// How many friends and friends-of-friends does user "u1" have?
// At each iteration, we are emitting the list of results to the 
// count step.
g.V()
  .has("user","userId","u1")
  .repeat(out("knows"))
  .times(2)
  .emit()
  .count() //17

// How many friends and friends-of-friends does user "u1" have, including
// user "u1" himself?
// Placing the emit() step ahead of the branch causes us to emit the current
// traversal, prior to branching and emitting the results of each branch.
g.V()
  .has("user","userId","u1")
  .emit()
  .repeat(out("knows"))
  .times(2)
  .count() //18

Branching using the repeat() Step with the emit(traversal) modulator. emit(traversal) Send a copy of the current traverser past repetition if the internal traversal yields some results.

// Find if two users are connected

// Really, this should introduce a limit(1) modulator, but not using it better demonstrates
// the capabilities of the emit(traversal) step.
g.V()
  .has("user","userId","u1")
  .repeat(timeLimit(20).both("knows"))
  .emit(has("userId","u2")).valueMap()

// Sample output:
// [gender:[M], userId:[u2], age:[12]]
// [gender:[M], userId:[u2], age:[12]]
// [gender:[M], userId:[u2], age:[12]]
// ...

There is also a emit(predicate) variant that send a copy of the current traverser past repetition if the predicate evaluates to true.

Limiting traversals

Step Description
range([scope], low, high) Set the low and high inclusive range for a number of traversers that can pass to the next step in the traversal.
limit([scope], number) Set a maximum number of traversers that can pass to the next step in the traversal. Same as range(0, number).
tail([scope], [number]) Set a number (1 is default) of traversers that can pass to the next step in the traversal. Analogous to limit, except that tail emits the last n objects instead of the first n objects in the stream.
timeLimit(number) Set a maximum number of milliseconds to spend on traversal evaluation. The clock starts when the first traverser executes a timeLimit step.

Path Traversals

The available Path Steps are:

Step Description
path() Return a path object for a traverser.
simplePath() Examine traverser’s path and drop the traverser if repeated objects are found.
cyclicPath() Examine traverser’s path and drop the traverser if the path is simple.

A path object implements interface Path with low-level methods, such as size and get, to compute a path length and access constituent objects.

Using the path() Step.

// Find a path from a user to a friend of a friend

// Paths to Friends-of-friends of user "u1" (limit 1).
// This is a simple traversal with no recursion.  The
// `by` modulator tells the path Step to extract the
// "userId" property and use that as the path element
// label.  If we didn't have that step, we would instead
// see Vertex IDs.
g.V()
  .has("user","userId","u1")
  .out("knows")
  .out("knows")
  .path()
  .by("userId")
  .limit(1)
// [u1, u74, u750]

// This traversal does the same thing, but instead of adding userIds
// to the path, it adds the age of each user to the path.  So, user
// "u1" is age 17, his first connection is age 15, and his first connection's
// first connection is age 24.
g.V()
  .has("user","userId","u1")
  .out("knows")
  .out("knows")
  .path()
  .by("age")
  .limit(1)
// [17, 15, 24]

// This is basically the same traversal as above, but it reduces each path
// to find the minimum age of all users in a specific path.
g.V()
  .has("user","userId","u1")
  .out("knows")
  .out("knows")
  .path()
  .by("age")
  .min(local)
  .limit(1)
// 15

This last style of query could be useful if you if you had a graph of city vertices connected by distance edges, and you wanted to sum the distances in a path between a set of cities using the sum(local) aggregation step instead of the min(local) step.

Using a Path Traversal with the path() and simplePath() Steps to find the shortest path between two vertices.

// Find a shortest, simple path between two actors

// This traversal bounces back and forth between person
// and movie vertices across the actor edge until 
// a person vertex with the name "Leonardo DiCaprio" is
// reached.  At that point, the path is passed to a
// `by` modulator that extracts either the name property
// if the vertex is a person, otherwise the title property.
// Note that the addition of the simplePath(), early in the
// repetition traversal, prevents following cyclical paths.
// For example, consider the case where "Johnny Depp" was in
// "Edward Scissorhands".  Without the addition of the simplePath
// step, we would traverse all actors in "Edward Scissorhands",
// including traversing back to "Johnny Depp".
g.V()
  .has("person","name","Johnny Depp")
  .repeat(both("actor").simplePath().timeLimit(200))
  .until(has("person","name","Leonardo DiCaprio"))
  .path()
  .by(choose(hasLabel("person"),
             values("name"),
             values("title")))
  .limit(1)

// Sample output:
// [Johnny Depp, Finding Neverland, Kate Winslet,
//               Titanic, Leonardo DiCaprio]

Using a Path Traversal with the path() and cyclicPath() Steps to find a shortest, cyclic path between two vertices.

// Find a shortest, cyclic path between two actors

// In this case, we add cyclicPath() much later in the query.  If we
// were to specify cyclicPath() in the same location as we had specified
// simplePath(), we would find that we would no recurse at all, because
// there are no cyclic paths available to traverse at the first recursion
// step.  Instead, we allow recursion into cycles, but we add a limit to
// control how long we're willing to spin.  Later, we add the cyclicPath()
// step to filter down to ony cyclic paths.
g.V()
  .has("person","name","Johnny Depp")
  .repeat(both("actor").timeLimit(2000))
  .until(has("person","name","Leonardo DiCaprio"))
  .path()
  .by(choose(hasLabel("person"),
             values("name"),
             constant("--> M -->")))
  .cyclicPath()
  .limit(1)

// Sample output:
// [Johnny Depp, --> M -->, Johnny Depp, --> M -->,
//  Kate Winslet, --> M -->, Leonardo DiCaprio]

Using a Path Traversal with the path() and where() Steps to find a path that meets some criteria.

// Find a simple path of a particular length between two actors

// Find a path containing nine path objects.
g.V()
  .has("person","name","Johnny Depp")
  .repeat(both("actor").simplePath().timeLimit(200000))
  .until(has("person","name","Leonardo DiCaprio"))
  .path()
  .by("name")
  .by(constant("--> M -->"))
  .where(count(local).is(eq(9)))
  .limit(1)

// Sample output:
// [Johnny Depp, --> M -->, Geoffrey Rush, --> M -->,
//  Orlando Bloom, --> M -->, Bernard Hill, --> M -->,
//                                  Leonardo DiCaprio]

The by Step Modulator

Step modulator Description
by(), by(propertyKey), by(traversal), by(order), by(function), by(comparator), … Supply additional parameters to a preceding step. The by modulator is commonly used with steps aggregate, dedup, group, groupCount, order, path, select, tree, and more. When used with the path step, by specifies which value to project for each element in a path, such as a property value, label, id, etc.

Computing Aggregates

count([scope]) Compute a single aggregate value from all incoming numbers.

sum([scope]) The default scope is global with computation being applicable to the traversal stream.

mean([scope]) The alternative scope is local with computation being applicable to each collection-type object in the traversal stream.

min([scope]) max([scope])

g.V().has("movie","title","Alice in Wonderland").count()
// Sample output: 2

g.V().has("movie","title","Alice in Wonderland").count(local)
g.V().has("movie","title","Alice in Wonderland").local(count())
// Sample output:  1  1

Projecting Traversal Steps

Traversal steps can be labeled or marked at specific locations, and then later the traversal can use traversers from previous locations in the path history to project/select values.

Step Description
select(stepLabel,…) Select labeled steps within a path. Steps in a traversal are labeled using the as() step modulator.
where(stepLabel, predicate) Retain or drop a traverser based on the predicate that is evaluated with respect to the labeled step. Steps in a traversal are labeled using the as() step modulator.

Both select and where have variations that do not require projecting traversers from previous locations in a path, such as select(keys|values), where(predicate), and where(traversal).

Both as() and by() are used with a number of different steps as modulators.

Step modulator Description
as(stepLabel,…) Label a step in a traversal. Any step can have one or more labels associated with it. The as modulator is commonly used with steps match, path, select, and where.
by(), by(propertyKey), by(traversal), by(order), by(function), by(comparator), … Supply additional parameters to a preceding step. The by modulator is commonly used with steps aggregate, dedup, group, groupCount, order, path, select, tree, and more. When used with the select step, by specifies which value to project for each selected element, such as a property value, label, id, etc.

Using the select() Step to project a traversal.

// Find Johnny Depp’s movie titles

g.V()
  .has("person","name","Johnny Depp").as("actor")
  .in("actor").as("movie")
  .select("actor","movie").by("name").by("title")
  .sample(1)

// Sample output:
// [actor:Johnny Depp, movie:Alice in Wonderland]

// Find Johnny Depp’s movie titles, years, and genres
g.V()
  .has("person","name","Johnny Depp").as("a")
  .in("actor").as("t","y")
  .out("belongsTo").as("g")
  .select("a","t","y","g")
    .by("name")
    .by("title")
    .by("year")
    .by("name")
  .sample(3)

// Sample output:
// [a:Johnny Depp, t:Alice in Wonderland, y:2010, g:Fantasy]
// [a:Johnny Depp, t:Alice in Wonderland, y:2010, g:Animation]
// [a:Johnny Depp, t:Alice in Wonderland, y:2010, g:Adventure]

// Find Johnny Depp’s movie titles, years, and average ratings

g.V()
  .has("person","name","Johnny Depp").as("a")
  .in("actor").as("t","y","r")
  .select("a","t","y","r")
    .by("name")
    .by("title")
    .by("year")
    .by(inE("rated").values("rating").mean())
  .sample(1)

// Sample output:
// [a:Johnny Depp, t:Alice in Wonderland, y:2010, r:6.0]

// Find directors who directed a movie with Bruce Willis
g.V()
  .hasLabel("person").as("p")
  .in("director")
  .out("actor")
  .has("person","name","Bruce Willis")
  .select("p")
    .by("name")
  .sample(1)
  
// Find the number of movies with Tom Hanks
g.V()
  .has("person","name","Tom Hanks").as("x")
  .in("actor").as("y")
  .select("y")
  .count(local)

Using the select() Step to project a traversal that has been filtered using where() Step.

// Find directors who appeared in their own movies

g.V()
  .hasLabel("person").as("d")
  .in("director").as("t","y")
  .out("actor").as("a")
  .select("d").where(eq("a"))
  .select("d","t","y")
    .by("name")
    .by("title")
    .by("year")
  .sample(1)

// Sample output:
// [d:Quentin Tarantino, t:Pulp Fiction, y:1994]

// We can also simplify this a bit like this:
g.V()
  .hasLabel("person").as("d")
  .in("director").as("t","y")
  .out("actor").as("a")
  .where("d",eq("a"))
  .select("d","t","y")
    .by("name")
    .by("title")
    .by("year")
  .sample(1)

// Sample output:
// [d:Quentin Tarantino, t:Pulp Fiction, y:1994]

Declarative Traversal Definition

Graph pattern matching defines a graph pattern and matches it against a graph. It focuses on what to compute rather than how to compute.

The match(pattern,...)step defines and matches a set of traversal patterns of the form:

  • as("v1").step()...step().as("v2") // both the start and end of the traversal have declared variables.
  • as("v").step()...step() // only the start of the traversal has a declared variable.
  • step()...step() // there are no declared variables.

Graph pattern matching succeeds if a match-resulting traverser has its labeled path values bound to all the prefix and postfix variables of the match traversal patterns. Variables with the same name appearing in the same or different patterns always bind to the same value.

Using the match() Step to match a traversal.

// Find directors who appeared in their own movies

// We've used the declared variable names, 'm' and 'p' in both 
// match patterns.  When we do this, they will both
// bind to the same value.  So, we're defining two patterns, but
// they are going to apply to the same movies and the same people.
// Note `__` is a Gremlin anonymous traversal.
g.V()
  .match(
      __.as("m").out("actor").as("p"),
      __.as("m").out("director").as("p"))
  .select("p")
    .by("name")
  .dedup()
  .sample(3)

// Sample output:
// Clint Eastwood
// Charles Chaplin
// Quentin Tarantino

// Find directors who appeared in their own Western movies
g.V()
  .match(
    __.as("m").out("actor").as("p"),
    __.as("m").out("director").as("p"),
    __.as("m").out("belongsTo").has("name","Western"))
  .select("p")
    .by("name")
  .dedup()
  .sample(3)

// Sample output:
// Clint Eastwood
// John Wayne
// Quentin Tarantino

// Find directors who appeared in their own Comedy 
// where the director has never participated in a movie that 
// was rated below an 8.
// The `not()` step below has an inner traversal from person
// to movie, and then it fetches the mean rating for each movie,
// and determins if that mean is less than 8.  If any of those
// traversals return true, then the not negates it, and the pattern
// is not matched.
g.V()
  .match(
    __.as("m").out("actor").as("p"),
    __.as("m").out("director").as("p"),
    __.as("m").out("belongsTo").has("name","Comedy"),
    __.not(
      __.as("p")
        .in()
        .local(
          inE("rated")
            .values("rating")
            .mean()
            .is(lt(8.0))
          )
        )
  )
  .select("p")
    .by("name")
  .dedup()
  .sample(3)

// Sample output:
// Buster Keaton
// Charles Chaplin
// Roberto Benigni

Using the match() Step to match a traversal with and(), or(), and not(). These steps are documented in the Logical And Or Not Steps section.

// Find actors who appeared in 10+ movies, one of which is 
// "The Matrix" or "Die Hard", and if the movie is "Die Hard", 
// the actor is not Alan Rickman.
g.V()
  .match(
    __.as("a")
      .in("actor")
      .count()
      .is(gte(10)),
    or(
      __.as("a")
        .in("actor")
        .has("title","The Matrix"),
      __.as("a")
        .in("actor")
        .has("title","Die Hard")
        .and()
        .not(
          __.as("a")
            .has("name","Alan Rickman")
        )
    )
  )
  .select("a")
  .by("name")

// Sample output:
// Keanu Reeves
// Bruce Willis

// Find directors who have made at least 5 Adventure movies,
// which have been rated at least 8.0, and return the name of 
// the movie, along with the director.
g.V()
  .match(
    __.as("d").in("director").out("belongsTo").has("name","Adventure").count().as("count"),
    __.as("count").is(gte(5))
  )
  .select("d")
  .in("director")
  .match(
     __.as("m").out("belongsTo").has("name","Adventure"),
     __.as("m").inE("rated").values("rating").mean().as("avg"),
     __.as("avg").is(gte(8.0))
   )
  .select("m","d")
    .by("title")
    .by("name")
// [m:The Lord of the Rings: The Two Towers, d:Peter Jackson]
// [m:The Lord of the Rings: The Return of the King, d:.....]
// [m:The Lord of the Rings: The Fellowship of the Ring, d:.]

Logical And Or Not Steps

Filtering traversals based on logical conjunction, disjunction, and negation

Step Description
not(traversal) Passes an incoming traverser to the next step if the internal traversal yields no results.
and(traversal,...) Passes an incoming traverser to the next step if every internal traversal yields a result.
or(traversal,...) Passes an incoming traverser to the next step if at least one internal traversal yields a result.

Subgraph Traversals

Extracting a subgraph from a larger graph for analysis, visualization, etc. Traverses a graph to find vertices and edges for extraction Generates an edge-induced subgraph based on edges in a traversal stream.

The subgraph(subgraphLabel) step generates an edge-induced subgraph from edges in the traversal stream. The subgraph is produced as a side-effect, labeled data structure that can be used for independent traversals. An edge-induced subgraph is formed by a subset of the edges and their endpoint vertices from an original graph. All respective edge and vertex properties are also copied into a subgraph.

The cap(sideEffectLabel,...) Step iterates the traversal up to itself and emits the side-effect(s) referenced by the provided label(s).

The iterate() Step iterates the traversal up to itself and stops the traversal without emitting any objects.

Using the subgraph() Step to extract a sub-graph.

socialGraph = 
  g.E()
    .hasLabel("knows")
    .subgraph("socialGraph")
    .cap("socialGraph")
    .next()
sg = socialGraph.traversal()

sg.V()
  .groupCount()
  .by(label)
// [user:1079]

sg.E()
  .groupCount()
  .by(label)
// [knows:3500]

// Notice that not all users in the original graph are connected by "knows"
// edges.  Consequently, they will not appear in the edge-induced subgraph:
g.V()
  .hasLabel("user")
  .not(bothE("knows"))
  .count() //21
// Extract a subgraph of immediate social connections for a particular user
u1Graph = 
  g.V()
    .has("user","userId","u1")
    .bothE("knows")
    .subgraph("u1Graph")
    .cap("u1Graph")
    .next()

u1g = u1Graph.traversal()

u1g.V().groupCount().by(label)
// [user:6]

u1g.E().groupCount().by(label)
// [knows:5]
// Extract a subgraph with immediate and surrounding social connections 
// for a particular user
// Note that we are repeatedly calling `subgraph()`, and with each call
// the subgraph grows.
u1Graph3 = 
  g.V()
    .has("user","userId","u1")
    .repeat(__.bothE("knows").subgraph("u1Graph3").otherV())
    .times(3)
    .cap("u1Graph3")
    .next()

u1g3 = u1Graph3.traversal()

u1g3.V().groupCount().by(label)
// [user:195]

u1g3.E().groupCount().by(label)
// [knows:240]
// Extract multiple subgraphs (version 1)
graphs = 
  g.V()
    .hasLabel("user")
    .bothE("knows")
    .subgraph("socialGraph")
    .bothV()
    .outE("rated")
    .subgraph("movieGraph")
    .inV()
    .outE()
    .subgraph("movieGraph")
    .cap("socialGraph","movieGraph")
    .next()
    
sg = graphs["socialGraph"].traversal()
mg = graphs["movieGraph"].traversal()

sg.V().groupCount().by(label) // [user:1079]
sg.E().groupCount().by(label) // [knows:3500]
mg.V().groupCount().by(label) // [movie:920, person:8759, genre:18, user:1079]
mg.E().groupCount().by(label) // [actor:10678, screenwriter:1668, rated:47082, 
                              //  director:1061, composer:1047, 
                              //  cinematographer:961, belongsTo:2045]
                              
// Extract multiple subgraphs (version 2)
t = 
  g.V()
    .hasLabel("user")
    .bothE("knows")
    .subgraph("socialGraph")
    .bothV()
    .outE("rated")
    .subgraph("movieGraph")
    .inV()
    .outE()
    .subgraph("movieGraph")
    .iterate()
sg = t.sideEffects.get("socialGraph").get().traversal()
mg = t.sideEffects.get("movieGraph").get().traversal()

sg.V().groupCount().by(label) // [user:1079]
sg.E().groupCount().by(label) // [knows:3500]
mg.V().groupCount().by(label) // [movie:920, person:8759, genre:18, user:1079]
mg.E().groupCount().by(label) // [actor:10678, screenwriter:1668, rated:47082, 
                              //  director:1061, composer:1047, 
                              //  cinematographer:961, belongsTo:2045]

Statistical Traversals

Traversing and collecting statistics about a graph

  • Computes aggregates, performs analysis
  • Summarizes a graph with one or more metrics

Compute a single aggregate value representing:

Step Description
count([scope]) the total number of all objects
sum([scope]) the sum of all numbers
mean([scope]) the average of all numbers
min([scope]) the smallest number
max([scope]) the largest number

The default scope is global with computation being applicable to the traversal stream. The alternative scope is local with computation being applicable to each collection-type object in the traversal stream.

Using the mean(), min(), and max() Steps to calculate aggregates.

// Find the average, smallest, and largest user ages

g.V().hasLabel("user").values("age").mean()
//31.28181818181818

g.V().hasLabel("user").values("age").min()
//12

g.V().hasLabel("user").values("age").max()
//65

Compute custom aggregates:

Step Description
fold(), fold(seed,function) Aggregate all objects in a traversal stream into a list and return the list or an aggregate value computed by the reduce function.

The inverse of the fold() step is the unfold() step.

Using the fold() Step to calculate custom aggregates.

// Find the total duration of all movies in the graph

g.V().hasLabel("movie").values("duration").sum()
//106358

// Note that this is not as performant as a `sum()` step.
g.V().hasLabel("movie").values("duration").fold(0) {x,y -> x+y}
//106358

// Note that this is not as performant as a `sum()` step.
g.V().hasLabel("movie").values("duration").fold(0,sum)
//106358

The group().by().by() pattern:

Step Description
group([mapLabel]) Organize objects in a traversal stream into groups based on a known object characteristic and return a (labeled) map of key-objects pairs. Common usage pattern: group().by( what-to-group-on ).by( how-to-represent-an-object-in-a-group ). Aggregation is possible with the second by step modulator or with additional steps.

Using the group() and by() steps to create a statistical traversal using the group().by().by() pattern

// Find the average ages of female and male users in the graph

g.V()
  .hasLabel("user")
  .group()
  .by("gender")             // group by gender
  .by(values("age").mean()) // extract the mean age for each gender

// Sample output:
// [F:30.561371841155236, M:32.01282051282051]

Counting the number of elements in a group:

Step Description
groupCount([mapLabel]) Designate objects in a traversal stream to groups based on a known object characteristic, count them and return a (labeled) map of key-count pairs. Common usage pattern: groupCount().by( what-to-group-on ).

Step groupCount() and traversal group().by().by(count()) are semantically equivalent but the former is optimized for performance.

Using the groupCount() step to create a count the number of elements in a group.

// Find the vertex and edge distribution by label in the graph

g.V().groupCount().by(label)
// [movie:920, person:8759, genre:18, user:1100]

g.E().groupCount().by(label)
// [actor:10678, screenwriter:1668, director:1061,
//  composer:1047, cinematographer:961,
//  rated:48094, belongsTo:2045, knows:3500]

// Find the user distribution by age
g.V().hasLabel("user").groupCount().by("age")

// [12:29, 13:29, 14:34, 15:38, ..., 63:11, 64:15, 65:8]

Using the groupCount() step with the cap() step to create a centrality traversal.

// Find the most important vertices by walking through all 
// vertices in the graph and ordering them by centrality.

// Every time the groupCount is repeated, the results are put in the same map.
// We're not grouping on anything in particular, instead, we're just counting
// the number of times we see each vertex during the traversal.  We then traverse
// to all adjacent vertices.  We do this 30 times, a number which must be large 
// enough to converge on the same centrality ordering.  The cap() step forces the
// repetition traversal to complete.  We then order by the vertex->count map
// by count descending.  The `select(keys)` step selects all keys in the groupCount
// map (the vertices).  This returns a collection of keys.  The `unfold()` step
// allows us to work on the elements of that set.  We then get either the title and
// year if the vertex is a movie, otherwise we get the vertex name if the vertex is
// a person, user, or genre.
g.V()
  .repeat(groupCount("m").both())
  .times(30)
  .cap("m")
  .order(local).by(valueDecr)
  .select(keys)
  .unfold()
  .choose(hasLabel("movie"),
          valueMap("title","year"),
          values("name"))
// Sample output:
// Comedy
// Drama
// ...
// [year:[1994], title:[Forrest Gump]]
// ...
// [year:[1972], title:[The Godfather]]
// [year:[1993], title:[Schindler's List]]
// ...

Ordering Traversals

Ordering traversals arranging objects in a traversal stream into an ordered sequence

  • Orders incoming objects based on a known criterion
  • Emits objects one-by-one in their new order

The order([scope]) step sort all objects in a traversal stream based on a known criterion and emit objects one-by-one in their new order.

Common usage pattern: order().by( what-to-sort-on , incr | decr | keyIncr | keyDecr | valueIncr | valueDecr | shuffle )

The default sorting order is incr(easing). The default scope is global with computation being applicable to the traversal stream. The alternative scope is local with computation being applicable to each collection-type object in the traversal stream.

Using the order() step with the by() step to order a traversal.

// Order Nicolas Cage’s movies by year decr

g.V()
  .has("person","name","Nicolas Cage")
  .in("actor")
  .order()
  .by("year",decr)
  .valueMap("year","title")

// Sample output:
// [year:[2007], title:[National Treasure: Book of Secrets]]
// [year:[2007], title:[Ghost Rider]]
// ...

// Order Nicolas Cage’s movies by year decr and title incr

g.V()
  .has("person","name","Nicolas Cage")
  .in("actor").
  .order()
  .by("year",decr)
  .by("title",incr)
  .valueMap("year","title")

// Sample output:
// [year:[2007], title:[Ghost Rider]]
// [year:[2007], title:[National Treasure: Book of Secrets]]
// [year:[2004], title:[National Treasure]]
// [year:[2000], title:[Gone in 60 Seconds]]
// [year:[1998], title:[City of Angels]]
// [year:[1997], title:[Con Air]]
// [year:[1997], title:[Face/Off]]
// [year:[1996], title:[The Rock]]
// [year:[1995], title:[Leaving Las Vegas]]
// [year:[1984], title:[The Cotton Club]]

// Order Nicolas Cage’s movies by average age incr of users who left 5.0+ ratings

g.V()
  .has("person","name","Nicolas Cage")
  .in("actor")
  .order()
  .by(__.inE("rated").has("rating",gte(5.0))
        .outV()
        .values("age")
        .mean(),
      incr)
  .valueMap("year","title")

// Sample output:
// [year:[2007], title:[National Treasure: Book of Secrets]]
// [year:[2004], title:[National Treasure]]
// [year:[2007], title:[Ghost Rider]]
// [year:[1995], title:[Leaving Las Vegas]]
// [year:[1998], title:[City of Angels]]
// [year:[1997], title:[Con Air]]
// [year:[2000], title:[Gone in 60 Seconds]]
// [year:[1984], title:[The Cotton Club]]
// [year:[1997], title:[Face/Off]]
// ...

// Find and order the distribution of Nicolas Cage’s movies by year decr

g.V().has("person","name","Nicolas Cage").in("actor").
  groupCount().by("year").
  order(local).by(keyDecr)

// Sample output:
// [2007:2,
//  2004:1,
//  2000:1,
//  1998:1,
//  1997:2,
//  1996:1,
//  1995:1,
//  1984:1]

Using the order() step with the by() step to randomize the order of a traversal.

// Randomize the order of Nicolas Cage’s movies

g.V()
  .has("person","name","Nicolas Cage")
  .in("actor")
  .order()
  .by(shuffle)
  .valueMap("year","title")

// Sample output:
// [year:[2000], title:[Gone in 60 Seconds]]
// [year:[1998], title:[City of Angels]]
// [year:[2007], title:[Ghost Rider]]
// [year:[2004], title:[National Treasure]]
// [year:[1996], title:[The Rock]]
// [year:[1997], title:[Face/Off]]
// [year:[1997], title:[Con Air]]
// [year:[2007], title:[National Treasure: Book of Secrets]]
// [year:[1984], title:[The Cotton Club]]
// [year:[1995], title:[Leaving Las Vegas]]

Mutating Traversals

A mutating traversal is a traversal that changes the graph structure

  • Traverses and mutates a graph structure
  • Adds or removes vertices, edges, and properties

Adding and removing vertices, edges, and properties:

Step Description
addV(label) Create a new labeled vertex for each incoming object in a traversal stream.
addE(label) Create a new labeled edge for each incoming vertex in a traversal stream. For every edge, while one endpoint corresponds to an incoming vertex in the traversal stream, the other endpoint and edge direction are specified by step modulator from or step modulator to; if the other endpoint is not specified, the edge is a loop, connecting the same vertex twice.
property([cardinality],key,value) Create a property (also, multi-property and meta-property) with the known property key and property value.
drop() Remove all vertices, edges, and properties seen in a traversal stream from a graph. Dropping a vertex also removes its all incident edges.

Specifying essential parameters for step addE:

Step modulator Description
from(stepLabel), from(traversal) Specify that a new edge created by a preceding step addE must be directed from a vertex returned by the labeled step or internal traversal.
to(stepLabel), to(traversal) Specify that a new edge created by a preceding step addE must be directed to a vertex returned by the labeled step or internal traversal.

Using the property() step to add a new property to a vertex or edge.

// Computing and storing average ratings as properties of Tom Hanks' movies
g.V()
  .has("person","name","Tom Hanks")
  .in("actor")
  .property("avg",
            __.inE("rated").values("rating").mean())
  .valueMap("avg","title")

// Sample output:
// [avg:[7.097222222222222], title:[Cast Away]]
// [avg:[7.376623376623376], title:[Philadelphia]]
// [avg:[5.615384615384615], title:[The Polar Express]]
// [avg:[7.780487804878049], title:[Saving Private Ryan]]
// [avg:[7.842696629213483], title:[The Green Mile]]
// [avg:[8.213483146067416], title:[Forrest Gump]]
// ...

// Adding a new genre for Biography movies
g.addV("genre")
  .property("genreId","g19")
  .property("name","Biography")
  .valueMap()

// Sample output:
// [genreId:[g19], name:[Biography]]

Using the property(), addV() and addE() steps to add a new property to a vertex or edge.

// Adding a new Biography movie with Tom Hanks
g.addV("movie")
  .property("movieId","m1000")
  .property("title","Bridge of Spies")
  .property("year","2015")
  .property("duration","135")
  .property("country","United States")
  .as("new")
  .addE("actor")
  .to(__.V().has("person","name","Tom Hanks"))
  .select("new")
  .addE("belongsTo")
  .to(__.V().has("genre","name","Biography"))
  
// Materialize Tom Hanks' social connections
g.V()
  .has("person","name","Tom Hanks")
  .as("tom")
  .in("actor")
  .out("actor")
  .where(neq("tom"))
  .dedup()
  .addE("knows")
  .from("tom")
  .inV()
  .values("name")

// Sample output:
// Jean Reno
// Meg Ryan
// ...

Using addE() and drop() steps to migrate a graph.

// Converting in("actor") edges to out("actedIn") edges for Tom Hanks
g.V()
  .has("person","name","Tom Hanks")
  .as("tom")
  .in("actor")
  .as("movie")
  .addE("actedIn")
  .from("tom")
  .outV()
  .inE("actor")
  .drop()

g.V()
  .has("person","name","Tom Hanks")
  .in("actor")
  .count()
// 0

g.V()
  .has("person","name","Tom Hanks")
  .out("actedIn")
  .count()
// 10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment