@pierredewilde
Created September 21, 2011 13:54
Tinkerpop: new tickets proposition
[Tinkerpop] =========================================================
[Tinkerpop] Promotion
blog posts, project acquisitions, EC2 images (Stephen style), contact hosting services to host TinkerPop deployments, etc.
[Tinkerpop] More test cases and cleaner, more concise documentation
[Tinkerpop] More control over our Hudson build
Or at least, getting someone to tweak the configuration of our projects so as to auto-deploy on successful builds. (Josh)
[Blueprints] ========================================================
[Blueprints] Forkable/mergeable version-controlled graphs
The GraphML normalization feature lets you check in graphs, update them,
and roll back to previous versions. However, forking and merging graphs invites
ID collisions unless done in carefully controlled ways. I'm working
on an extension of this feature which will make collaborative
graph-building fairly effortless. Graph versioning doesn't seem to
have made the Blueprints release highlights, but I happen to think
it's "big". (Josh)
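To make the ID-collision concern concrete, here is a minimal sketch of how two forks of the same base graph can clash on merge. It only illustrates the problem, not the extension Josh describes. The class name, the method names, and the choice to model a graph as a plain map from vertex ID to label are all assumptions made for illustration; none of this is Blueprints API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a graph is modelled as vertex ID -> label.
class ForkMergeSketch {

    /**
     * Returns the IDs that both forks added independently (absent from
     * the base) but with different labels, i.e. true ID collisions.
     */
    static Set<String> collisions(Map<String, String> base,
                                  Map<String, String> forkA,
                                  Map<String, String> forkB) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> entry : forkA.entrySet()) {
            String id = entry.getKey();
            if (base.containsKey(id)) {
                continue; // already existed before the fork
            }
            String other = forkB.get(id);
            if (other != null && !other.equals(entry.getValue())) {
                result.add(id);
            }
        }
        return result;
    }

    /** Two forks each pick the "next free" ID 3 for a different vertex. */
    static Set<String> demo() {
        Map<String, String> base = new HashMap<>();
        base.put("1", "marko");
        base.put("2", "josh");
        Map<String, String> forkA = new HashMap<>(base);
        forkA.put("3", "stephen");
        Map<String, String> forkB = new HashMap<>(base);
        forkB.put("3", "pierre");
        return collisions(base, forkA, forkB);
    }

    public static void main(String[] args) {
        System.out.println("Colliding IDs: " + demo());
    }
}
```

Sequentially allocated IDs are what make this likely: each fork independently hands out the same "next" ID, so a merge has to either detect the clash, as above, or avoid it with a different ID scheme.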
[Blueprints] Fix all consistency issues between different vendors
[Blueprints] Benchmarking
It would be great to have a cool blog post on TinkerPop benchmarks; perhaps collaborate with Alex's Harvard buddy who is working on this topic.
[Blueprints] Move to Neo4j Automatic Indices
[Blueprints] Ride InfiniteGraph for their distribution
[Blueprints] Blueprints BSP
Stephen and I have an upcoming project that will require a distributed graph engine (e.g. Pregel-, GoldenOrb-, Hama-, or Giraph-style). We would like to use something like GraphBase [ https://github.com/dgreco/graphbase ]. However, we may need to support BSP semantics, perhaps in some "BulkSynchronousGraph extends Graph" interface? Perhaps work with Alex Averbuch and Russell Jurney on this. (Marko)
What is the project and what are the use cases?
You want to use something HDFS-based (GraphBase, GoldenOrb, etc) so I guess you want to support large scale (large latency) batch graph processing via TinkerPop? (Alex)
My biggest interest lies in studying these big graph processing frameworks (e.g. Pregel) bottom up - starting a PhD on the topic - so I'm more drawn to the researchy side: evaluating existing systems and designing better ones.
This is firmly in the "evaluating existing systems" (top down integration) category, so I'd love to take part in benchmarking the implementation(s) on our cluster at work (we have Hadoop running already)... and I think I can get a masters student to do a lot of it. (Alex)
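As a rough illustration of what BSP semantics behind a Graph-extending interface could look like, here is a minimal in-memory sketch. Everything in it is hypothetical: the pared-down Graph interface is a stand-in rather than the Blueprints one, and VertexProgram, BulkSynchronousGraph, and run(...) are names invented for this example. Each superstep sends every vertex's value to its neighbors, waits at a barrier until all messages are delivered, and only then lets vertices compute.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: none of these types exist in Blueprints.
class BspSketch {

    /** Stand-in for a Blueprints-like graph; only what the sketch needs. */
    interface Graph {
        Set<String> vertexIds();
        List<String> neighbors(String vertexId);
    }

    /** Per-vertex logic, run once per superstep; returns the new value. */
    interface VertexProgram {
        int compute(String vertexId, int value, List<Integer> inbox);
    }

    /** One possible shape for the interface idea in the ticket. */
    interface BulkSynchronousGraph extends Graph {
        Map<String, Integer> run(Map<String, Integer> initial,
                                 VertexProgram program, int maxSupersteps);
    }

    static class InMemoryBspGraph implements BulkSynchronousGraph {
        private final Map<String, List<String>> adjacency = new TreeMap<>();

        void addEdge(String a, String b) { // undirected for simplicity
            adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
            adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
        }

        public Set<String> vertexIds() { return adjacency.keySet(); }

        public List<String> neighbors(String id) { return adjacency.get(id); }

        public Map<String, Integer> run(Map<String, Integer> initial,
                                        VertexProgram program, int maxSupersteps) {
            Map<String, Integer> values = new TreeMap<>(initial);
            for (int step = 0; step < maxSupersteps; step++) {
                // Send phase: every vertex messages its current value to its neighbors.
                Map<String, List<Integer>> inboxes = new HashMap<>();
                for (String v : vertexIds()) {
                    for (String n : neighbors(v)) {
                        inboxes.computeIfAbsent(n, k -> new ArrayList<>()).add(values.get(v));
                    }
                }
                // Barrier: all messages are delivered before any vertex computes.
                Map<String, Integer> next = new TreeMap<>();
                boolean changed = false;
                for (String v : vertexIds()) {
                    List<Integer> inbox = inboxes.getOrDefault(v, Collections.emptyList());
                    int updated = program.compute(v, values.get(v), inbox);
                    changed |= updated != values.get(v);
                    next.put(v, updated);
                }
                values = next;
                if (!changed) {
                    break; // no vertex changed: the computation has converged
                }
            }
            return values;
        }
    }

    /** Classic Pregel-style example: propagate the maximum value. */
    static Map<String, Integer> demo() {
        InMemoryBspGraph g = new InMemoryBspGraph();
        g.addEdge("a", "b");
        g.addEdge("b", "c");
        Map<String, Integer> initial = new HashMap<>();
        initial.put("a", 3);
        initial.put("b", 1);
        initial.put("c", 7);
        VertexProgram max = (id, value, inbox) -> {
            int best = value;
            for (int message : inbox) {
                best = Math.max(best, message);
            }
            return best;
        };
        return g.run(initial, max, 10);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A real engine would partition vertices across machines and run the compute phase in parallel; the single-threaded loop here only shows where the superstep barrier sits relative to an ordinary Graph interface.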
[Pipes] =============================================================
[Pipes] do not upload .zip in pom.xml
Make it so the pom.xml doesn't upload the .zip on 'mvn deploy' (Marko)
I can't figure out why it does that for Pipes and Rexster and not the others.... (Marko)
[Pipes] True LoopPipe
Fix LoopPipe once and for all. I suspect Josh would be the best to make this implementation correct. Josh? (Marko)
Aggregate and Loop are not in perfect harmony. In essence, LoopPipe needs a new architecture ... the current instantiation is slow and has "funny" semantics. (Marko)
http://groups.google.com/group/gremlin-users/browse_thread/thread/a87d2958f42c9db/3c8e41263abf8635
[Pipes] BSP semantics
I have trouble envisioning how you would make use of Pipes when integrating with something like GraphBase, GoldenOrb, etc. As I understand Pipes, it's Gremlin compiled into a program optimized for a GRAPH DATABASE. When working with something Pregel-like, the nature of the program is much different, more like a Hadoop Job. As such, I find it easier to imagine Gremlin being compiled to a "GraphBaseJob" or similar, then being passed to GraphBase via a GraphBaseGraph. I realize there's an abstraction missing in between, but I don't think that abstraction looks like the current Pipes. In my mind, Pipes is more an IMPLEMENTATION of an abstract query engine, and we need to identify the commonalities between the query engines of these distributed graph processing systems to build a query engine interface, e.g. DistributedPipes. (Alex)
Again, I think this is more of an "evaluating existing systems" problem. I haven't played with any of these systems yet, so I don't know what their interfaces have in common. But it's a perfect master's thesis opportunity, and one that would really help me with my work. (Alex)
I think Pipes may be a good way to stream data into and out of a BSP job but would not be at all useful within the BSP job itself.
In fact, it might be cool to build a set of Pipes that wrap a variety of black box operations like BSP, various graph query languages or even a pipe that represents a Gremlin expression. I think any operation that produces graph elements would be useful to wrap in an iterator that can be used as a Pipes data source, but serious bonus points if you could come up with a sensible way to make them full-fledged pipes that could use the incoming data stream as a source for the wrapped operation as well. (Darrick)
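Darrick's idea of wrapping a black-box operation so that it can act both as a data source and as a step that consumes the incoming stream might look roughly like the sketch below. MiniPipe is a hypothetical, pared-down stand-in and not the real Pipes interface, and BlackBoxPipe and fakeBatchJob are names invented here. The wrapped operation is only invoked when the first result is requested, and it receives the whole incoming iterator as its input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch: MiniPipe is a stand-in, not the real Pipes interface.
class BlackBoxPipeSketch {

    /** A pipe is an iterator over results that can be fed an input iterator. */
    interface MiniPipe<S, E> extends Iterator<E> {
        void setStarts(Iterator<S> starts);
    }

    /** Wraps any batch-style operation (a BSP job, a query, ...) as a pipe. */
    static class BlackBoxPipe<S, E> implements MiniPipe<S, E> {
        private final Function<Iterator<S>, Iterator<E>> operation;
        private Iterator<S> starts = Collections.emptyIterator();
        private Iterator<E> results; // null until the operation has run

        BlackBoxPipe(Function<Iterator<S>, Iterator<E>> operation) {
            this.operation = operation;
        }

        public void setStarts(Iterator<S> starts) {
            this.starts = starts;
            this.results = null; // new input: rerun the operation on demand
        }

        /** Runs the wrapped operation lazily, on the first request for output. */
        private Iterator<E> results() {
            if (results == null) {
                results = operation.apply(starts);
            }
            return results;
        }

        public boolean hasNext() { return results().hasNext(); }

        public E next() { return results().next(); }
    }

    /** Stand-in for an opaque batch job: counts how often each ID arrives. */
    static Iterator<String> fakeBatchJob(Iterator<String> vertexIds) {
        Map<String, Integer> counts = new TreeMap<>();
        while (vertexIds.hasNext()) {
            counts.merge(vertexIds.next(), 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            out.add(entry.getKey() + ":" + entry.getValue());
        }
        return out.iterator();
    }

    static List<String> demo() {
        BlackBoxPipe<String, String> pipe =
                new BlackBoxPipe<>(BlackBoxPipeSketch::fakeBatchJob);
        pipe.setStarts(Arrays.asList("v2", "v1", "v2").iterator());
        List<String> seen = new ArrayList<>();
        while (pipe.hasNext()) {
            seen.add(pipe.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the wrapped operation drains its whole input before emitting anything, such a pipe behaves like a blocking step rather than a streaming one, which fits the view above that Pipes sit at the edges of a BSP job rather than inside it.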
[Gremlin] ===========================================================
[Gremlin] Stub a new Gremlin
Perhaps Gremlin (Jython)? In a non-active repository. (Marko)
[Gremlin] Gremlin_scala
Work with Zach Cox to push out Gremlin_scala. Moreover, work with Josh to get a parent pom structure for Gremlin and all the JVM language implementations. (Marko)
[Gremlin] Gremlin_jython
Perhaps work with James for a Gremlin_jython ? (Marko)
[Rexster] ===========================================================
[Rexster] Jackson for JSON processing?
I think we need to factor out Jettison in favor of Jackson for JSON processing. Not stoked about that, of course. (Stephen)
[Rexster] Dynamic configuration
Add and remove graphs on-the-fly (Pierre)
Yea. Like CREATE and DROP in SQL. (Marko)
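The CREATE/DROP analogy could be sketched as a small thread-safe registry of named graphs. This is an assumption-laden illustration, not how Rexster is implemented: HostedGraph, GraphRegistry, and their methods are hypothetical names, and a real version would also need to persist the configuration and cope with in-flight requests.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: these types are invented here, not Rexster classes.
class GraphRegistrySketch {

    /** Stand-in for a graph hosted by the server. */
    interface HostedGraph {
        void shutdown();
    }

    static class GraphRegistry {
        private final ConcurrentHashMap<String, HostedGraph> graphs =
                new ConcurrentHashMap<>();

        /** Like CREATE: registers the graph, or returns false if the name is taken. */
        boolean create(String name, HostedGraph graph) {
            return graphs.putIfAbsent(name, graph) == null;
        }

        /** Like DROP: unregisters and shuts down the graph, or returns false if unknown. */
        boolean drop(String name) {
            HostedGraph removed = graphs.remove(name);
            if (removed == null) {
                return false;
            }
            removed.shutdown();
            return true;
        }

        /** Sorted snapshot of the currently hosted graph names. */
        Set<String> names() {
            return new TreeSet<>(graphs.keySet());
        }
    }

    public static void main(String[] args) {
        GraphRegistry registry = new GraphRegistry();
        HostedGraph graph = () -> System.out.println("graph shut down");
        System.out.println("create: " + registry.create("social", graph));
        System.out.println("create again: " + registry.create("social", graph));
        System.out.println("hosted: " + registry.names());
        System.out.println("drop: " + registry.drop("social"));
        System.out.println("drop again: " + registry.drop("social"));
    }
}
```

Using putIfAbsent and remove on a ConcurrentHashMap keeps each create and drop atomic, so two concurrent requests cannot both claim the same name or shut the same graph down twice.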
[Rexster] RexsterConsole as a separate package
Make a RexsterConsole that is a separate package from Rexster so people wanting to talk to their Rexster distribution don't need FULL Rexster. (Marko)
[Rexster] JVM binary protocol over RexPro
[Rexster] Refactor Rexster Kibble packaging
Redo the Rexster Kibble packaging to be more like Blueprints (Stephen)
[Rexster] Consistency
I think bringing that consistency from Blueprints over to Rexster will be important, which will also mean expanding the integration tests to enforce those Blueprints consistency rules. (Stephen)
[Rexster] Housekeeping
I also have a fair amount of housekeeping to do with Rexster and Kibbles (want to tighten some implementation details).
[Rexster] Performance improvement
I've been thinking about looking at Rexster performance, but I'm not sure what the scope of that work will look like at the moment.
[Rexster-Kibble] D2R-like data publishing tool
I have suggested it, others have suggested it, time to do it.
I'm particularly interested in existing data sets currently stored
in Blueprints-compatible graph databases which might be good targets
for publishing. There has been enough interest in Blueprints Sail
on both sides of the GraphDB-SemWeb divide that I believe this tool
would see some uptake, as well. (Josh)