GTFS Example in One Piece

Introduction

GTFS (General Transit Feed Specification) defines a common format for public transportation schedules and associated geographic information. It is widely used by travel websites and trip planners.

We’ll use Neo4j to show how quickly a GTFS feed can be loaded into a graph and queried.

This is basically a copy-paste of Rik Van Bruggen’s original post, http://blog.bruggen.com/2015/11/loading-general-transport-feed-spec.html , but I never found a working online example, so I created one to experiment with graph queries.

The GTFS info used is the one provided by the Buenos Aires metro (since I plan to go there in the near future): http://transitfeeds.com/p/subterraneos-de-buenos-aires/541/latest

What we’ll be doing is in no way optimized. Any suggestions are welcome.

Setup

GTFS is a collection of zipped text files in a CSV-like format (specification: https://developers.google.com/transit/gtfs/). Neo4j can import CSV files directly through the LOAD CSV command.
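
Every load step below looks nodes up by their id property, so it may help to create indexes before loading anything. A minimal sketch, assuming Neo4j 4.x or later (older releases use the `CREATE INDEX ON :Agency(id)` form instead); the labels and the id property are the ones used in the steps that follow:

```cypher
// Index the id property that the MATCH clauses in the load steps look up.
// Syntax assumes Neo4j 4.x or later.
CREATE INDEX FOR (a:Agency) ON (a.id);
CREATE INDEX FOR (r:Route) ON (r.id);
CREATE INDEX FOR (t:Trip) ON (t.id);
CREATE INDEX FOR (s:Stop) ON (s.id);
```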

Let’s load a sample set:

1. Load the Agency Info

load csv with headers from
  'https://dl.dropboxusercontent.com/u/1355080/GTFSTest/agency.txt' as csv
create
  (a:Agency {id: csv.agency_id, name: csv.agency_name, url: csv.agency_url, timezone: csv.agency_timezone});

2. Load the Routes Info

load csv with headers from
  'https://dl.dropboxusercontent.com/u/1355080/GTFSTest/routes.txt' as csv
  match (a:Agency {id: csv.agency_id})
  create
  (a)-[:OPERATES]->(r:Route {id: csv.route_id, short_name: csv.route_short_name, long_name: csv.route_long_name, type: toInteger(csv.route_type)});

3. Load the Trips Info

load csv with headers from
  'https://dl.dropboxusercontent.com/u/1355080/GTFSTest/trips.txt' as csv
   match (r:Route {id: csv.route_id})
   merge (r)<-[:USES]-(t:Trip {id: csv.trip_id, service_id: csv.service_id});

4. Load the Stops Info

load csv with headers from
  'https://dl.dropboxusercontent.com/u/1355080/GTFSTest/stops.txt' as csv
  create (s:Stop {id: csv.stop_id, name: csv.stop_name, lat: toFloat(csv.stop_lat), lon: toFloat(csv.stop_lon), platform_code: csv.platform_code, parent_station: csv.parent_station, location_type: csv.location_type});

5. Load the Stops Info again

(This time to check whether there are any parent-child relationships between stations.)

load csv with headers from
  'https://dl.dropboxusercontent.com/u/1355080/GTFSTest/stops.txt' as csv
  with csv
  where not (csv.parent_station is null)
  match (ps:Stop {id: csv.parent_station}), (s:Stop {id: csv.stop_id})
  create (ps)<-[:PART_OF]-(s);
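
To see what this step produced, a query along these lines lists each parent station with the number of stops attached to it. This is only a sketch; it returns no rows if the feed defines no parent stations:

```cypher
// Count the child stops linked to each parent station via PART_OF.
MATCH (ps:Stop)<-[:PART_OF]-(s:Stop)
RETURN ps.name AS station, count(s) AS child_stops
ORDER BY child_stops DESC
LIMIT 10;
```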

6. Load the Stop Times Info

load csv with headers from
  'https://dl.dropboxusercontent.com/u/1355080/GTFSTest/stop_times.txt' as csv
  match (t:Trip {id: csv.trip_id}), (s:Stop {id: csv.stop_id})
  create (t)<-[:PART_OF_TRIP]-(st:Stoptime {arrival_time: csv.arrival_time, departure_time: csv.departure_time, stop_sequence: toInteger(csv.stop_sequence)})-[:LOCATED_AT]->(s);

7. Update the stop times

match (s:Stoptime)
  set s.arrival_time_int=toInteger(replace(s.arrival_time,":",""))/100
  set s.departure_time_int=toInteger(replace(s.departure_time,":",""))/100;
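
The conversion strips the colons and then drops the seconds through integer division, so a GTFS time such as 08:15:30 becomes 815. A quick way to try it on a literal (the sample time is made up; `toInteger` is the spelling in recent Neo4j releases, older ones call it `toInt`):

```cypher
// '08:15:30' -> '081530' -> 81530 -> 81530 / 100 = 815 (integer division)
RETURN toInteger(replace('08:15:30', ':', '')) / 100 AS hhmm;
```

The resulting integers compare and sort correctly, but they are not a count of minutes: 900 - 845 gives 55, although 08:45 and 09:00 are only 15 minutes apart.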

8. Update time relationships

match (s1:Stoptime)-[:PART_OF_TRIP]->(t:Trip),
  (s2:Stoptime)-[:PART_OF_TRIP]->(t)
  where s2.stop_sequence=s1.stop_sequence+1
  create (s1)-[:PRECEDES]->(s2);
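
Once the PRECEDES relationships exist, the hops of a trip can be read straight off the graph. The following sketch picks one arbitrary trip and lists its consecutive stop pairs; the variable names are illustrative only:

```cypher
// Pick one arbitrary trip, then list each hop between consecutive stop times.
MATCH (t:Trip)
WITH t LIMIT 1
MATCH (t)<-[:PART_OF_TRIP]-(s1:Stoptime)-[:PRECEDES]->(s2:Stoptime),
      (s1)-[:LOCATED_AT]->(o:Stop), (s2)-[:LOCATED_AT]->(d:Stop)
RETURN s1.stop_sequence AS seq, o.name AS from_stop, d.name AS to_stop
ORDER BY seq;
```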

Overview of Entities

Overview of entities and how they are related:

MATCH (a)-[r]->(b) WHERE labels(a) <> [] AND labels(b) <> []
RETURN DISTINCT head(labels(a)) AS This, type(r) as To, head(labels(b)) AS That LIMIT 10

Trip Search Example

Finding the trips and itineraries between Callao and Corrientes:

match (a:Stop),(t:Stop)
  where a.name starts with 'Callao' AND t.name starts with 'Corrientes'
  with t,a
  match p = allshortestpaths((t)-[*]-(a))
return p
limit 10
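
The shortest-path query ignores direction and time. As a complementary sketch, the PRECEDES chain and the integer times from step 7 can be used to look for direct trips leaving at or after a given time. The 08:00 cutoff (800) is an arbitrary assumption, and the query returns no rows if no single trip serves both stops, since transfers are not modeled here:

```cypher
// Direct trips from a stop whose name starts with 'Callao' to one whose
// name starts with 'Corrientes', departing at or after 08:00 (800).
MATCH (o:Stop)<-[:LOCATED_AT]-(dep:Stoptime)-[:PRECEDES*]->(arr:Stoptime)-[:LOCATED_AT]->(d:Stop)
WHERE o.name STARTS WITH 'Callao'
  AND d.name STARTS WITH 'Corrientes'
  AND dep.departure_time_int >= 800
MATCH (dep)-[:PART_OF_TRIP]->(t:Trip)
RETURN t.id AS trip, o.name AS origin, dep.departure_time AS departs,
       d.name AS destination, arr.arrival_time AS arrives
ORDER BY dep.departure_time_int
LIMIT 10;
```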

Conclusions

Resources

Created by 54chi - Twitter | http://54chi.com
