GTFS (General Transit Feed Specification) defines a common format for public transportation schedules and associated geographic information. It’s super useful for travel websites and trip planners.
We’ll use Neo4j to showcase how quickly and easily this standard can be implemented in a solution.
This is basically a copy-paste of Rik Van Bruggen's original post (http://blog.bruggen.com/2015/11/loading-general-transport-feed-spec.html), but I never found a working online example out there, so I created one to experiment with graph queries.
The GTFS data used is the feed provided by the Buenos Aires metro (since I plan to go there in the near future): http://transitfeeds.com/p/subterraneos-de-buenos-aires/541/latest
What we’ll be doing is in no way optimized. Any suggestions are welcome.
A GTFS feed is a zipped collection of text files in a CSV-like format (see the specification at https://developers.google.com/transit/gtfs/). Neo4j has a very convenient way to import CSV files through the "load csv" command.
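To get a feel for what "with headers" means before running the import, here is a minimal Python sketch that reads a GTFS-style file the same way: one map per row, keyed by the header line. The sample text and its values are made up for illustration; only the column names come from the agency.txt fields used in this post.

```python
import csv
import io

# Hypothetical sample shaped like a GTFS agency.txt (the values are invented).
AGENCY_TXT = """agency_id,agency_name,agency_url,agency_timezone
1,Example Metro,http://example.com,America/Argentina/Buenos_Aires
"""

def read_gtfs_rows(text):
    """Return one dict per data row, keyed by the column names in the header line."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_gtfs_rows(AGENCY_TXT)
for row in rows:
    # Every value arrives as a string, including numeric-looking ones.
    print(row["agency_id"], row["agency_name"], row["agency_timezone"])
```

Since every field comes in as a string, numeric columns need an explicit conversion, which is why the Cypher statements in this post wrap some fields in toInt() and toFloat().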
Let’s load a sample set:
load csv with headers from
'https://dl.dropboxusercontent.com/u/1355080/GTFSTest/agency.txt' as csv
create
(a:Agency {id: csv.agency_id, name: csv.agency_name, url: csv.agency_url, timezone: csv.agency_timezone});
load csv with headers from
'https://dl.dropboxusercontent.com/u/1355080/GTFSTest/routes.txt' as csv
match (a:Agency {id: csv.agency_id})
create
(a)-[:OPERATES]->(r:Route {id: csv.route_id, short_name: csv.route_short_name, long_name: csv.route_long_name, type: toInt(csv.route_type)});
load csv with headers from
'https://dl.dropboxusercontent.com/u/1355080/GTFSTest/trips.txt' as csv
match (r:Route {id: csv.route_id})
merge (r)<-[:USES]-(t:Trip {id: csv.trip_id, service_id: csv.service_id});
load csv with headers from
'https://dl.dropboxusercontent.com/u/1355080/GTFSTest/stops.txt' as csv
create (s:Stop {id: csv.stop_id, name: csv.stop_name, lat: toFloat(csv.stop_lat), lon: toFloat(csv.stop_lon), platform_code: csv.platform_code, parent_station: csv.parent_station, location_type: csv.location_type});
(Now read stops.txt a second time, to check whether there are any parent-child relationships between stations and link them.)
load csv with headers from
'https://dl.dropboxusercontent.com/u/1355080/GTFSTest/stops.txt' as csv
with csv
where not (csv.parent_station is null)
match (ps:Stop {id: csv.parent_station}), (s:Stop {id: csv.stop_id})
create (ps)<-[:PART_OF]-(s);
load csv with headers from
'https://dl.dropboxusercontent.com/u/1355080/GTFSTest/stop_times.txt' as csv
match (t:Trip {id: csv.trip_id}), (s:Stop {id: csv.stop_id})
create (t)<-[:PART_OF_TRIP]-(st:Stoptime {arrival_time: csv.arrival_time, departure_time: csv.departure_time, stop_sequence: toInt(csv.stop_sequence)})-[:LOCATED_AT]->(s);
match (s:Stoptime)
set s.arrival_time_int=toInt(replace(s.arrival_time,":",""))/100
set s.departure_time_int=toInt(replace(s.departure_time,":",""))/100;
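The last statement strips the colons from each "HH:MM:SS" string and integer-divides by 100, which drops the seconds and leaves an HHMM integer. A small Python sketch of the same arithmetic, with a hypothetical function name and made-up sample times:

```python
def gtfs_time_to_hhmm(value):
    """Mirror toInt(replace(value, ":", "")) / 100: 'HH:MM:SS' becomes the integer HHMM."""
    # Integer division by 100 discards the two seconds digits.
    return int(value.replace(":", "")) // 100

print(gtfs_time_to_hhmm("08:15:30"))  # 815
print(gtfs_time_to_hhmm("25:10:00"))  # 2510 (GTFS allows hours past 24 for trips running after midnight)
```

These integers sort in the right order, so they work for "earlier than / later than" comparisons, but they are not minutes: 815 minus 759 is 56, while the real gap is 16 minutes.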
Overview of entities and how they are related:
MATCH (a)-[r]->(b) WHERE labels(a) <> [] AND labels(b) <> []
RETURN DISTINCT head(labels(a)) AS This, type(r) as To, head(labels(b)) AS That LIMIT 10
Finding the trips and itineraries between Callao and Corrientes:
match (a:Stop),(t:Stop)
where a.name starts with 'Callao' AND t.name starts with 'Corrientes'
with t,a
match p = allshortestpaths((t)-[*]-(a))
return p
limit 10
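To illustrate what "all shortest paths" returns, here is a rough Python sketch of the idea using a breadth-first search on a tiny undirected graph. This is not how Neo4j implements it, and the node names are hypothetical; the toy graph only imitates the Stop / Stoptime / Trip shape built above, with two trips connecting the same pair of stops.

```python
from collections import deque

def all_shortest_paths(graph, start, goal):
    """Return every minimum-length path from start to goal, ignoring edge direction."""
    if start == goal:
        return [[start]]
    dist = {start: 0}
    parents = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                # First time we reach nxt: this is its shortest distance.
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another equally short way to reach nxt.
                parents[nxt].append(node)
    if goal not in dist:
        return []
    paths = []

    def walk(node, suffix):
        # Follow parent links back to start, collecting each complete path.
        if node == start:
            paths.append([start] + suffix)
            return
        for parent in parents[node]:
            walk(parent, [node] + suffix)

    walk(goal, [])
    return paths

# Hypothetical toy data: two trips, each with a stoptime at Stop A and at Stop B.
edges = [
    ("Stop A", "Stoptime 1"), ("Stoptime 1", "Trip 1"),
    ("Trip 1", "Stoptime 2"), ("Stoptime 2", "Stop B"),
    ("Stop A", "Stoptime 3"), ("Stoptime 3", "Trip 2"),
    ("Trip 2", "Stoptime 4"), ("Stoptime 4", "Stop B"),
]
graph = {}
for left, right in edges:
    graph.setdefault(left, []).append(right)
    graph.setdefault(right, []).append(left)

paths = all_shortest_paths(graph, "Stop A", "Stop B")
for path in paths:
    print(" - ".join(path))
```

Both trips give a path of the same length, so both come back, much like the query above returns every equally short route between the two stops rather than just one.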
Created by 54chi - Twitter | http://54chi.com