Identify Seattle bus stops that are within 1 hour of the Food Bank on public transportation.
- From where in the city is the Food Bank accessible within 1 hour?
- Where in the city is accessible from the Food Bank within 1 hour?
GTFS (General Transit Feed Specification): https://developers.google.com/transit/gtfs/
Mostly interested in:
- Getting usable data as quickly as possible
- Tooling flexible enough to operate on GTFS data with reasonable constraints (i.e., a discrete period of time)
- End to end, being able to process and query a day of data in ~1 hour
Tools:
- Python - easy for scripting and processing tabular data
- Neo4j - because I thought this would be an interesting application for graph databases
- QGIS - for visualization
Route Two:                  (T)---->(D)--------->(A)
                           /
Route One: (D)--------->(A)-->(D)--------->(A)
Three types of events (nodes) in our system:
- Departures - a route instance leaves a stop (1:1)
- Arrivals - a route instance arrives at a stop (1:1)
- Transfers - leave one route to go to another that connects at the same stop
Nodes have:
- Type
- Time
- Location
- Route
- Stop Name
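A minimal Python sketch of an event node with the fields listed above. The ID scheme `<trip_id>-<epoch seconds>-<type>` is an assumption inferred from the sample IDs further down (e.g. `32884061-1491443772-arrival`); the class and field names are hypothetical, not the original scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One node in the transit graph (hypothetical names)."""
    type: str        # "arrival", "departure" or "transfer"
    time: int        # Unix epoch seconds
    lat: float       # location
    lon: float
    route: str       # e.g. "B Line"
    stop_name: str
    trip_id: str     # GTFS trip that produced the event (assumption)

    @property
    def id(self) -> str:
        # Assumed ID scheme, inferred from "32884061-1491443772-arrival"
        return f"{self.trip_id}-{self.time}-{self.type}"


e = Event("arrival", 1491443772, 47.655735, -122.143089,
          "B Line", "148th Ave NE & NE 51st St", "32884061")
print(e.id)  # 32884061-1491443772-arrival
```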
Initially, two types of relationships:
- Within a route: (D)--->(A)------>(D)
- Between routes: (A)-->(T)---->(D)
                     \
                      -->(D)
Notice that every relationship has a direction.
Relationships also have a magnitude: the number of seconds between the two events they connect.
Relationships have:
- Node 0
- Node 1
- Magnitude
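A sketch of how the within-route relationships and their magnitudes could be derived for a single trip. It assumes the trip's stops are already sorted by GTFS `stop_sequence` and that each stop has arrival and departure times in epoch seconds; the function name and the `(node0, node1, seconds)` tuple layout are hypothetical. Each stop contributes an (A)-->(D) dwell edge, and each pair of consecutive stops a (D)-->(A) travel edge.

```python
def route_relationships(trip_id, stop_times):
    """Return (node0, node1, seconds) triples for one trip.

    stop_times: list of (arrival_epoch, departure_epoch) in stop order.
    IDs follow the assumed "<trip_id>-<epoch>-<type>" scheme.
    """
    rels = []
    prev_departure = None
    for arrival, departure in stop_times:
        a_id = f"{trip_id}-{arrival}-arrival"
        d_id = f"{trip_id}-{departure}-departure"
        if prev_departure is not None:
            # (D)-->(A): travel time from the previous stop to this one
            prev_id = f"{trip_id}-{prev_departure}-departure"
            rels.append((prev_id, a_id, arrival - prev_departure))
        # (A)-->(D): dwell time at this stop (often 0 in GTFS feeds)
        rels.append((a_id, d_id, departure - arrival))
        prev_departure = departure
    return rels
```

With a zero-second dwell, `route_relationships("32884061", [(1491443772, 1491443772)])` yields the same triple as the sample route relationship shown later.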
Convert GTFS data from the raw feeds into something that matches our data model.
We end up with our nodes and our relationships.
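One detail of this conversion: GTFS `stop_times.txt` stores `arrival_time` and `departure_time` as `HH:MM:SS` relative to the service day, and hours may exceed 23 for trips that run past midnight. The spec measures these times from "noon minus 12h" of the service day, which equals midnight except on days when daylight saving time changes. A minimal sketch, assuming a fixed UTC-7 offset (correct for the April and May 2017 dates used here, but not across a DST change); the function name is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

PDT = timezone(timedelta(hours=-7))  # assumption: fixed offset, no DST handling


def gtfs_time_to_epoch(service_date: date, hhmmss: str, tz=PDT) -> int:
    """Convert a GTFS HH:MM:SS value (hours may exceed 23) to epoch seconds."""
    h, m, s = (int(part) for part in hhmmss.strip().split(":"))
    # Start at midnight of the service day and add the offset, so
    # "25:10:00" lands at 01:10 on the following calendar day.
    start = datetime.combine(service_date, time(0, 0), tzinfo=tz)
    return int((start + timedelta(hours=h, minutes=m, seconds=s)).timestamp())
```

For example, `gtfs_time_to_epoch(date(2017, 4, 5), "18:56:12")` gives the epoch value that appears in the sample arrival below.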
For convenience, we have:
- arrivals.txt
- departures.txt
- route_relationships.txt
- cross_route_relationships.txt
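The between-route (A)-->(T)-->(D) hops are presumably what cross_route_relationships.txt holds. One way they could be generated is sketched below under loud assumptions that are not from the original scripts: events are plain dicts, a transfer is allowed when a departure on a different route leaves the same stop within a hypothetical `max_wait` window (10 minutes here), and the transfer node reuses the arrival's trip and time in its ID.

```python
def transfer_relationships(arrivals, departures, max_wait=600):
    """Yield (node0, node1, seconds) triples for (A)-->(T)-->(D) hops.

    arrivals/departures: lists of dicts with "id", "time", "stop", "route".
    max_wait: hypothetical cap, in seconds, on how long a rider will wait.
    """
    # Index departures by stop so each arrival only scans its own stop.
    by_stop = {}
    for d in departures:
        by_stop.setdefault(d["stop"], []).append(d)

    for a in arrivals:
        # Hypothetical transfer ID: "<trip>-<epoch>-transfer"
        transfer_id = a["id"].rsplit("-", 1)[0] + "-transfer"
        linked = False
        for d in by_stop.get(a["stop"], []):
            wait = d["time"] - a["time"]
            if d["route"] != a["route"] and 0 <= wait <= max_wait:
                if not linked:
                    # Arrival -> transfer node costs no time.
                    yield (a["id"], transfer_id, 0)
                    linked = True
                # Transfer node -> departure costs the wait.
                yield (transfer_id, d["id"], wait)
```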
Sample Arrival:
32884061-1491443772-arrival 2017-04-05 18:56:12-07:00 1491443772 47.655735 -122.143089 B Line 148th Ave NE & NE 51st St
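Because both the route and the stop name contain spaces, the columns are presumably separated by tabs. A small parser sketch under that assumption; the column order is read off the sample, and the field names are hypothetical.

```python
from typing import NamedTuple


class ArrivalRow(NamedTuple):
    id: str
    local_time: str
    epoch: int
    lat: float
    lon: float
    route: str
    stop_name: str


def parse_arrival(line: str) -> ArrivalRow:
    """Parse one arrivals.txt line, assuming tab-separated columns."""
    f = line.rstrip("\n").split("\t")
    return ArrivalRow(f[0], f[1], int(f[2]), float(f[3]), float(f[4]), f[5], f[6])
```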
Sample Route Relationship:
32884061-1491443772-arrival 32884061-1491443772-departure 0
Used the Neo4j Python library to import each data set into my local database with Cypher, Neo4j's graph query language.
Why do all this work? The whole point of loading the data into the database is to be able to query it dynamically.
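The import scripts aren't shown here. As a hedged sketch of one common pattern with the official `neo4j` Python driver, rows can be sent in batches through a single parameterized `UNWIND` statement instead of one query per row. Only the batching and the query text run below, so no database is needed; the `Event` label and `To` relationship type are assumptions, while the `id` and `seconds` property names match the queries later on.

```python
from itertools import islice

# Hypothetical Cypher: the Event label and To type are assumptions.
CREATE_RELS = """
UNWIND $rows AS row
MATCH (a:Event {id: row.node0}), (b:Event {id: row.node1})
CREATE (a)-[:To {seconds: row.seconds}]->(b)
"""


def rel_rows(triples):
    """Turn (node0, node1, seconds) triples into query parameter dicts."""
    return ({"node0": a, "node1": b, "seconds": s} for a, b, s in triples)


def batches(rows, size=5000):
    """Yield lists of at most `size` rows, each passed as $rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# With a running database (not executed here):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# with driver.session() as session:
#     for chunk in batches(rel_rows(triples)):
#         session.run(CREATE_RELS, rows=chunk)
```

An index on the node `id` property keeps the `MATCH` in each batch from scanning every node.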
Remember our initial question?
At a particular time, on a specific date, from where in the city can we reach the Food Bank within x amount of time?
Wrote another set of scripts for this. The end result is a flat file listing each origin, destination, and the travel time between them.
You've gotta make the data structure work for you. Originally, I only created relationships in the "to" category. The database was very good at traversing the graph to get the next stop in the network, but very slow when asked for stops upstream. Because the relationships point in only one direction, it's really hard to find the stops that came before a given stop. To get around this, I added a "from" relationship class that mirrors the "to" relationships but reverses their direction.
Fold those into our queries and we have a system that's just as good at traversing the network in reverse.
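The mirroring step, sketched in Python: every (node0, node1, seconds) triple gets a reversed twin with the same magnitude, to be loaded as the `From` type that the queries below traverse. Tagging each edge with its type as the first field is a hypothetical layout.

```python
def with_from_edges(to_edges):
    """Given "to" triples (node0, node1, seconds), return typed edges for
    both directions: ("To", a, b, s) and its mirror ("From", b, a, s)."""
    edges = []
    for a, b, s in to_edges:
        edges.append(("To", a, b, s))
        edges.append(("From", b, a, s))  # same magnitude, reversed direction
    return edges
```

The same mirroring could also be done inside the database with a statement such as `MATCH (a)-[r:To]->(b) CREATE (b)-[:From {seconds: r.seconds}]->(a)`, assuming the forward type is named `To`.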
For a given stop and time window, we want to know "from where" it can be reached.
First, we find arrival events at a particular stop. Let's find all arrivals at Martin L King Jr Way S & S Webster St between Saturday, May 6, 2017 9:30:00 AM GMT-07:00 DST and Saturday, May 6, 2017 2:00:00 PM GMT-07:00 DST (Unix timestamps 1494088200 through 1494104400):
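The query below compares `timeStamp` against Unix epoch seconds. The bounds can be derived from the local times with Python's standard library, using the GMT-07:00 offset stated above.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7))  # GMT-07:00, as stated above

start = datetime(2017, 5, 6, 9, 30, tzinfo=PDT)
end = datetime(2017, 5, 6, 14, 0, tzinfo=PDT)

# Epoch seconds as strings, matching how the query compares timeStamp
window = (str(int(start.timestamp())), str(int(end.timestamp())))
print(window)  # ('1494088200', '1494104400')
```

Comparing timestamps as strings works here only because every timestamp in this period has ten digits; storing `timeStamp` as an integer would avoid relying on that.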
match (n)
where n.stopName = '"Martin L King Jr Way S & S Webster St"' AND
n.timeStamp >= "1494088200" AND
n.timeStamp <= "1494104400" AND
n.type = "arrival"
return n;
Let's isolate a single event:
match (n)
where n.id = "33363087-1494092722-arrival"
return n;
Now let's find all the paths that reach a given arrival within 15 minutes.
MATCH p = ({id:"33363087-1494092722-arrival"})-[:From *1..50]->({type:"departure"})
WITH p,reduce(s = 0, r IN relationships(p) | s + r.seconds) AS dist
where dist < 900
RETURN p;
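To sanity-check results outside the database, the same question can be asked of an in-memory graph. This Python sketch swaps in Dijkstra's algorithm over the `From` edges with a 900-second budget: it keeps only the shortest travel time to each reachable event, whereas the Cypher query enumerates every qualifying path of up to 50 hops. The adjacency-dict input and the `-departure` ID suffix check are assumptions.

```python
import heapq


def reachable_within(from_edges, target, budget=900):
    """Shortest time from `target` to every event reachable within `budget`.

    from_edges: dict mapping node id -> list of (neighbor id, seconds) for
    the reversed ("From") edges, so walking forward from the target
    moves upstream through the network.
    """
    best = {target: 0}
    heap = [(0, target)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, secs in from_edges.get(node, []):
            nd = dist + secs
            if nd < budget and nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return best


def departures_within(from_edges, target, budget=900):
    """Keep only departure events (assumes IDs end in "-departure")."""
    found = reachable_within(from_edges, target, budget)
    return {n: d for n, d in found.items() if n.endswith("-departure")}
```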
To make the export easier, we can return just the origin of each path (its last node, a departure) along with the travel time:
MATCH p = ({id:"33363087-1494092722-arrival"})-[:From *1..50]->({type:"departure"})
WITH p,reduce(s = 0, r IN relationships(p) | s + r.seconds) AS dist
where dist < 900
RETURN nodes(p)[-1] AS n, dist ORDER BY dist DESC;
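A sketch of writing the flat file described earlier (origin, destination, travel time), assuming the query results have already been collected as (origin ID, seconds) pairs for one destination arrival. The column names and origin IDs in the example are made up.

```python
import csv
import io


def write_travel_times(out, destination, results):
    """Write origin,destination,travel_seconds rows in the given order.

    results: iterable of (origin_id, seconds), e.g. rows from the query above.
    """
    writer = csv.writer(out)
    writer.writerow(["origin", "destination", "travel_seconds"])
    for origin, seconds in results:
        writer.writerow([origin, destination, seconds])


buf = io.StringIO()
write_travel_times(buf, "33363087-1494092722-arrival",
                   [("a-departure", 840), ("b-departure", 120)])  # made-up origins
```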