@s2t2
Last active December 28, 2018 19:41
Question about Shore Line East GTFS data, posted 11/23/2016
#
# I would expect entries in the stop_times.txt file to be unique when grouped by `trip_id` and `stop_id`.
#
# Indeed, of the current file's 520 entries, 510 are unique, but 10 are not (5 groups of 2).
#
# The following data structure contains entries from stop_times.txt grouped by `trip_id` and `stop_id`.
#
# My question is: are these entries duplicates, or do they have some significance?
#
# Thank you!
#
{
"1610-NHV"=> [
{"trip_id"=>"1610",
"arrival_time"=>"9:12:00",
"departure_time"=>"9:12:00",
"stop_id"=>"NHV",
"stop_sequence"=>"28"},
{"trip_id"=>"1610",
"arrival_time"=>"9:16:00",
"departure_time"=>"9:16:00",
"stop_id"=>"NHV",
"stop_sequence"=>"29"}],
"1633-NHV"=> [
{"trip_id"=>"1633",
"arrival_time"=>"7:00:00",
"departure_time"=>"7:00:00",
"stop_id"=>"NHV",
"stop_sequence"=>"92"},
{"trip_id"=>"1633",
"arrival_time"=>"7:08:00",
"departure_time"=>"7:08:00",
"stop_id"=>"NHV",
"stop_sequence"=>"93"}],
"1637-NHV"=> [
{"trip_id"=>"1637",
"arrival_time"=>"7:36:00",
"departure_time"=>"7:36:00",
"stop_id"=>"NHV",
"stop_sequence"=>"113"},
{"trip_id"=>"1637",
"arrival_time"=>"7:40:00",
"departure_time"=>"7:40:00",
"stop_id"=>"NHV",
"stop_sequence"=>"114"}],
"1640-NHV"=> [
{"trip_id"=>"1640",
"arrival_time"=>"17:44:00",
"departure_time"=>"17:44:00",
"stop_id"=>"NHV",
"stop_sequence"=>"131"},
{"trip_id"=>"1640",
"arrival_time"=>"17:48:00",
"departure_time"=>"17:48:00",
"stop_id"=>"NHV",
"stop_sequence"=>"132"}],
"1644-NHV"=> [
{"trip_id"=>"1644",
"arrival_time"=>"18:05:00",
"departure_time"=>"18:05:00",
"stop_id"=>"NHV",
"stop_sequence"=>"164"},
{"trip_id"=>"1644",
"arrival_time"=>"18:08:00",
"departure_time"=>"18:08:00",
"stop_id"=>"NHV",
"stop_sequence"=>"165"}]}
#
# based on: Shore Line East GTFS data
# source: http://www.my-site.com/gtfs-feed.zip
# modified at: Wed, 23 Nov 2016 09:58:36 EST -05:00
# etag: 1ac9-542737db15840
#
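
The grouping described above can be sketched roughly as follows. This is a minimal sketch, not the script that produced the structure shown: the helper name `duplicate_stop_time_groups` is hypothetical, and the inline sample rows stand in for the full stop_times.txt (in practice the rows might come from something like `CSV.read("stop_times.txt", headers: true).map(&:to_h)`, where the path is an assumption).

```ruby
require "csv"

# Hypothetical helper: groups stop_times rows by trip_id and stop_id
# (keyed like "1640-NHV") and keeps only the groups with more than one row.
def duplicate_stop_time_groups(rows)
  rows
    .group_by { |row| "#{row["trip_id"]}-#{row["stop_id"]}" }
    .select { |_key, group| group.size > 1 }
end

# A few rows for trip 1640, in the column layout of stop_times.txt.
sample_rows = CSV.parse(<<~ROWS, headers: true).map(&:to_h)
  trip_id,arrival_time,departure_time,stop_id,stop_sequence
  1640,17:34:00,17:34:00,WH,130
  1640,17:44:00,17:44:00,NHV,131
  1640,17:48:00,17:48:00,NHV,132
  1640,17:50:00,17:50:00,ST,133
ROWS

duplicate_groups = duplicate_stop_time_groups(sample_rows)
puts duplicate_groups.keys.inspect
```

Comparing the number of rows with the number of distinct keys from `group_by` is one way to arrive at counts like the 520 and 510 mentioned above.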

s2t2 commented Jan 4, 2017

Here are some of these stop_times in the context of their entire trip:

trip_id arrival_time departure_time stop_id stop_sequence
1640 16:47:00 16:47:00 STM 126
1640 17:15:00 17:15:00 BRP 127
1640 17:21:00 17:21:00 STR 128
1640 17:27:00 17:27:00 MIL 129
1640 17:34:00 17:34:00 WH 130
1640 17:44:00 17:44:00 NHV 131
1640 17:48:00 17:48:00 NHV 132
1640 17:50:00 17:50:00 ST 133
1640 18:04:00 18:04:00 BRN 134
1640 18:12:00 18:12:00 GUIL 135
1640 18:19:00 18:19:00 MAD 136
1640 18:25:00 18:25:00 CLIN 137
1640 18:31:00 18:31:00 WES 138
1640 18:40:00 18:40:00 OSB 139
1640 19:06:00 19:06:00 NLC 140

I would expect this data to look like the following, where the two NHV entries are consolidated into a single stop_time whose arrival_time is the first entry's time and whose departure_time is the second's:

trip_id arrival_time departure_time stop_id stop_sequence
1640 16:47:00 16:47:00 STM 126
1640 17:15:00 17:15:00 BRP 127
1640 17:21:00 17:21:00 STR 128
1640 17:27:00 17:27:00 MIL 129
1640 17:34:00 17:34:00 WH 130
1640 17:44:00 17:48:00 NHV 131
1640 17:50:00 17:50:00 ST 132
1640 18:04:00 18:04:00 BRN 133
1640 18:12:00 18:12:00 GUIL 134
1640 18:19:00 18:19:00 MAD 135
1640 18:25:00 18:25:00 CLIN 136
1640 18:31:00 18:31:00 WES 137
1640 18:40:00 18:40:00 OSB 138
1640 19:06:00 19:06:00 NLC 139
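
The consolidation I have in mind could be sketched like this. It is only an illustration under some assumptions: the helper name `consolidate_dwells` is hypothetical, the input is taken to be the rows of a single trip, and consecutive rows at the same stop are assumed to represent one visit (earliest arrival, latest departure).

```ruby
# Hypothetical helper: merges consecutive stop_times at the same stop into one
# row (first arrival_time, last departure_time), then renumbers stop_sequence
# starting from the trip's original first sequence value.
def consolidate_dwells(stop_times)
  return [] if stop_times.empty?

  sorted = stop_times.sort_by { |st| st["stop_sequence"].to_i }
  first_sequence = sorted.first["stop_sequence"].to_i

  # chunk_while collects runs of adjacent rows that share a stop_id.
  merged = sorted.chunk_while { |a, b| a["stop_id"] == b["stop_id"] }.map do |run|
    run.first.merge("departure_time" => run.last["departure_time"])
  end

  merged.each_with_index.map do |st, index|
    st.merge("stop_sequence" => (first_sequence + index).to_s)
  end
end

fields = %w[trip_id arrival_time departure_time stop_id stop_sequence]
trip_1640 = [
  %w[1640 17:34:00 17:34:00 WH 130],
  %w[1640 17:44:00 17:44:00 NHV 131],
  %w[1640 17:48:00 17:48:00 NHV 132],
  %w[1640 17:50:00 17:50:00 ST 133]
].map { |values| fields.zip(values).to_h }

consolidated = consolidate_dwells(trip_1640)
consolidated.each { |st| puts st.values.join(" ") }
```

Because only adjacent rows are merged, a trip that returns to an earlier stop after visiting another one would be left untouched by this sketch.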


s2t2 commented Dec 28, 2018

Seeing this issue still.

trip_id arrival_time departure_time stop_id stop_sequence
1668b3 20:10:00 20:10:00 ST 269
1668b3 20:15:00 20:15:00 NHV 270
1668b3 20:20:00 20:20:00 ST 271
1668b3 21:00:00 21:00:00 WES 272
1668b3 21:05:00 21:05:00 OSB 273

Expecting non-duplication of "ST" in this sequence.
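
Unlike the NHV rows from 2016, the two ST rows here are not adjacent (NHV sits between them), so they can't be explained as one arrival row plus one departure row. A rough way to flag stops that a trip visits more than once is sketched below; the helper name `repeated_stops` is hypothetical and the input is assumed to be the rows of a single trip.

```ruby
# Hypothetical helper: maps each stop_id that appears more than once in a trip
# to the stop_sequence values at which it appears.
def repeated_stops(stop_times)
  stop_times
    .group_by { |st| st["stop_id"] }
    .select { |_stop_id, visits| visits.size > 1 }
    .transform_values { |visits| visits.map { |st| st["stop_sequence"] } }
end

fields = %w[trip_id arrival_time departure_time stop_id stop_sequence]
trip_1668b3 = [
  %w[1668b3 20:10:00 20:10:00 ST 269],
  %w[1668b3 20:15:00 20:15:00 NHV 270],
  %w[1668b3 20:20:00 20:20:00 ST 271],
  %w[1668b3 21:00:00 21:00:00 WES 272],
  %w[1668b3 21:05:00 21:05:00 OSB 273]
].map { |values| fields.zip(values).to_h }

repeats = repeated_stops(trip_1668b3)
puts repeats.inspect
```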
