
@altilunium
Created September 13, 2023 10:37
1
00:00:00,000 --> 00:00:03,740
Welcome to another episode of the Mapscaping Podcast.
2
00:00:03,740 --> 00:00:07,960
My name is Daniel and this is a podcast for the Geospatial Community.
3
00:00:07,960 --> 00:00:10,480
My guest on the show today is Jennings Anderson.
4
00:00:10,480 --> 00:00:14,960
So Jennings has been on the podcast before, but today we're talking about something called
5
00:00:14,960 --> 00:00:16,800
the Overture Maps Foundation.
6
00:00:16,800 --> 00:00:22,120
And during this episode you will discover that this is a downstream distribution of
7
00:00:22,120 --> 00:00:23,120
OpenStreetMap.
8
00:00:23,120 --> 00:00:25,720
So if you're a little bit confused, that's okay.
9
00:00:25,720 --> 00:00:28,200
We're going to walk you through this along the way.
10
00:00:28,200 --> 00:00:30,800
We'll also have you understand what the Daylight distribution is.
11
00:00:30,800 --> 00:00:35,200
A lot to look forward to in this episode, stay tuned and I'll see you again.
12
00:00:35,200 --> 00:00:38,640
Hi Jennings, welcome back to the podcast.
13
00:00:38,640 --> 00:00:42,040
You've been here before; I will link that episode up in the show notes of this episode so
14
00:00:42,040 --> 00:00:43,440
people can check that out.
15
00:00:43,440 --> 00:00:46,360
It was about OpenStreetMap, just in case people are interested.
16
00:00:46,360 --> 00:00:50,680
But today on the podcast we're going to be talking about the Overture Maps Foundation,
17
00:00:50,680 --> 00:00:54,760
which is a downstream product of OpenStreetMap.
18
00:00:54,760 --> 00:00:56,160
Maybe we could start with an introduction.
19
00:00:56,160 --> 00:01:00,720
Would you mind just introducing yourself to the audience please, perhaps give us an understanding
20
00:01:00,720 --> 00:01:04,720
of your title, your responsibilities, where you work, that kind of thing, and then we'll
21
00:01:04,720 --> 00:01:06,640
head off and talk about Overture.
22
00:01:06,640 --> 00:01:07,640
Sure, sure.
23
00:01:07,640 --> 00:01:08,640
Thank you.
24
00:01:08,640 --> 00:01:09,880
My name is Jennings Anderson.
25
00:01:09,880 --> 00:01:13,080
I'm currently a research scientist at Meta.
26
00:01:13,080 --> 00:01:18,160
I've been a researcher in the OpenMap data world for now about a decade, which feels
27
00:01:18,160 --> 00:01:19,160
wild to say.
28
00:01:19,160 --> 00:01:25,039
I started researching OSM, OpenStreetMap, just after Typhoon Yolanda in the Philippines
29
00:01:25,040 --> 00:01:26,680
in 2013.
30
00:01:26,680 --> 00:01:31,040
At that point we were looking to show how the creation of the tasking manager from the
31
00:01:31,040 --> 00:01:35,040
Humanitarian OpenStreetMap Team changed the interaction patterns between all these
32
00:01:35,040 --> 00:01:40,760
mappers coming together to produce open geospatial data in the aftermath of the disaster.
33
00:01:40,760 --> 00:01:47,040
So, fast forward a number of years, I eventually finished a PhD on this topic of studying
34
00:01:47,040 --> 00:01:50,800
OpenStreetMap and how people collaborate in OpenStreetMap.
35
00:01:50,800 --> 00:01:56,679
And I have gone on to continue to collaborate with researchers in the open data space, which
36
00:01:56,679 --> 00:02:04,440
has brought me to Meta, where I'm continuing to work on OpenStreetMap data analysis and
37
00:02:04,440 --> 00:02:10,600
how we consume OpenStreetMap and clean the data, make the Daylight Map Distribution, and
38
00:02:10,600 --> 00:02:16,800
eventually use OpenStreetMap as a major source of geospatial data in all the maps across
39
00:02:16,800 --> 00:02:17,800
Meta.
40
00:02:17,800 --> 00:02:21,760
Wow, we can cover a lot of ground here, and please do mention the Daylight distribution;
41
00:02:21,760 --> 00:02:23,240
I really want to touch on that later on.
42
00:02:23,240 --> 00:02:28,280
I think that's going to be an important piece in understanding the story of the Overture
43
00:02:28,280 --> 00:02:29,280
Maps Foundation.
44
00:02:29,280 --> 00:02:35,960
But first, a PhD in OpenStreetMap? What do people say when you say, that's what I'm doing,
45
00:02:35,960 --> 00:02:38,200
my PhD is focused on OpenStreetMap?
46
00:02:38,200 --> 00:02:43,400
It's always fun, because you first have to explain OpenStreetMap to folks, and then
47
00:02:43,400 --> 00:02:48,520
the first response often is, isn't the world mapped? And so you get to get into that
48
00:02:48,520 --> 00:02:54,320
question and describe the importance of a project like OSM, and then talk about all
49
00:02:54,320 --> 00:02:57,320
the fun dynamics of the project and how it's grown.
50
00:02:57,320 --> 00:03:00,480
I mean, the last 10 years have been pretty incredible.
51
00:03:00,480 --> 00:03:04,280
When I started looking at it, it was half a million,
52
00:03:04,280 --> 00:03:09,960
I think, registered users or something, and now the registered user count is many millions.
53
00:03:09,960 --> 00:03:14,560
I believe two million contributors have actually edited the map.
54
00:03:14,560 --> 00:03:19,520
Watching these numbers grow in the past decade has been pretty incredible.
55
00:03:19,520 --> 00:03:25,960
I was very fortunate to be able to study this and eventually finish grad school looking
56
00:03:25,960 --> 00:03:28,880
at the evolution of OSM.
57
00:03:28,880 --> 00:03:30,800
So happy to be here today.
58
00:03:30,800 --> 00:03:34,520
Right at the start, we talked, obviously, about OpenStreetMap.
59
00:03:34,520 --> 00:03:39,000
I really hope that we don't need to dive into the details of what that is for the people
60
00:03:39,000 --> 00:03:42,680
listening to this particular podcast, but you mentioned something else as well.
61
00:03:42,680 --> 00:03:44,920
You mentioned the daylight distribution.
62
00:03:44,920 --> 00:03:48,160
And as I said, I think this is going to be an important piece in understanding Overture.
63
00:03:48,160 --> 00:03:50,200
So maybe we should start with daylight.
64
00:03:50,200 --> 00:03:56,120
Again, my understanding is that this is a downstream product of OpenStreetMap, but perhaps
65
00:03:56,120 --> 00:03:58,880
you could put some more words around that for us, please.
66
00:03:58,880 --> 00:03:59,880
Yes.
67
00:03:59,880 --> 00:04:07,680
So the Daylight Map Distribution is an open data product that Facebook started producing
68
00:04:07,680 --> 00:04:09,360
in 2020.
69
00:04:09,360 --> 00:04:14,240
And so the gist of the daylight map distribution is exactly that.
70
00:04:14,240 --> 00:04:20,160
It's this downstream distribution, and that terminology was chosen very specifically to
71
00:04:20,160 --> 00:04:22,800
have you think about Linux distributions.
72
00:04:22,800 --> 00:04:29,080
In that it's not a copy, it's not a fork; it's a distribution of the OpenStreetMap database
73
00:04:29,080 --> 00:04:32,760
that has undergone a series of quality controls.
74
00:04:32,760 --> 00:04:39,760
And so kind of at a high level, what happens is each month, Meta takes a snapshot of the
75
00:04:39,760 --> 00:04:46,800
OpenStreetMap planet and then runs it through a number of checks for coastline integrity,
76
00:04:46,800 --> 00:04:52,920
broken relations, vandalism, any of these potential issues on the map, and then spends
77
00:04:52,920 --> 00:04:58,159
the next four weeks addressing each of these issues and fixing them.
78
00:04:58,160 --> 00:05:04,320
And importantly, all of those fixes are made upstream, those fixes are made not on the
79
00:05:04,320 --> 00:05:07,600
Daylight distribution, but rather in OpenStreetMap.
80
00:05:07,600 --> 00:05:11,760
And many times when you go to identify the error and go look it up, the community has
81
00:05:11,760 --> 00:05:14,720
actually already fixed those errors.
82
00:05:14,720 --> 00:05:18,960
So it's either the data team at Meta that's looking at those errors, or the community
83
00:05:18,960 --> 00:05:21,920
has already fixed them just because somebody else found them.
84
00:05:21,920 --> 00:05:29,320
And then those fixes are reingested, and so at the end of a month, you end up with most
85
00:05:29,320 --> 00:05:35,080
of the data being about one month old, but anything that was identified as needing to
86
00:05:35,080 --> 00:05:41,160
be addressed has been updated to kind of the latest fixed or clean version.
87
00:05:41,160 --> 00:05:47,720
And so that is then released at the end of that month period as the current daylight map
88
00:05:47,720 --> 00:05:49,000
distribution.
89
00:05:49,000 --> 00:05:54,000
And so right now, the latest version that just came out was version 1.29.
90
00:05:54,000 --> 00:06:00,000
A couple other changes are kind of made in the data to make it a little bit more user friendly.
91
00:06:00,000 --> 00:06:05,080
We normalized all the heights, for example; that's something that is important in OpenStreetMap.
92
00:06:05,080 --> 00:06:10,000
You can have many different accepted values for the height tag for say a building.
93
00:06:10,000 --> 00:06:12,760
You can have it in inches and feet and meters.
94
00:06:12,760 --> 00:06:17,760
And so we'll go ahead and just normalize all that and make that clean into just meters, for
95
00:06:17,760 --> 00:06:18,760
example.
96
00:06:18,760 --> 00:06:22,880
So a lot of these just little fixes and then at the end of the day, you have this kind
97
00:06:22,880 --> 00:06:30,680
of view of OSM that is, you know, kind of an enterprise-ready version of OSM, where
98
00:06:30,680 --> 00:06:36,200
you have this quality guarantee that it has been looked over by a data quality assurance
99
00:06:36,200 --> 00:06:37,200
team.
100
00:06:37,200 --> 00:06:43,000
And that is what, you know, we end up serving in our maps at Meta, as well as a number of other
101
00:06:43,000 --> 00:06:44,719
companies will ingest.
102
00:06:44,720 --> 00:06:49,680
And importantly, you know, the Daylight Map Distribution is open and free; anybody
103
00:06:49,680 --> 00:06:56,600
can go download that and use that as a distribution of OpenStreetMap data
104
00:06:56,600 --> 00:06:57,600
into their products.
105
00:06:57,600 --> 00:06:59,800
Wow, that was a fantastic overview.
106
00:06:59,800 --> 00:07:00,800
Thank you very much for that.
107
00:07:00,800 --> 00:07:01,800
Really appreciate it.
108
00:07:01,800 --> 00:07:06,320
So let's jump further now and talk about the Overture Maps Foundation.
109
00:07:06,320 --> 00:07:12,520
What it is, and then my hope is that we can compare pros and cons between the Daylight distribution,
110
00:07:12,520 --> 00:07:18,340
OSM itself, and use that as a way of helping people understand what this is: what is the
111
00:07:18,340 --> 00:07:24,240
Overture Maps Foundation, who is it for, how is it different from these other huge data
112
00:07:24,240 --> 00:07:25,240
sets?
113
00:07:25,240 --> 00:07:26,240
Yes.
114
00:07:26,240 --> 00:07:27,240
Okay.
115
00:07:27,240 --> 00:07:31,320
So the obvious leading question here is, if you had to explain the Overture Maps Foundation
116
00:07:31,320 --> 00:07:33,000
to me, how would you do that?
117
00:07:33,000 --> 00:07:39,919
So the Overture Maps Foundation is an open data project within the Linux Foundation.
118
00:07:39,920 --> 00:07:46,280
Overture aims to create easy-to-use and interoperable open map data for developers
119
00:07:46,280 --> 00:07:51,120
who build map services or use geospatial data generally.
120
00:07:51,120 --> 00:07:56,840
So this means bringing together all of the great open geospatial data sources out there
121
00:07:56,840 --> 00:07:59,480
of which OpenStreetMap is one.
122
00:07:59,480 --> 00:08:03,080
And I think the key word here is the interoperability of this data.
123
00:08:03,080 --> 00:08:07,920
So finding ways to bring together multiple open data sources such as OpenStreetMap, and
124
00:08:07,920 --> 00:08:13,480
then let's take another data source: one of the steering members of Overture is Microsoft.
125
00:08:13,480 --> 00:08:16,080
They have the Microsoft Building Footprints.
126
00:08:16,080 --> 00:08:20,400
That's the building footprints extracted from imagery all over the world.
127
00:08:20,400 --> 00:08:25,240
I think the data set in total has about 1.4 billion building footprints.
128
00:08:25,240 --> 00:08:30,240
That data can then be combined with OpenStreetMap building footprints.
129
00:08:30,240 --> 00:08:35,720
And now you end up with potentially the most complete open data distribution of building
130
00:08:35,720 --> 00:08:36,720
footprints.
131
00:08:36,720 --> 00:08:41,520
And so that's one example of two data sets coming together within Overture Maps.
132
00:08:41,520 --> 00:08:47,080
This is a fantastic example, because aren't those same building footprints being slowly
133
00:08:47,080 --> 00:08:49,400
but surely ingested into OpenStreetMap?
134
00:08:49,400 --> 00:08:56,120
So my question is, what problem is Overture solving here, compared to OpenStreetMap itself
135
00:08:56,120 --> 00:08:57,720
and the daylight distribution?
136
00:08:57,720 --> 00:09:01,240
How is it different from what's happening over in those other two places?
137
00:09:01,240 --> 00:09:04,480
Yes, that's a great question and fantastic segue.
138
00:09:04,480 --> 00:09:11,120
So those building footprints are also being slowly but surely ingested into OpenStreetMap.
139
00:09:11,120 --> 00:09:17,920
One way that they can be ingested is via the Rapid editor, which shows you the
140
00:09:17,920 --> 00:09:22,160
outline of the footprint and allows users to click accept.
141
00:09:22,160 --> 00:09:27,440
And that imports that particular footprint into the openStreetMap database.
142
00:09:27,440 --> 00:09:33,840
Now, the important piece there is this footprint was the product of a machine learning
143
00:09:33,840 --> 00:09:40,800
model, and by a user clicking on it and saying accept, it's now been human-validated.
144
00:09:40,800 --> 00:09:45,600
And now it's been added to OpenStreetMap, because the data in the OpenStreetMap database
145
00:09:45,600 --> 00:09:52,800
has a specific level of quality which is it has been created or curated by a mapper,
146
00:09:52,800 --> 00:09:53,800
right?
147
00:09:53,800 --> 00:09:59,600
OpenStreetMap is a community project, and so by virtue of accepting that building, that data
148
00:09:59,600 --> 00:10:04,640
has been validated, and now that particular building is part of the OpenStreetMap data set
149
00:10:04,640 --> 00:10:10,000
and also still exists in this Microsoft building dataset, but we can use that signal to say
150
00:10:10,000 --> 00:10:16,080
actually, this is now an OSM building, and we can then take it as it has
151
00:10:16,080 --> 00:10:19,440
been seen by a person and has been kind of validated.
152
00:10:19,440 --> 00:10:26,400
So an important kind of intermediate step there is that alongside the daylight map distribution,
153
00:10:26,400 --> 00:10:33,199
one thing that Microsoft does as a Daylight partner is they take their building dataset
154
00:10:33,199 --> 00:10:39,280
and they compare it to every version of daylight and they'll go ahead and subtract out any
155
00:10:39,280 --> 00:10:45,920
building that already exists or overlaps a building that's already in OSM, and they release that
156
00:10:45,920 --> 00:10:51,439
as what they call the sidecar file. So if you look at what daylight has historically done you
157
00:10:51,439 --> 00:10:56,000
have, for each version of Daylight, the OpenStreetMap Daylight distribution,
158
00:10:56,000 --> 00:11:02,880
and then you'll have the sidecar Microsoft buildings file, which is released as an OSM file, and when
159
00:11:02,880 --> 00:11:09,040
you apply those two together you get all the buildings together you get this union of buildings
160
00:11:09,040 --> 00:11:15,280
where you're not going to have any Microsoft ML building kind of overlapping or overwriting an
161
00:11:15,280 --> 00:11:20,560
OSM building because we're making that decision that since it's in OSM it's been accepted and has
162
00:11:20,560 --> 00:11:26,719
this kind of human-validated quality level to it, which we're saying is always going to be
163
00:11:26,719 --> 00:11:32,319
better than just an unvalidated version of a building from a machine learning model.
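[Editor's note: the sidecar logic described above, where a human-validated OSM footprint always wins over an overlapping ML-detected one, can be sketched as follows. This is an illustration only, not Microsoft's actual pipeline; footprints are simplified to axis-aligned bounding boxes `(min_x, min_y, max_x, max_y)`, whereas the real process compares polygons.]

```python
# Build a "sidecar" set by dropping any ML-detected footprint that overlaps
# a footprint already present in OSM, so the OSM building takes priority.

def overlaps(a, b):
    """True if two bounding boxes (min_x, min_y, max_x, max_y) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def make_sidecar(ml_buildings, osm_buildings):
    """Return only the ML footprints that touch no OSM footprint."""
    return [bldg for bldg in ml_buildings
            if not any(overlaps(bldg, osm) for osm in osm_buildings)]

osm = [(0, 0, 10, 10)]                    # one human-validated building
ml = [(5, 5, 15, 15), (20, 20, 30, 30)]   # one duplicate, one new detection
print(make_sidecar(ml, osm))              # only the non-overlapping detection survives
```

Applying the sidecar on top of the Daylight data then yields the union of buildings without an ML footprint ever overwriting a validated OSM one.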
164
00:11:32,319 --> 00:11:38,160
Now those are still two distinct datasets, right? To use those two in conjunction you have to go
165
00:11:38,160 --> 00:11:42,800
download and import all of the daylight data and then you have to apply all the Microsoft
166
00:11:42,800 --> 00:11:47,520
buildings to it, and they still are going to, you know, look different.
167
00:11:47,520 --> 00:11:52,800
One is still coming from OpenStreetMap, one is still this other data set, and so Overture takes that
168
00:11:52,800 --> 00:12:00,560
one step further and says well let's go ahead and acknowledge that we want easier to use
169
00:12:00,560 --> 00:12:06,880
interoperable open map data, and we have these two building datasets; we're going to produce
170
00:12:06,880 --> 00:12:13,280
a buildings theme with its own schema and we're going to find a way that we can map both the
171
00:12:13,280 --> 00:12:19,760
OSM data and the Microsoft building data into that and we're going to say that users who are
172
00:12:19,760 --> 00:12:26,000
looking for that combined open building dataset at the end of the day are going to be
173
00:12:26,000 --> 00:12:32,400
able to use the buildings theme from Overture and have all those buildings together, already
174
00:12:32,400 --> 00:12:36,720
conflated, and they'll have the references to the OSM building and the
175
00:12:36,720 --> 00:12:43,280
reference to the Microsoft building all there in one theme called buildings, with one height attribute
176
00:12:43,280 --> 00:12:48,400
that has been normalized and all of those different pieces can kind of come together under that
177
00:12:48,400 --> 00:12:54,240
one theme. So Overture is kind of the ultimate downstream distribution of the data; we're
178
00:12:54,240 --> 00:12:58,960
bringing together all these different open data sets. Now one thing I didn't mention in there
179
00:12:58,960 --> 00:13:05,680
also included are height estimates from USGS LiDAR data that's been overlaid onto the building
180
00:13:05,680 --> 00:13:11,280
dataset, so when there wasn't an existing height in, say, OSM, that LiDAR height
181
00:13:11,280 --> 00:13:15,760
has also been added and put in there. So all these different ways, we can start to combine
182
00:13:15,760 --> 00:13:21,120
and conflate and create this, you know, one easy-to-use experience: I want buildings, I'm going to go look at the
183
00:13:21,120 --> 00:13:26,959
buildings theme. That is what Overture is producing at the end of the day. Wow, that's again a great
184
00:13:26,959 --> 00:13:32,240
overview, thank you very much. So let me try and summarize this. OpenStreetMap: if we think about
185
00:13:32,240 --> 00:13:37,920
these massive building footprint datasets that have been generated by Microsoft and Google
186
00:13:37,920 --> 00:13:42,560
and probably others as well around the place, the open data sets, they are slowly but surely
187
00:13:42,560 --> 00:13:46,640
making their way into OpenStreetMap, but there's a human in the loop, which is great, right, because
188
00:13:46,640 --> 00:13:52,160
it's human-validated, like you were saying before, but it also means that the process is slow. Yes. So
189
00:13:52,160 --> 00:13:56,720
we're limited by that in terms of speed of entry of data by the human in the loop. The Daylight
190
00:13:56,720 --> 00:14:02,080
distribution says, okay, the OpenStreetMap-validated housing layer, that gets priority,
191
00:14:02,080 --> 00:14:07,440
but it's important to remember that we have these other datasets so in order to try and provide
192
00:14:07,440 --> 00:14:12,320
a complete dataset, if we talk about building footprints, you can get two different files: the OpenStreetMap
193
00:14:12,320 --> 00:14:17,760
version of it, and all the buildings that don't exist in OpenStreetMap as the sidecar file.
194
00:14:17,760 --> 00:14:24,000
The Overture Maps way is, hey, we're going to combine all these things in one and normalize them,
195
00:14:24,000 --> 00:14:28,080
and we're going to add some other things as well so heights for example if they are missing from
196
00:14:28,080 --> 00:14:32,960
the OpenStreetMap data. One thing we haven't talked about here is the schema. How does that
197
00:14:32,960 --> 00:14:38,560
change? Is this also copied from OpenStreetMap over into these other distributions, or
198
00:14:38,560 --> 00:14:43,200
do these other distributions have their own schema? Yeah, so I should also add: you mentioned
199
00:14:43,200 --> 00:14:49,440
other data sources there; Esri buildings are also included in Overture, so again, another
200
00:14:49,440 --> 00:14:53,840
just one more dataset that's being conflated into those buildings. This is data from the
201
00:14:53,840 --> 00:15:00,480
Esri Community Maps data sets. So when it comes to schema, it does look fundamentally different
202
00:15:00,480 --> 00:15:05,840
from OpenStreetMap, and this is by design, right? OpenStreetMap has the flexible
203
00:15:05,840 --> 00:15:11,440
key-value tagging model, which is critical; that's exactly as it should be, it's been
204
00:15:11,440 --> 00:15:17,120
designed for good reason and allows the maximum amount of flexibility. There's a process that
205
00:15:17,120 --> 00:15:23,680
anybody looking to build a map with OpenStreetMap data has to go through, which is to map those
206
00:15:23,680 --> 00:15:32,079
key values into something more rigid that they can then style onto a map. And so the Overture
207
00:15:32,079 --> 00:15:38,000
map schema does exactly that and says, okay, this is going to be the rigid schema
208
00:15:38,000 --> 00:15:44,640
for how somebody can render a map from this data. So we're going to take, for example, the building
209
00:15:44,640 --> 00:15:51,280
key: in OpenStreetMap it can have any number of values, from building=yes to house to residential
210
00:15:51,280 --> 00:15:57,839
to commercial, industrial, et cetera, and in the Overture data schema we're going to map that
211
00:15:57,839 --> 00:16:03,280
to a field called class, and we're only going to have a handful of values in there, such as residential,
212
00:16:03,280 --> 00:16:08,560
commercial, medical, et cetera, the kind of just higher-level pieces, rather than having
213
00:16:09,280 --> 00:16:16,560
garage or shed or some of these other open-ended tagging schemes. So that's an example
214
00:16:16,560 --> 00:16:21,920
of how we're going to a more strict schema there. Same with height, right? Height in OpenStreetMap
215
00:16:21,920 --> 00:16:28,000
can take any value, but in the Overture building schema it has to be a number, and it
216
00:16:28,000 --> 00:16:35,119
has to be in meters and that's just letting that downstream consumer have access to all that valuable
217
00:16:35,119 --> 00:16:39,599
hidden information that came from OSM that we've gone ahead and normalized, so they don't have to
218
00:16:39,599 --> 00:16:45,280
add that logic into their pipeline. And this is, I think, what we mean by this interoperable
219
00:16:45,280 --> 00:16:50,240
open map data: finding a way to bring in all these different sources, and we'll go ahead and do
220
00:16:50,240 --> 00:16:56,800
those conversions for that consumer. One thing I think we probably glossed over here was this idea
221
00:16:56,800 --> 00:17:01,760
that, it made a lot of sense what you were saying before, when there was one data layer that needed to
222
00:17:01,760 --> 00:17:05,839
be conflated with OpenStreetMap; that was pretty easy to wrap my mind around: the one that
223
00:17:05,839 --> 00:17:10,560
has been human-validated gets priority, and then if there is nothing in
224
00:17:10,560 --> 00:17:15,359
OpenStreetMap, then we take the other, you know, we take the other data source as being the next
225
00:17:15,359 --> 00:17:20,319
best one. What do we do when there's three? We talked about Microsoft, we talked
226
00:17:20,319 --> 00:17:25,119
about Google, we talked about Esri. You know, staying with the building footprints example, how do
227
00:17:25,119 --> 00:17:30,720
you find the best one? How do you conflate that? Great question. Currently we're looking at a
228
00:17:30,720 --> 00:17:35,600
couple different approaches, and again, I'll speak specifically to just the buildings, the buildings
229
00:17:35,600 --> 00:17:40,480
theme, because that's what this is relevant to. In the current model, in the data release that we just
230
00:17:40,480 --> 00:17:47,840
had, we conflated in the following order: we took OSM as the best, and then the Esri Community data
231
00:17:47,840 --> 00:17:53,760
sets, and then the Microsoft buildings. Now we're looking at some other open building data sources.
232
00:17:54,399 --> 00:18:00,080
Google has just released their Open Buildings data set as well, and so we're also looking at
233
00:18:00,080 --> 00:18:08,240
how we can start to add that in. This can get us into another key feature of Overture Maps,
234
00:18:08,240 --> 00:18:12,879
which is what we're calling the Global Entity Reference System; GERS is the acronym.
235
00:18:14,399 --> 00:18:20,480
And this allows us to say, okay, we have all these different data sets, we're going to go ahead
236
00:18:20,480 --> 00:18:26,159
and do this. You know, spatial conflation is hard; we want to make this easier for everyone, so we're
237
00:18:26,160 --> 00:18:32,880
going to go ahead and define this concept of an entity, for a building, for example.
238
00:18:32,880 --> 00:18:37,440
And we're going to say, okay, for every building, we're going to give it a unique ID. It's going to be a
239
00:18:37,440 --> 00:18:43,680
stable ID; this is something that hasn't necessarily been solved in the open spatial
240
00:18:43,920 --> 00:18:48,480
data world, so we're going to try to create this registry of entities, this open
241
00:18:48,480 --> 00:18:56,560
registry that anybody can use, and this will allow us to say that this building with ID 123 is
242
00:18:57,120 --> 00:19:00,480
this particular building in OpenStreetMap, it's this particular building in the Microsoft
243
00:19:00,480 --> 00:19:05,280
data set, it's this particular building in the Google data set, and then we can do
244
00:19:05,920 --> 00:19:12,960
conflation based on this GERS ID, as opposed to trying to do a complex spatial conflation.
245
00:19:12,960 --> 00:19:17,360
And this will allow us to make decisions; like, the Google buildings, for example, are released
246
00:19:17,360 --> 00:19:22,399
with a confidence value from their machine learning model, or you can look at other attributes.
247
00:19:22,399 --> 00:19:29,840
If you can combine features by this unique ID, you can say, oh, I want the footprint from OpenStreetMap,
249
00:19:29,840 --> 00:19:35,679
but I want to pull in these attributes that Esri knows about for these buildings, and you can
249
00:19:35,679 --> 00:19:42,719
start to build up these different really rich features by comparing all of this spatial data across
250
00:19:42,720 --> 00:19:50,400
these open data sets with this GERS ID. Wow, that is a really, really big idea. Yeah, it'll be pretty
251
00:19:50,400 --> 00:19:55,280
amazing if you manage it; like, I'm sure you're going to do it, right, but it doesn't sound easy. I'm
252
00:19:55,280 --> 00:19:59,360
thinking about, what if there's an offset? Like, how do you identify the same building in
253
00:19:59,360 --> 00:20:04,800
lots of different data sets? A tree hanging over a building that was
254
00:20:04,800 --> 00:20:10,320
imaged, you know, with leaf on, for example, might make that footprint look quite different
255
00:20:10,320 --> 00:20:14,879
to the same process being run when the leaves were off the trees, or the tree was
256
00:20:14,879 --> 00:20:20,159
cut down and that happened in a later dataset, or something like that. Or, I'm thinking, so
257
00:20:20,159 --> 00:20:26,480
I'm sitting in a room now in my house; not attached to the house, but two and a half meters from the house,
258
00:20:26,480 --> 00:20:32,000
is a garage, a separate building. What if the entity of the house turns into two separate
259
00:20:32,000 --> 00:20:37,200
features in the future, when imagery gets better, when we have better algorithms, that kind of thing?
260
00:20:37,200 --> 00:20:41,760
How do we deal with that, in terms of, like, what happens with that ID? Does it get split up? Does it
261
00:20:41,760 --> 00:20:47,520
become a parent-child relationship? It doesn't sound like an easy problem to solve. Exactly, it's
262
00:20:47,520 --> 00:20:52,320
not going to be an easy problem to solve, but that's going to be an important problem to solve.
263
00:20:52,320 --> 00:20:58,800
Everything within GERS has to be stable, and when it can't be stable, meaning that over different
264
00:20:58,800 --> 00:21:05,200
releases the ID needs to stay the same for the same entity. In the case where new data comes in that
265
00:21:05,200 --> 00:21:10,240
says, oh no, this is the house and this is a garage, but previously the
266
00:21:10,240 --> 00:21:16,720
GERS registry only had one building there, then we need to mark that that next version
267
00:21:16,720 --> 00:21:24,240
of the registry needs to say, actually, this one ID turned into two IDs, and that's something that
268
00:21:24,240 --> 00:21:29,920
we'll need to be able to model so that it's backwards-compatible when someone goes and looks it up and
269
00:21:29,920 --> 00:21:36,640
says, I'm looking for this ID. The power of GERS initially is going to be this, like, conflation and how
270
00:21:36,640 --> 00:21:42,880
people can get the data together. Now, when the data's released, the real power of GERS is that
271
00:21:42,880 --> 00:21:51,760
anybody can take their data set and match it to the Overture data set via GERS. So anybody can
272
00:21:51,760 --> 00:21:58,560
release a proprietary geospatial data set with GERS IDs, so that anybody else wanting to consume
273
00:21:58,560 --> 00:22:06,080
that data can now instantly do this ID-based conflation, and you can use this open
274
00:22:06,080 --> 00:22:13,280
registry as a conflation tool for any third-party data stream to match to. And so that's where
275
00:22:13,280 --> 00:22:19,919
GERS will look up and say, oh, I had this data set, it needed to match to this GERS ID, this GERS
276
00:22:19,919 --> 00:22:24,879
ID turned into these two buildings, and that needs to be able to be resolved. So these are some of the
277
00:22:24,880 --> 00:22:30,880
finer nuances of what it's going to take for a stable ID system to really be successful, but we
278
00:22:30,880 --> 00:22:34,800
are really excited about what this is going to look like and what this
279
00:22:34,800 --> 00:22:41,360
is going to enable. Yeah, absolutely. Imagine being able to roll back in time, you know, and see
280
00:22:41,360 --> 00:22:46,800
how a feature has changed. My guess is you'll see these things consolidate over time, so
281
00:22:46,800 --> 00:22:52,160
four or five different polygons maybe will slowly but surely merge into one more accurate
282
00:22:52,160 --> 00:22:57,200
one, because you would hope that we would all be sort of marching towards the truth, that all these
283
00:22:57,200 --> 00:23:02,240
algorithms start to agree, or data sets agree, that this is actually what the house, the building,
284
00:23:02,240 --> 00:23:08,560
looks like this. I think that would be fascinating. Agreed, I look forward to when we can
285
00:23:08,560 --> 00:23:13,440
get to that point. I think that data is moving in that direction, right, it's only going to
286
00:23:13,440 --> 00:23:19,360
improve as better imagery comes out, as better models come out. You know, GERS will need to keep
287
00:23:19,360 --> 00:23:25,439
evolving to capture that and to be something accurate that really makes it valuable for
288
00:23:25,439 --> 00:23:32,800
external data sources to register to this kind of common registry of entities around the world.
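The ID-based matching Jennings describes can be sketched in a few lines of Python. This is a toy illustration only: the `gers:` ID strings, the attribute columns, and the way split records are stored are all invented for the example, not the actual GERS design.

```python
# Hypothetical sketch of ID-based conflation against a GERS-style registry.
# The ID format and the "lineage" table recording ID splits are assumptions
# made for illustration, not the real GERS data model.

# A third-party data set that has already been matched to stable IDs:
third_party = {
    "gers:bldg-001": {"height_m": 12.5},
    "gers:bldg-002": {"height_m": 30.0},
}

# Registry lineage: one old building ID was split into two new IDs.
lineage = {
    "gers:bldg-002": ["gers:bldg-002a", "gers:bldg-002b"],
}

def resolve(entity_id):
    """Follow split records so lookups against an old ID still succeed."""
    return lineage.get(entity_id, [entity_id])

def conflate(overture_features, external):
    """Attach external attributes to features purely by ID, no geometry math."""
    out = []
    for feat in overture_features:
        for old_id, attrs in external.items():
            if feat["id"] in resolve(old_id):
                out.append({**feat, **attrs})
    return out

features = [{"id": "gers:bldg-001"}, {"id": "gers:bldg-002a"}]
merged = conflate(features, third_party)
# bldg-001 matches directly; bldg-002a matches via the split record
```

The point of the sketch is that once both sides carry stable IDs, conflation collapses into a join plus lineage resolution, instead of fuzzy geometric matching.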
289
00:23:33,520 --> 00:23:38,560
Okay well you've got your work cut out for you but you're starting off with a fantastic acronym.
290
00:23:40,159 --> 00:23:44,639
You're off to a good start. So up until now we've been talking about building footprints and I
291
00:23:44,639 --> 00:23:49,120
think that's been a great example to sort of follow through and think about conflation think about
292
00:23:49,120 --> 00:23:53,919
these different data sets that are being produced how we're going to get them combined into one
293
00:23:53,919 --> 00:24:00,399
we talked about the GERS ID, talked about how with building footprints as an example we could add
294
00:24:00,399 --> 00:24:07,280
elevation I mean it's interesting stuff I guess the next obvious question is what are the data sets
295
00:24:07,280 --> 00:24:14,000
or data layers or data themes that are currently available in Overture Maps? Yeah, so there's four
296
00:24:14,000 --> 00:24:20,320
data themes currently available in the Overture data set that was just released. So
297
00:24:20,320 --> 00:24:27,440
we talked about buildings, and that is one data set that is using 500-something million buildings
298
00:24:27,440 --> 00:24:33,840
from OpenStreetMap. Another data set that is also a downstream derivative of OSM is the
299
00:24:33,840 --> 00:24:40,400
transportation theme, and I think this is another great example of being a downstream
300
00:24:40,400 --> 00:24:48,960
slightly augmented version of OSM that is hopefully going to be very usable and again interoperable
301
00:24:48,960 --> 00:24:55,120
with other data systems. So TomTom is also a member of Overture Maps,
302
00:24:55,120 --> 00:24:58,960
and this is something they've been working on: to take the transportation network
303
00:24:58,960 --> 00:25:04,800
from OpenStreetMap, the road network, all the nodes and ways, and process it and turn it
304
00:25:04,800 --> 00:25:12,159
into segments and connectors right so it's very similar to nodes and ways but what's happening here
305
00:25:12,159 --> 00:25:18,720
is that in OpenStreetMap you might have a single road that is made up of multiple ways just
306
00:25:18,720 --> 00:25:25,040
because of how it was mapped but it really is just one road and having those two ways kind of
307
00:25:25,040 --> 00:25:32,639
connected end to end doesn't offer any improvements or any value to a routing network likewise
308
00:25:32,640 --> 00:25:39,360
when you join two roads together like at an intersection in OSM you don't necessarily there's no
309
00:25:39,360 --> 00:25:45,600
requirement you can have one road kind of join in the middle of another road and that node that
310
00:25:45,600 --> 00:25:50,960
they both share where they connect isn't necessarily elevated in anything special you need to
311
00:25:51,520 --> 00:25:56,240
you need to do more of a complex lookup to say that, oh, this is actually an intersection
312
00:25:56,240 --> 00:26:01,520
node, because it exists in both of these ways, but that way that was kind of
313
00:26:01,520 --> 00:26:06,639
intersected doesn't get split into multiple ways, if that makes sense. So what happens
314
00:26:06,639 --> 00:26:13,280
downstream when we create this transportation theme is all of those pieces are resolved those ways
315
00:26:13,280 --> 00:26:18,560
that don't necessarily need to be two ways because having them split doesn't offer any benefit
316
00:26:18,560 --> 00:26:24,160
become one segment, and then anywhere where you do have two roads coming
317
00:26:24,160 --> 00:26:29,360
together, where they really should split and create two segments, without having to do that
318
00:26:29,360 --> 00:26:34,879
kind of complex traversal to recreate that network then the data will go ahead and split and
319
00:26:34,879 --> 00:26:40,639
create two segments and one connector and so at the end of it you look at something it's going to look
320
00:26:40,639 --> 00:26:46,560
exactly like the OpenStreetMap transportation highway network, but it's just going to have
321
00:26:46,560 --> 00:26:51,439
kind of slightly under the hood it's going to be slightly different in terms of which are the
322
00:26:51,439 --> 00:26:55,760
connectors and which are the segments and it's going to be optimized for ingestion into
323
00:26:55,760 --> 00:27:02,480
into routing systems. So that's another theme. And then on top of that, that's just
324
00:27:02,480 --> 00:27:06,640
the structure of the segments and the connectors and then there's a number of other attributes
325
00:27:06,640 --> 00:27:13,120
that are being modeled differently and enabling linear referencing across the data as well, so
326
00:27:13,120 --> 00:27:19,360
it is just that I like to think of it as like one level of downstream processing is being done
327
00:27:19,360 --> 00:27:26,000
within Overture to produce something that is very usable and kind of immediately consumable
328
00:27:26,000 --> 00:27:31,760
and we're trying to solve that step so that consumers don't have to do that themselves.
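The ways-to-segments transformation described above can be sketched as a small graph pass: merge ways that meet end-to-end at a plain shared endpoint, and keep nodes where three or more ways meet as connectors. This is a toy sketch under simplifying assumptions (ways as plain node-ID lists, no reversed-way handling, no splitting at mid-way intersection nodes), not Overture's actual pipeline.

```python
from collections import defaultdict

def ways_to_segments(ways):
    """ways: list of ways, each a list of node IDs.
    Merge ways joined end-to-end at a plain shared endpoint into one
    segment; nodes where 3+ way endpoints meet become connectors."""
    endpoint_ways = defaultdict(list)
    for i, way in enumerate(ways):
        endpoint_ways[way[0]].append(i)
        endpoint_ways[way[-1]].append(i)

    # Connectors: shared endpoints that are real junctions (3+ ways).
    connectors = {n for n, ws in endpoint_ways.items() if len(ws) >= 3}

    segments = [list(w) for w in ways]
    merged = True
    while merged:
        merged = False
        for a in range(len(segments)):
            for b in range(a + 1, len(segments)):
                sa, sb = segments[a], segments[b]
                if sa[-1] == sb[0] and sa[-1] not in connectors:
                    segments[a] = sa + sb[1:]   # join end-to-end
                    del segments[b]
                    merged = True
                elif sb[-1] == sa[0] and sb[-1] not in connectors:
                    segments[a] = sb + sa[1:]
                    del segments[b]
                    merged = True
                if merged:
                    break
            if merged:
                break
    return segments, connectors

# Ways 1-2-3 and 3-4-5 meet only each other at node 3, so they merge into
# one segment; node 5 is shared by three ways, so it stays a connector.
segments, connectors = ways_to_segments([[1, 2, 3], [3, 4, 5], [5, 6], [5, 7]])
```

The design point is the same one made in the conversation: a split that carries no routing information disappears, while a genuine junction is promoted to an explicit connector so consumers don't have to rediscover intersections themselves.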
329
00:27:31,760 --> 00:27:36,959
Is there any sort of conflation going on here are we adding other data sets and because
330
00:27:36,959 --> 00:27:43,280
my understanding is that Meta, for example, produces a routing layer that is ingested into
331
00:27:43,280 --> 00:27:49,840
OpenStreetMap via the Rapid editor. I'm not sure what you mean about that. Are we just talking
332
00:27:49,840 --> 00:27:54,560
about the data that's in OpenStreetMap, that you're, you know, simplifying in the ways that
333
00:27:54,560 --> 00:27:58,960
you've described, making it easier to consume, or are we also talking about, in terms of the buildings
334
00:27:58,960 --> 00:28:03,280
for example you're doing normalizing or you're creating a new schema that was going to be
335
00:28:03,280 --> 00:28:08,720
more easily digestible for folks wanting to consume this data but you're also adding in data that
336
00:28:08,720 --> 00:28:14,080
wasn't present in OpenStreetMap. So my question is, are you also doing that in terms of the
337
00:28:14,080 --> 00:28:19,520
the transportation network, or is this pure OpenStreetMap data that you are simplifying and making
338
00:28:19,520 --> 00:28:26,240
more consumable for users? Gotcha, and in the case currently for the transportation theme,
339
00:28:26,240 --> 00:28:32,000
it is the full OpenStreetMap transportation network. Currently that theme is just
340
00:28:32,000 --> 00:28:39,120
ingesting only OSM data, but there is, you know, hopefully a plan for the future to add more data
341
00:28:39,120 --> 00:28:44,400
on top of that and do that conflation and build out that road network. Absolutely.
342
00:28:44,400 --> 00:28:50,480
Okay so we've got building footprints we've got transportation theme what else is going on in there.
343
00:28:50,480 --> 00:28:55,440
Yeah, so I wanted to cover those two initially because those are the data sets that
344
00:28:55,440 --> 00:29:03,760
involve OpenStreetMap data. The next one that's exciting is a theme called Places, and this is a
345
00:29:03,760 --> 00:29:11,040
combination of public place information from Meta and Microsoft, and it turns into, the data
346
00:29:11,040 --> 00:29:18,400
set is about 60 million points globally of points of interest and places that has never been released
347
00:29:18,400 --> 00:29:25,280
as open data before. So this is quite exciting for us in terms of an open data offering, that
348
00:29:25,280 --> 00:29:32,320
it is 60 million new places all over the world. Wow yeah that's pretty cool that they are
349
00:29:32,320 --> 00:29:37,920
opening up the data like that, that's great. So three layers, I don't want to say is that it,
350
00:29:37,920 --> 00:29:42,160
because it sounds like an incredible amount of work, but is that where we're at today?
351
00:29:42,160 --> 00:29:48,160
There is one more, there is the admins layer, which is administrative boundaries, and that is
352
00:29:48,160 --> 00:29:54,879
data currently from Microsoft and TomTom, who have been working together on that layer,
353
00:29:54,880 --> 00:30:01,760
and that currently includes admin levels for countries and states and hopefully will
354
00:30:01,760 --> 00:30:06,640
continue to grow. We should add that you know this was our initial data release just for these
355
00:30:06,640 --> 00:30:12,560
four themes and we put this out because earlier we had really released the schema and I think
356
00:30:12,560 --> 00:30:18,240
that was exciting, and then we needed to get some data out there for people to see and
357
00:30:18,240 --> 00:30:23,840
play with, and that's what we have initially released, and so these themes will continue to
358
00:30:23,840 --> 00:30:29,760
grow and improve as future data releases happen. I've seen a little bit of chatter about this
359
00:30:29,760 --> 00:30:35,199
on Twitter, or X, or whatever people are calling it these days, and some people are really excited
360
00:30:35,199 --> 00:30:40,159
about the Overture Maps Foundation. Great, yeah, more open data, more better, kind of thing.
361
00:30:40,159 --> 00:30:45,919
I completely get that, but we've been talking about this idea of accessibility, consumability,
362
00:30:45,919 --> 00:30:49,919
access that kind of thing and I've heard a lot of people say you have to be a real data nerd
363
00:30:49,920 --> 00:30:55,360
to actually get your hands on the stuff that it's actually difficult to access but my
364
00:30:55,360 --> 00:31:01,120
my guess is here that there's probably a difference between being a map data provider and a
365
00:31:01,120 --> 00:31:08,640
map services provider, and I'm wondering which one of these roles Overture is trying to fill.
366
00:31:08,640 --> 00:31:17,200
Yeah, that's a fantastic point. So Overture is a map data provider, looking to create, you know,
367
00:31:17,200 --> 00:31:26,320
this interoperable open map data set and then that data is consumable by developers who want
368
00:31:26,320 --> 00:31:33,920
to build map services, and this is, I think, a key distinction here, where there's a lot of
369
00:31:33,920 --> 00:31:41,280
opinionated decisions that go into how you take data from your raw data set into producing,
370
00:31:41,280 --> 00:31:46,080
you know, tiles for example, what data gets included, what data gets excluded, and what it's
371
00:31:46,080 --> 00:31:52,399
going to look like, and Overture is not trying to get into that space of saying what the map
372
00:31:52,399 --> 00:31:58,159
services should look like. Overture wants to be upstream of that and be providing data for any
373
00:31:58,159 --> 00:32:04,720
provider of map services to consume and hopefully build better products with. And so we've made
374
00:32:04,720 --> 00:32:12,000
the decision to release the data as Parquet files, which is a cloud-native format, which means that,
375
00:32:12,000 --> 00:32:17,280
you know, when you go to the Overture website there isn't a download-all button, and I think that
376
00:32:17,280 --> 00:32:21,680
people have become very used to seeing this I just want to download the whole data set and then
377
00:32:21,680 --> 00:32:29,840
do something with it well it's a really big data set and that doesn't make a lot of sense and it's
378
00:32:29,840 --> 00:32:34,800
funny because the same thing happens with the OpenStreetMap planet file, right, it is released as one
379
00:32:34,800 --> 00:32:41,200
big data set and it keeps growing and there are tools that you can use to ingest the entire
380
00:32:41,200 --> 00:32:46,800
planet file, it just requires bigger and bigger machines, which then gets into kind of cloud computing
381
00:32:46,800 --> 00:32:52,640
as well and we did release the very initial data set that was released as I talked about earlier
382
00:32:52,640 --> 00:32:57,040
was this, we had, you know, there's Daylight, and then you have these sidecar files where you can
383
00:32:57,040 --> 00:33:04,320
add on these buildings, and the initial data release from Overture, what we did was add
384
00:33:04,320 --> 00:33:10,640
all those things together and made one giant PBF and said, okay, we've done this conflation
385
00:33:10,640 --> 00:33:15,440
of all these buildings and all this stuff put into one giant PBF, it was like almost a hundred gigs,
386
00:33:15,440 --> 00:33:20,800
and now you can go download it and read this into your systems and have all these buildings
387
00:33:20,800 --> 00:33:25,280
and all the data kind of pre-conflated and we heard back from people that said well now this
388
00:33:25,280 --> 00:33:31,280
file's too big, it's not actually that usable anymore, because we have to spin up, you know,
389
00:33:31,280 --> 00:33:37,200
this much bigger machine in the cloud to try to process this, and so we took that feedback,
390
00:33:37,200 --> 00:33:44,240
and with this release of each of these four themes, we've done it
391
00:33:44,240 --> 00:33:50,320
in a way where it's these Parquet files that are, as you said, this cloud-native technology that
392
00:33:50,320 --> 00:33:57,120
allows you to point any number of kind of big data systems at these files and poke and prod and inspect
393
00:33:57,120 --> 00:34:05,440
the data and extract the data in the view that makes sense for you right so one example is that
394
00:34:05,440 --> 00:34:12,560
these are just data sets sitting out on S3 and on Azure Blob Storage as well, so you can pick
395
00:34:12,560 --> 00:34:18,639
your kind of provider, or you can point open source tools such as DuckDB, you can point them right
396
00:34:18,639 --> 00:34:24,240
to the data set and you can run it locally, but it allows you to write a query that says, I want
397
00:34:24,240 --> 00:34:31,120
to download the data in this bounding box I want these columns and I want to convert this string
398
00:34:31,120 --> 00:34:36,799
into something else. It allows you to have just a lot more control over the data that
399
00:34:36,799 --> 00:34:43,440
you're actually getting, that then you can feed into your services. What was really exciting
400
00:34:43,440 --> 00:34:46,639
when this came out is, you know, that's fundamentally different than the way that data's
401
00:34:46,639 --> 00:34:50,400
been released in the past, where people are more used to pushing this download button, and then
402
00:34:50,400 --> 00:34:55,679
so certainly there were some comments about that, but pretty quickly, and we're trying to
403
00:34:55,679 --> 00:35:00,799
put some examples out of how to get started with this or that, and pretty quickly there were a number
404
00:35:00,800 --> 00:35:06,720
of blogs that were coming out of people saying you know okay initially this was strange but then
405
00:35:06,720 --> 00:35:10,960
I went and did this and here's how I did it and I was able to look at the data and get the data
406
00:35:10,960 --> 00:35:14,720
and do this with it. And so there were blog posts that were coming out showing,
407
00:35:14,720 --> 00:35:19,920
with your Amazon account you can then query the data download the CSV and load it right into
408
00:35:19,920 --> 00:35:25,280
QGIS that was really exciting to see I think that we have made the right decision in terms of
409
00:35:25,280 --> 00:35:30,000
releasing it in this cloud-native format that doesn't make these opinionated decisions that say this is
410
00:35:30,000 --> 00:35:35,520
what the data has to look like we are trying to offer it at kind of one level above that and say
411
00:35:35,520 --> 00:35:41,280
this is the schema, we've done the work to bring the data together and make it
412
00:35:41,280 --> 00:35:47,680
interoperable but now it's up to you to decide what pieces of the data you want to use and how
413
00:35:47,680 --> 00:35:53,840
you want to use them and for developers who have access to their own cloud infrastructure and they just
414
00:35:53,840 --> 00:35:59,120
want to go download all the Parquet files and load them locally and do their own thing with that,
415
00:35:59,120 --> 00:36:04,720
absolutely that is still certainly an option so I think that we've tried to offer the most
416
00:36:04,720 --> 00:36:11,200
flexibility, but certainly have heard a lot of feedback from folks that are looking for that download
417
00:36:11,200 --> 00:36:16,480
button, but I hope that we can put more examples together as well to show
418
00:36:16,480 --> 00:36:21,920
how, while there isn't a download button, you can actually do a lot more with the data
419
00:36:21,920 --> 00:36:26,720
and how it's being released now. That makes a lot of sense. Also I think choosing
420
00:26:26,720 --> 00:26:31,600
that cloud-native format that you're talking about, that must really
421
00:26:31,600 --> 00:26:37,919
simplify the infrastructure you need to maintain the data set as well. My guess is there's no
422
00:36:37,919 --> 00:36:42,640
service running in the background it's blob storage just sitting there waiting so your job is
423
00:36:42,640 --> 00:36:48,560
then to keep updating that blob storage with new files. Also after hearing you say that,
424
00:36:48,560 --> 00:36:54,720
so you're clearly on the map data provider side not the map services provider side and it sounds
425
00:36:54,720 --> 00:36:59,839
like your customers, or the people that you're building this for, are on the map services side, so you
426
00:36:59,839 --> 00:37:04,319
are essentially building this for the geeks for the developers you know for the people that
427
00:37:04,319 --> 00:37:09,759
want to build on top of what you're providing. Precisely. Well, we've covered a lot of
428
00:37:09,759 --> 00:37:16,319
ground, so I guess, again continuing with this theme of obvious questions, it's like,
429
00:37:16,319 --> 00:37:20,799
so what's next? We've got these four layers, I don't mean to trivialize that, it sounds
430
00:37:20,800 --> 00:37:25,600
like it's been a lot of work just getting this far and it sounds like you've got a lot of work to
431
00:37:25,600 --> 00:37:30,000
do just with these four layers but my guess is also that you've got plans for this right you're
432
00:37:30,000 --> 00:37:35,680
going to grow this and it's going to develop over time so again without like trivializing the
433
00:37:35,680 --> 00:37:41,680
work you've already done, what is next, where are we going from here? Yeah, so the big thing we have
434
00:37:41,680 --> 00:37:46,640
released is four layers, and I talked about this Global Entity Reference System and how great
435
00:37:46,640 --> 00:37:52,240
it's going to be, so the next thing that we need to do is actually implement it on these four
436
00:37:52,240 --> 00:37:58,400
layers, so that, you know, in the coming data releases there will actually be a GERS ID that will be
437
00:37:58,400 --> 00:38:06,160
this stable ID for all of these roads and buildings and places and admin features that
438
00:38:06,160 --> 00:38:12,480
that we're putting out in these themes and in terms of the next data layers I think that
439
00:38:12,480 --> 00:38:18,240
that's going to be a decision that's, you know, going to be driven by what people are after,
440
00:38:18,240 --> 00:38:24,320
what consumers are looking for, and what companies and what data sets join Overture and
441
00:38:24,320 --> 00:38:30,160
become interested in the project and want to participate and want to bring their open data to the
442
00:38:30,160 --> 00:38:36,080
to the table and add it to the system. We also do want to make sure that
443
00:38:36,640 --> 00:38:42,160
one theme that we're looking at next is what does it take to actually make a map from the
444
00:38:42,160 --> 00:38:47,520
Overture data. I think that's a big question, you know, if you want to build a base map
445
00:38:47,520 --> 00:38:55,040
from Overture data right now, there's a transportation layer, admins and buildings and places,
446
00:38:55,040 --> 00:39:00,319
so you're off to a great start but you also need to know where the land is where the water is
447
00:39:01,120 --> 00:39:08,240
other types of land use and so we're looking to put together kind of a context theme that's going
448
00:39:08,240 --> 00:39:12,879
to help us create something that you can actually make a full base map out of so that's something
449
00:39:12,879 --> 00:39:17,919
we're also looking at. OpenStreetMap is a great source of data for that, from
450
00:39:17,919 --> 00:39:23,359
the Daylight distribution, and so how can we bring some more of that data in, in these separate
451
00:39:23,359 --> 00:39:29,759
discrete layers, to give consumers the option to turn those data themes on as well,
452
00:39:29,759 --> 00:39:35,040
those data layers on. So those are probably the two big things that are immediately
453
00:39:35,040 --> 00:39:39,920
on our to-do list. Wow, also big things. It's going to be really interesting to follow this project
454
00:39:39,920 --> 00:39:47,040
and see where we end up, especially also watching to see who joins the foundation, right,
455
00:39:47,040 --> 00:39:52,000
too, and see what they bring to it. That's going to be pretty interesting, yeah, just pretty
456
00:39:52,000 --> 00:39:57,840
interesting to follow along. So, OpenStreetMap has a really passionate community behind it,
457
00:39:57,840 --> 00:40:03,040
around it it's been going for a while now and it's growing like you mentioned at the start of the
458
00:40:03,040 --> 00:40:10,080
episode. Has there been any pushback, of people worried that you are trying to steal OpenStreetMap's thunder,
459
00:40:10,080 --> 00:40:17,120
that you're going to somehow make them irrelevant, or that you're stealing the data, anything like
460
00:40:17,120 --> 00:40:22,400
that I'm not doing a great job of formulating this question but but I hope you know where I'm going
461
00:40:22,400 --> 00:40:28,800
No, I totally understand. I think that this is an important distinction to get right,
462
00:40:28,800 --> 00:40:36,320
and Overture Maps is consuming OpenStreetMap data, and I think that the two together are very
463
00:40:36,320 --> 00:40:44,640
complementary, in that OpenStreetMap continues to be the source for community-maintained data
464
00:40:44,640 --> 00:40:50,800
and a vibrant, evolving community maintaining open map data. Overture is, as you said earlier, you
465
00:40:50,800 --> 00:40:56,960
know, downstream of OpenStreetMap and continuing to consume that. And so when there's OpenStreetMap
466
00:40:56,960 --> 00:41:03,360
data inside of Overture, if that data needs to be adjusted, all the data in Overture
467
00:41:03,360 --> 00:41:08,000
is coming from somewhere, and it's not going to be fixed inside of Overture, it needs
468
00:41:08,000 --> 00:41:14,800
to go be fixed or updated or augmented at the source so this is something where if there's
469
00:41:14,800 --> 00:41:18,720
something in Overture that's coming from OpenStreetMap, there's still got to be
470
00:41:18,720 --> 00:41:25,280
the feedback loop that goes all the way back to OSM. So OSM remains like the original data
471
00:41:25,280 --> 00:41:30,320
source, the original community, and I think we can think of Overture as this downstream part of it,
472
00:41:30,320 --> 00:41:37,680
and as a result, companies involved in Overture such as Meta will still very much stay involved
473
00:41:37,680 --> 00:41:43,200
in OpenStreetMap, and one example is, you know, supporting the Rapid editor for OpenStreetMap
474
00:41:43,200 --> 00:41:50,240
data, in order to keep the high-quality data coming into Overture from OpenStreetMap.
475
00:41:50,240 --> 00:41:56,160
All of that data validation, data editing, still is happening in OpenStreetMap, it's not
476
00:41:56,160 --> 00:42:01,279
happening somewhere else. And so to that degree, I don't think it's stealing OpenStreetMap's
477
00:42:01,279 --> 00:42:06,399
thunder in any way, but rather maintaining a presence within the OpenStreetMap community
478
00:42:06,399 --> 00:42:12,000
and helping ensure that OpenStreetMap continues to do what OpenStreetMap does,
479
00:42:12,000 --> 00:42:19,279
which is being this open, vibrant community supporting high-quality open geospatial data.
480
00:42:19,280 --> 00:42:26,320
At the same time, Overture provides this kind of place where other data sets can get merged into
481
00:42:26,320 --> 00:42:32,320
the Overture Maps data set, such as these, you know, AI-derived buildings or roads, where
482
00:42:32,960 --> 00:42:39,600
maybe OpenStreetMap is not where they belong, right, the data that historically
483
00:42:39,600 --> 00:42:44,640
has, maybe, like, there are a lot of questions around whether AI data should be
484
00:42:44,640 --> 00:42:49,359
imported into OSM, and via Rapid we're saying we're not importing it, but you're actually
485
00:42:49,359 --> 00:42:53,440
looking at it and doing this human-in-the-loop validation of the data, and then it becomes part
486
00:42:53,440 --> 00:42:57,839
of OpenStreetMap because it has gone through that human-in-the-loop validation. But if somebody
487
00:42:57,839 --> 00:43:03,759
wants to add those two data sets together, Overture is the place they can go to get that full,
488
00:43:03,759 --> 00:43:09,279
kind of complete data set. Yeah, it makes a lot of sense. At the start, like when we first
489
00:43:09,280 --> 00:43:14,560
started talking about this I was thinking it wasn't clear to me that these two projects weren't
490
00:43:14,560 --> 00:43:19,920
in competition with each other, but when you put it like that, when you described the process
491
00:43:19,920 --> 00:43:25,760
of conflating these different data sets especially that these AI generated data sets
492
00:43:25,760 --> 00:43:30,720
I can see where you're going and you can see why people want this I really can I can also
493
00:43:30,720 --> 00:43:36,960
see that if you're simplifying the schema making it easier to search this idea of the global
494
00:43:36,960 --> 00:43:42,400
features ID doesn't sound easy to implement, but man, if you do that, it'll be incredibly powerful.
495
00:43:42,400 --> 00:43:48,000
You're adding these extra attributes, elevation, we talked about that before, my guess is
496
00:43:48,000 --> 00:43:53,040
still but there'll be other things along the way and giving people a place to put the data which
497
00:43:53,040 --> 00:43:57,280
they can't just go and dump it into the community right and overwrite all the work that's been
498
00:43:57,280 --> 00:44:03,120
done. Or, yeah, so I can totally see it from that perspective, and again, I think this
499
00:44:03,120 --> 00:44:07,279
is going to be a super interesting project to follow along with but we've been talking about
500
00:44:07,279 --> 00:44:12,319
this, Overture, for a while now. Personally I think you've done a great job of explaining what
501
00:44:12,319 --> 00:44:16,880
it is and walking us through the process how it works in terms of these different data themes
502
00:44:16,880 --> 00:44:21,040
that you've got in there, and a little bit about what the future looks like. What is the biggest
503
00:44:21,040 --> 00:44:25,759
misunderstanding about this project I mean you clearly talked to a lot of different people in the
504
00:44:25,759 --> 00:44:31,200
mapping world, what is the bit that they don't immediately get when you start talking
505
00:44:31,200 --> 00:44:36,560
to them about Overture, what is the question that a lot of people will ask you about this?
506
00:44:36,560 --> 00:44:42,319
That's a good question. I think it was really enlightening to be at State of the Map US, for
507
00:44:42,319 --> 00:44:47,040
example, in Richmond most recently, and talk to people about Overture, because I think, you know, there's
508
00:44:47,040 --> 00:44:52,879
a lot of excitement and there's been a lot of I think speculation once we get to talking about it
509
00:44:52,879 --> 00:44:57,759
and explaining, I think, again, that relationship between OpenStreetMap
510
00:44:57,760 --> 00:45:03,040
and Overture, and showing that, you know, these two aren't existing in competition. You're
511
00:45:03,040 --> 00:45:09,040
not going to go to Overture Maps and click edit and edit this, like, separate version, that's not
512
00:45:09,040 --> 00:45:15,600
what Overture Maps is. Overture Maps is this combined open data set that is, as we said,
513
00:45:15,600 --> 00:45:20,480
kind of downstream of OpenStreetMap, and OpenStreetMap is going to continue to remain
514
00:45:20,480 --> 00:45:27,120
the vibrant community supporting open geospatial data, community-maintained data, and so
515
00:45:27,120 --> 00:45:33,600
it's just one of the many inputs into Overture, and I think that distinction is very important
516
00:45:33,600 --> 00:45:38,560
to make. And I think that as Overture keeps producing data sets, it will become more
517
00:45:38,560 --> 00:45:45,920
and more obvious as to what exactly can be done with Overture Maps data, and show that it is
518
00:45:46,480 --> 00:45:52,000
another distribution of open data that includes high-quality data from OpenStreetMap,
519
00:45:52,000 --> 00:45:58,160
and that's really exciting, and is hopefully going to give more credibility to the value of
520
00:45:58,160 --> 00:46:04,400
OpenStreetMap data being included in this one, you know, larger project, and combining data
521
00:46:04,400 --> 00:46:09,280
sets from other sources as well into that. I think that's really exciting. At the end of
522
00:46:09,280 --> 00:46:16,480
the day, this is all happening in the open, this is a Linux Foundation project, and the goal here is to
523
00:46:16,480 --> 00:46:24,000
build interoperable open map data for anyone building massive map services who needs enterprise-
524
00:46:24,000 --> 00:46:28,880
quality map data. Well, Jennings, I think probably this is a great place to round off the conversation,
525
00:46:28,880 --> 00:46:33,200
thank you very much for your time thank you for showing up on the podcast again I really appreciate it
526
00:46:33,200 --> 00:46:38,240
always enjoy talking with you you have this sort of enthusiasm you know about these kinds of projects
527
00:46:38,240 --> 00:46:42,560
that I really appreciate so it's been a pleasure we've mentioned the name a bunch of times
528
00:46:42,560 --> 00:46:46,880
Overture Maps Foundation, I'll link to that in the show notes. Is there anywhere else you want to
529
00:46:46,880 --> 00:46:52,320
direct people to, send them to, if they want to learn more? Yeah, I think that the Overture Maps
530
00:46:52,320 --> 00:46:59,440
website overturemaps.org has an FAQ and a button to learn more about how to become a member
531
00:46:59,440 --> 00:47:04,799
and how to get involved in shaping you know what the future of the foundation looks like
532
00:47:04,799 --> 00:47:09,200
and how you know this is how we're making these decisions around what the schema looks like
533
00:47:09,200 --> 00:47:14,720
is by all these you know the member companies coming together and developing these schema
534
00:47:14,720 --> 00:47:21,120
and releasing this data so I think you can learn more and also links to the documentation and
535
00:47:21,120 --> 00:47:26,799
to the schema are all available there on the website well thanks again Jennings I'll include
536
00:47:26,799 --> 00:47:30,799
those links in the show notes and I hope people take the time and check it out really appreciate
537
00:47:30,799 --> 00:47:37,359
your time thanks for showing up fantastic thank you so much so I really hope you enjoyed that episode
538
00:47:37,360 --> 00:47:44,000
with Jennings Anderson research scientist at meta talking about the overture maps foundation
539
00:47:44,000 --> 00:47:48,640
so I mentioned right at the start of this episode that Jennings has been on the podcast before
540
00:47:48,640 --> 00:47:53,680
that episode I believe was called open street map a community of communities it's worth checking
541
00:47:53,680 --> 00:47:57,120
out there'll be a link in the show notes and I'm also going to put a link in the show notes to
542
00:47:57,120 --> 00:48:02,960
another topic that Jennings mentioned and that was the rapid editor so again this is the tool
543
00:48:02,960 --> 00:48:09,200
an open source tool developed and maintained by meta which helps people rapidly edit open street map
544
00:48:09,200 --> 00:48:16,160
it's designed as a way primarily of getting AI generated data sets validated by humans
545
00:48:16,160 --> 00:48:21,440
and into open street map anyway it's worth checking out it's also worth mentioning
546
00:48:21,440 --> 00:48:25,920
we've published a few episodes now around this idea of cloud native geospatial and this is
547
00:48:25,920 --> 00:48:31,840
important because the overture maps foundation is providing data in this cloud native format
548
00:48:31,840 --> 00:48:38,720
and in this particular case it's as parquet files I've published a few episodes now around this idea of
549
00:48:38,720 --> 00:48:43,520
cloud native geospatial formats and if you're not entirely sure what they are they're going to
550
00:48:43,520 --> 00:48:48,480
become increasingly important going forward and therefore it's well worth taking the time to
551
00:48:48,480 --> 00:48:53,840
understand them so I'll put links to a couple relevant episodes in the show notes of this episode
552
00:48:53,840 --> 00:48:57,760
okay that's it for me that's it for this week's episode thank you very much for tuning in all the
553
00:48:57,760 --> 00:49:02,080
way to the end and I'll be back again soon I hope that you take the time to join me then bye