Daniel: Welcome to another episode of the Mapscaping Podcast. My name is Daniel, and this is a podcast for the geospatial community. My guest on the show today is Jennings Anderson. Jennings has been on the podcast before, but today we're talking about something called the Overture Maps Foundation, and during this episode you will discover that this is a downstream distribution of OpenStreetMap. So if you're a little bit confused, that's okay; we're going to walk you through it along the way, and also have you understand what the Daylight distribution is. A lot to look forward to in this episode. Stay tuned, and I'll see you again.
Daniel: Hi Jennings, welcome back to the podcast. You've been here before; I will link that episode up in the show notes of this episode so people can check it out. It was about OpenStreetMap, just in case people are interested. But today on the podcast we're going to be talking about the Overture Maps Foundation, which is a downstream product of OpenStreetMap. Maybe we could start with an introduction. Would you mind introducing yourself to the audience, please? Perhaps give us an understanding of your title, your responsibilities, where you work, that kind of thing, and then we'll head off and talk about Overture.
Jennings: Sure, sure. Thank you. My name is Jennings Anderson. I'm currently a research scientist at Meta. I've been a researcher in the open map data world for about a decade now, which feels wild to say. I started researching OSM, OpenStreetMap, just after Typhoon Yolanda in the Philippines in 2013. At that point we were looking to show how the creation of the Tasking Manager by the Humanitarian OpenStreetMap Team changed the interaction patterns between all these mappers coming together to produce open geospatial data in the aftermath of the disaster. Fast forward a number of years, and I eventually finished a PhD on this topic of studying OpenStreetMap and how people collaborate in OpenStreetMap. I have gone on to continue collaborating with researchers in the open data space, which has brought me to Meta, where I'm continuing to work on OpenStreetMap data analysis: how we consume OpenStreetMap, clean the data, make the Daylight map distribution, and eventually use OpenStreetMap as a major source of geospatial data in all the maps across Meta.
Daniel: Wow, you can cover a lot of ground here, and please do mention the Daylight distribution; I really want to touch on that later on. I think that's going to be an important piece in understanding the story of the Overture Maps Foundation. But first, a PhD in OpenStreetMap: what do people say when you tell them, "that's what I'm doing, my PhD is focused on OpenStreetMap"?
Jennings: It's always fun, because you first have to explain OpenStreetMap to folks, and then the first response often is, "isn't the world mapped?" So you get to get into that question and describe the importance of a project like OSM, and then talk about all the fun dynamics of the project and how it's grown. I mean, the last ten years have been pretty incredible. When I started looking at it, it was half a million registered users or thereabouts, and now the registered user count is many millions; I believe two million contributors have actually edited the map. Watching these numbers grow over the past decade has been pretty incredible. I was very fortunate to be able to study this and eventually finish grad school looking at the evolution of OSM. So I'm happy to be here today.
Daniel: Right at the start we talked about OpenStreetMap, and I really hope we don't need to dive into the details of what that is for the people listening to this particular podcast. But you mentioned something else as well: the Daylight distribution. And as I said, I think this is going to be an important piece in understanding Overture, so maybe we should start with Daylight. Again, my understanding is that this is a downstream product of OpenStreetMap, but perhaps you could put some more words around that, please.
Jennings: Yes. So the Daylight map distribution is an open data product that Meta, then Facebook, started producing in 2020. The gist of the Daylight map distribution is exactly that: it's a downstream distribution, and that terminology was chosen very deliberately. Think of Linux distributions: it's not a copy, it's not a fork, it's a distribution of the OpenStreetMap database that has undergone a series of quality controls.

At a high level, what happens is that each month Meta takes a snapshot of the OpenStreetMap planet and runs it through a number of checks for coastline integrity, broken relations, vandalism, any of these potential issues on the map, and then spends the next four weeks addressing each of those issues and fixing them. Importantly, all of those fixes are made upstream: not on the Daylight distribution, but in OpenStreetMap itself. Many times, when you identify an error and go look it up, the community has already fixed it. So it's either the data team at Meta looking at those errors, or the community has already fixed them because somebody else found them. Those fixes are then re-ingested, so at the end of a month you end up with most of the data being about one month old, but anything identified as needing to be addressed has been updated to the latest fixed, clean version. That is then released at the end of that month as the current Daylight map distribution. Right now, the latest version that just came out is version 1.29.

A couple of other changes are made to the data to make it a little more user friendly. We normalize all the heights, for example; that's something that matters in OpenStreetMap, where you can have many different accepted values for the height tag on, say, a building: in inches, feet, or meters. So we'll go ahead and normalize all of that into just meters, for example. With a lot of these little fixes, at the end of the day you have a view of OSM that is kind of an enterprise-ready version of OSM, with a quality guarantee that it has been looked over by a data quality assurance team. That is what we end up serving in our maps at Meta, and a number of other companies ingest it as well. And importantly, the Daylight map distribution is open and free: anybody can go download it and use it as a distribution of OpenStreetMap data in their products.
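To make that height normalization concrete, here is a minimal sketch of the kind of unit cleanup described above. It is illustrative only, not Daylight's actual pipeline; the tag values and conversion factors follow common OSM conventions, and real-world inputs (feet-and-inches combinations, ranges) need more handling than this.

```python
import re

# Conversion factors to meters for units that commonly appear in OSM height tags.
UNIT_TO_METERS = {"m": 1.0, "ft": 0.3048, "'": 0.3048, "in": 0.0254, '"': 0.0254}

def normalize_height(raw: str) -> float | None:
    """Parse an OSM-style height value ('12', '12 m', '40 ft') into meters."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([a-z'\"]*)\s*", raw.lower())
    if not match:
        return None  # unparseable values would be flagged for review, not guessed
    value, unit = float(match.group(1)), match.group(2) or "m"
    factor = UNIT_TO_METERS.get(unit)
    return round(value * factor, 2) if factor else None

print(normalize_height("40 ft"))  # 12.19
print(normalize_height("12"))     # 12.0 (bare numbers are meters by convention)
```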
Daniel: Wow, that was a fantastic overview, thank you very much for that. Really appreciate it. So let's jump further now and talk about the Overture Maps Foundation: what it is, and then my hope is that we can weigh the pros and cons between the Daylight distribution, OSM itself, and Overture, and use that as a way of helping people understand what this is. What is the Overture Maps Foundation, who is it for, and how is it different from these other huge datasets?

Jennings: Yes. Okay.

Daniel: So the obvious leading question here is: if you had to explain the Overture Maps Foundation to me, how would you do that?
Jennings: So the Overture Maps Foundation is an open data project within the Linux Foundation. Overture aims to create easy-to-use and interoperable open map data for developers who build map services or use geospatial data generally. This means bringing together all of the great open geospatial data sources out there, of which OpenStreetMap is one. And I think the key word here is the interoperability of this data: finding ways to bring together multiple open data sources such as OpenStreetMap and, let's say, another data source from one of the steering members of Overture, Microsoft. They have the Microsoft Building Footprints: building footprints extracted from imagery all over the world. I think the dataset in total has about 1.4 billion building footprints. That data can then be combined with OpenStreetMap building footprints, and now you end up with potentially the most complete open data distribution of building footprints. So that's one example of two datasets coming together within Overture Maps.
Daniel: This is a fantastic example, because those same building footprints are being slowly but surely ingested into OpenStreetMap. So my question is: what problem is Overture solving here, compared to OpenStreetMap itself and the Daylight distribution? How is it different from what's happening over in those other two places?
Jennings: Yes, that's a great question and a fantastic segue. So those building footprints are also being slowly but surely ingested into OpenStreetMap. One way they can be ingested is via the Rapid editor, which shows you the outline of the footprint and allows users to click "accept", and that imports that particular footprint into the OpenStreetMap database. Now, the important piece there is that this footprint was the product of a machine learning model, and by a user clicking on it and saying "accept", it has now been human-validated. It's been added to OpenStreetMap, because the data in the OpenStreetMap database has a specific level of quality: it has been created or curated by a mapper, right? OpenStreetMap is a community project, so by virtue of someone accepting that building, the data has been validated. That particular building is now part of the OpenStreetMap dataset, while it also still exists in the Microsoft building dataset, but we can use that signal to say: this is now an OSM building, it has been seen by a person, and it has been validated.

An important intermediate step there is that alongside the Daylight map distribution, one thing Microsoft does as a Daylight partner is take their building dataset, compare it to every version of Daylight, and subtract out any building that already exists in, or overlaps, a building that's already in OSM. They release that as what they call the sidecar file. So if you look at what Daylight has historically done, for each version of Daylight you'll have the OpenStreetMap Daylight distribution, and then you'll have the sidecar Microsoft buildings file, which is released as an OSM file. When you apply those two together, you get this union of buildings where no Microsoft ML building will overlap or overwrite an OSM building, because we're making the decision that since it's in OSM, it's been accepted and has this human-validated quality level, which we're saying is always going to be better than an unvalidated building from a machine learning model.
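The sidecar construction described here boils down to an anti-join on geometry: keep only the ML footprints that touch no OSM footprint. A minimal sketch of that idea using Shapely follows; the variable names and brute-force loop are illustrative (a real planet-scale pipeline would use a spatial index and dedicated tooling).

```python
from shapely.geometry import Polygon

# Toy stand-ins: OSM buildings (human-validated) and ML-derived footprints.
osm_buildings = [Polygon([(0, 0), (0, 10), (10, 10), (10, 0)])]
ml_buildings = [
    Polygon([(2, 2), (2, 8), (8, 8), (8, 2)]),          # overlaps OSM -> dropped
    Polygon([(20, 20), (20, 30), (30, 30), (30, 20)]),  # no overlap -> kept
]

# Keep only ML footprints that do not intersect any OSM footprint.
# OSM always wins: an accepted, human-validated building suppresses the ML copy.
sidecar = [
    ml for ml in ml_buildings
    if not any(ml.intersects(osm) for osm in osm_buildings)
]
print(len(sidecar))  # 1
```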
Jennings: Now, those are still two distinct datasets, right? To use the two in conjunction, you have to go download and import all of the Daylight data, then apply all the Microsoft buildings to it, and they're still going to look different: one still comes from OpenStreetMap, the other is still its own dataset. So Overture takes that one step further and says: let's acknowledge that we want easier-to-use, interoperable open map data. We have these two building datasets, so we're going to produce a buildings theme with its own schema, and we're going to find a way to map both the OSM data and the Microsoft building data into it. Users who are looking for that combined open building dataset will, at the end of the day, be able to use the buildings theme from Overture and have all those buildings together, already conflated. They'll have the reference to the OSM building and the reference to the Microsoft building all there, in one theme called buildings, with one height attribute that has been normalized, and all of those different pieces come together under that one theme. So Overture is kind of the ultimate downstream distribution of the dataset: we're bringing together all these different open datasets. One thing I didn't mention: also included are height estimates from USGS LiDAR data that have been overlaid onto the building dataset, so where there wasn't an existing height in, say, OSM, that LiDAR height has been added as well. All these different ways we can combine and conflate create this one easy-to-use answer to "I want buildings, I'm going to go look at the buildings theme." That is what Overture is producing at the end of the day.
Daniel: Wow, that's again a great overview, thank you very much. So let me try to summarize this. If we think about these massive building footprint datasets that have been generated by Microsoft and Google and probably others as well, the open datasets, they are slowly but surely making their way into OpenStreetMap. But there's a human in the loop, which is great, right, because it's human-validated like you were saying before, but it also means the process is slow: we're limited in the speed of data entry by the human in the loop. The Daylight distribution says: okay, the OpenStreetMap validated building layer gets priority, but it's important to remember that we have these other datasets. So, to provide a complete dataset, if we talk about building footprints, you can get two different files: the OpenStreetMap version of it, and all the buildings that don't exist in OpenStreetMap in the sidecar file. The Overture Maps way is: hey, we're going to combine all these things into one and normalize them, and we're going to add some other things as well, heights for example, if they're missing from the OpenStreetMap data. One thing we haven't talked about here is the schema. How does that change? Is it also copied from OpenStreetMap over into these other distributions, or do these other distributions have their own schema?

Jennings: Yeah. So I should also add, since you mentioned other data sources: Esri buildings are also included in Overture, so that's one more dataset being conflated into those buildings; this is data from the Esri Community Maps datasets. When it comes to the schema, it does look fundamentally different from OpenStreetMap, and this is by design, right? OpenStreetMap has the flexible key-value tagging model, which is critical; that's exactly as it should be, it's been designed that way for good reason and allows the maximum amount of flexibility. But there's a process that anybody looking to build a map with OpenStreetMap data has to go through, which is to map those key values into something more rigid that they can then style onto a map. The Overture map schema does exactly that and says: okay, this is going to be the rigid schema for how somebody can render a map from this data. For example, the building key in OpenStreetMap can have any number of values, from building=yes to house, residential, commercial, industrial, et cetera. In the Overture data schema we map that to a field called class, and we only have a handful of values in there, such as residential, commercial, medical, et cetera: the higher-level categories, rather than garage or shed or some of these other open-ended tagging values. That's an example of how we're moving to a stricter schema. Same with height, right? Height in OpenStreetMap can take any value, but in the Overture building schema it has to be a number and it has to be in meters. That just lets the downstream consumer have access to all that valuable hidden information that came from OSM, which we've gone ahead and normalized, so they don't have to add that logic to their pipeline. And this is what I think we mean by interoperable open map data: finding a way to bring in all these different sources, where we go ahead and do those conversions for the consumer.
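As a concrete illustration of that tag-to-schema flattening, here is a minimal sketch. The mapping table below is hypothetical: the class values Overture actually defines are richer than this, and the roll-ups shown (for example garage or shed into residential) are invented for the example.

```python
# Hypothetical flattening of OSM's open-ended building=* tag into a small,
# rigid 'class' field of the kind described for the Overture schema.
OSM_BUILDING_TO_CLASS = {
    "yes": None,              # building confirmed, but no more specific class known
    "house": "residential",
    "residential": "residential",
    "apartments": "residential",
    "garage": "residential",  # fine-grained values roll up to a coarser class
    "shed": "residential",
    "commercial": "commercial",
    "retail": "commercial",
    "industrial": "industrial",
    "hospital": "medical",
}

def to_overture_class(osm_tags: dict) -> str | None:
    """Map an OSM tag dict to a single coarse class value (or None)."""
    return OSM_BUILDING_TO_CLASS.get(osm_tags.get("building", ""), None)

print(to_overture_class({"building": "shed", "height": "4 m"}))  # residential
```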
Daniel: One thing I think we jumped over here: it made a lot of sense, what you were saying before, when there was one data layer that needed to be conflated with OpenStreetMap. That was pretty easy to wrap my mind around: the one that has been human-validated gets priority, and if there's nothing in OpenStreetMap, then we take the other data source as being the next best one. What do we do when there are three? We talked about Microsoft, we talked about Google, we talked about Esri. Staying with the building footprints example, how do you find the best one? How do you conflate that?

Jennings: Great question. Currently we're looking at a couple of different approaches, and again I'll speak specifically to the buildings theme, because that's what this is relevant to. In the current model, in the data release we just had, we conflated in the following order: we took OSM as the best, then the Esri Community datasets, then the Microsoft buildings. Now we're looking at some other open building data sources; Google has just released their open buildings dataset as well, so we're also looking at how we can start to add that in.

This gets us into another key feature of Overture Maps, which is what we're calling the Global Entity Reference System; GERS is the acronym. This allows us to say: okay, we have all these different datasets, and spatial conflation is hard; we want to make this easier for everyone. So we're going to define this concept of an entity, for a building for example, and for every building we're going to give it a unique ID, and it's going to be a stable ID. This is something that hasn't necessarily been solved in the open spatial data world, so we're going to try to create this registry of entities, an open registry that anybody can use. That will allow us to say that the building with ID 123 is this particular building in OpenStreetMap, this particular building in the Microsoft dataset, and this particular building in the Google dataset, and then we can do conflation based on this GERS ID, as opposed to trying to do a complex spatial conflation. That will let us make decisions using other attributes: the Google buildings, for example, are released with a confidence value from their machine learning model. If you can combine features by this unique ID, you can say, oh, I want the footprint from OpenStreetMap, but I want to pull in these attributes that we know about these buildings from elsewhere, and you can start to build up really rich features by comparing all of this spatial data across these open datasets with this GERS ID.
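That ID-keyed conflation is essentially a relational join instead of a geometric one. A minimal sketch follows; the entity IDs, field names, and fallback rule are invented for illustration, not the actual GERS or Overture logic.

```python
# Each source publishes records keyed by the same (hypothetical) GERS ID,
# so combining them is a dictionary join rather than a spatial-overlap search.
osm = {"gers:123": {"footprint": "POLYGON((0 0,0 10,10 10,10 0,0 0))", "height_m": 12.0}}
google = {"gers:123": {"confidence": 0.94}}
microsoft = {"gers:123": {"height_m": 11.5}}

def conflate(gers_id: str) -> dict:
    """Take the footprint from OSM, enrich it with attributes from other sources."""
    merged = dict(osm.get(gers_id, {}))
    # Prefer the OSM height when present; otherwise fall back to Microsoft's.
    merged.setdefault("height_m", microsoft.get(gers_id, {}).get("height_m"))
    merged["confidence"] = google.get(gers_id, {}).get("confidence")
    return merged

print(conflate("gers:123"))
```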
Daniel: Wow, that is a really, really big idea. It'll be pretty amazing if you manage it; I'm sure you're going to do it, right, but it doesn't sound easy. I'm thinking: what if there's an offset? How do you identify the same building in lots of different datasets? A tree hanging over a building that was imaged leaf-on, for example, might make that footprint look quite different from the same process being run when the leaves were off the trees, or the tree was cut down in a later dataset, or something like that. Or, I'm sitting in a room now in my house; not attached to the house, but two and a half meters from it, is a garage, a separate building. What if the entity of the house turns into two separate features in the future, when the imagery gets better, when we have better algorithms, that kind of thing? How do we deal with that in terms of what happens with that ID? Does it get split up? Does it become a parent-child relationship? It doesn't sound like an easy problem to solve.

Jennings: Exactly. It's not going to be an easy problem to solve, but it is going to be an important problem to solve. Everything within GERS has to be stable, meaning that over different releases the ID needs to stay the same for the same entity. In the case where new data comes in that says, no, this is the house and this is a garage, but previously the GERS registry had only one building there, then the next version of the registry needs to say: actually, this one ID turned into two IDs. That's something we'll need to be able to model so that it's backwards compatible when someone goes and looks up an ID. The power of GERS initially is going to be this conflation and how people can get the data together. But once the data's released, the real power of GERS is that anybody can take their own dataset and match it to the Overture dataset via GERS. Anybody can release a proprietary geospatial dataset with GERS IDs, so that anybody else wanting to consume that data can instantly do this ID-based conflation; you can use this open registry as a conflation tool for any third-party data stream to match to. And that's where GERS will look up and say: I had this dataset, it was matched to this GERS ID, this GERS ID turned into these two buildings, and that needs to be able to be resolved. So these are some of the finer nuances of what it's going to take for a stable ID system to really be successful, but we are really excited about what this is going to look like and what it's going to enable.
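One plausible way to model the split case described above is an explicit supersession record, so that retired IDs remain resolvable to their successors. This is a hypothetical sketch, not the actual GERS design:

```python
# Hypothetical lineage table: when an entity is split, the old ID is retired
# but kept resolvable, pointing at its successors (backwards compatibility).
lineage = {
    "gers:123": {"status": "superseded", "successors": ["gers:124", "gers:125"]},
    "gers:124": {"status": "active", "successors": []},  # the house
    "gers:125": {"status": "active", "successors": []},  # the detached garage
}

def resolve(gers_id: str) -> list[str]:
    """Follow supersession links until only active IDs remain."""
    record = lineage.get(gers_id)
    if record is None or record["status"] == "active":
        return [gers_id]
    resolved: list[str] = []
    for successor in record["successors"]:
        resolved.extend(resolve(successor))
    return resolved

print(resolve("gers:123"))  # ['gers:124', 'gers:125']
```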
Daniel: Yeah, absolutely. Imagine being able to roll back in time, you know, and see how a feature has changed. My guess is you'll see these things consolidate over time, so four or five different polygons may slowly but surely merge into one more accurate one, because you would hope that we would all be converging towards the truth: all these algorithms and datasets starting to agree on what the house, the building, actually looks like. I think that would be fascinating.

Jennings: Agreed. I look forward to when we can get to that point. I think the data is moving in that direction, right? It's only going to improve as better imagery comes out, as better models come out. GERS will need to keep evolving to capture that, and to be something accurate that really makes it valuable for external data sources to register to this kind of common registry of entities around the world.

Daniel: Okay, well, you've got your work cut out for you, but you're starting off with a fantastic acronym, so you're off to a good start. Up until now we've been talking about building footprints, and I think that's been a great example to follow through: to think about conflation, to think about these different datasets that are being produced and how we're going to combine them into one. We talked about the GERS ID, and we talked about how, with building footprints as an example, we could add elevation. It's interesting stuff. I guess the next obvious question is: what datasets, or data layers, or data themes are currently available in Overture Maps?
Jennings: Yeah, so there are four data themes currently available in the Overture dataset that was just released. We talked about buildings; that is one dataset, and it's using 500-something million buildings from OpenStreetMap. Another dataset that is also a downstream derivative of OSM is the transportation theme, and I think this is another great example of a downstream, slightly augmented version of OSM that is hopefully going to be very usable and, again, interoperable with other data systems. TomTom is also a member of Overture Maps, and this is something they've been working on: taking the transportation network from OpenStreetMap, the road network, all the nodes and ways, processing it, and turning it into segments and connectors. It's very similar to nodes and ways, but what's happening here is that in OpenStreetMap you might have a single road that is made up of multiple ways just because of how it was mapped, when it really is just one road; having those two ways connected end to end doesn't offer any improvement or value to a routing network. Likewise, when you join two roads together, like at an intersection, in OSM there's no requirement: you can have one road join in the middle of another road, and the node they both share, where they connect, isn't necessarily elevated as anything special. You need to do a more complex lookup to say, oh, this is actually an intersection node, because it exists in both of these ways; and the way that was intersected doesn't get split into multiple ways, if that makes sense. So what happens downstream, when we create this transportation theme, is that all of those pieces are resolved. Ways that don't need to be two ways, because having them split doesn't offer any benefit, become one segment; and anywhere we do have two roads coming together, where they really should split, the data will go ahead and split and create two segments and one connector, without the consumer having to do that kind of complex traversal to recreate the network. At the end of it, it's going to look exactly like the OpenStreetMap transportation highway network, but slightly under the hood it's going to be different in terms of which are the connectors and which are the segments, and it's going to be optimized for ingestion into routing systems. On top of the structure of the segments and connectors, there are a number of other attributes that are being modeled differently, enabling linear referencing across the data as well. I like to think of it as one level of downstream processing being done within Overture to produce something that is very usable and immediately consumable; we're trying to solve that step so that consumers don't have to do it themselves.
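The junction-splitting half of that rewrite can be sketched as a small graph pass: mark nodes shared by more than one way, then split ways at those nodes. The toy below is illustrative only, not Overture's or TomTom's actual pipeline; the complementary pass that merges pure end-to-end continuations into a single segment is omitted for brevity.

```python
from collections import Counter

# Toy OSM-style input: each way is an ordered list of node IDs.
ways = {
    "w1": ["a", "b", "c", "d", "e"],  # a road mapped as one way
    "w2": ["x", "c", "y"],            # a second road crossing w1 at node 'c'
}

# Any node used by more than one way is a junction: a connector in the
# segment/connector model.
node_use = Counter(node for nodes in ways.values() for node in nodes)
connectors = {node for node, count in node_use.items() if count > 1}

def split_at_connectors(nodes):
    """Split one way into segments at every interior connector node."""
    segment = [nodes[0]]
    for node in nodes[1:]:
        segment.append(node)
        if node in connectors and node != nodes[-1]:
            yield segment
            segment = [node]
    yield segment

segments = [seg for nodes in ways.values() for seg in split_at_connectors(nodes)]
print(connectors)  # {'c'}
print(segments)    # [['a', 'b', 'c'], ['c', 'd', 'e'], ['x', 'c'], ['c', 'y']]
```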
Daniel: Is there any sort of conflation going on here? Are we adding other datasets? Because my understanding is that Meta, for example, produces a routing layer that is ingested into OpenStreetMap via the Rapid editor.

Jennings: I'm not sure what you mean by that.

Daniel: Are we just talking about the data that's in OpenStreetMap, which you're simplifying in the ways you've described, making it easier to consume? With the buildings, for example, you're normalizing, you're creating a new schema that's more easily digestible for folks wanting to consume this data, but you're also adding in data that wasn't present in OpenStreetMap. So my question is: are you also doing that for the transportation network, or is this pure OpenStreetMap data that you're simplifying and making more consumable for users?

Jennings: Gotcha. In the current case, for the transportation theme, it is the full OpenStreetMap transportation network; currently that theme is ingesting only OSM data. But there is, hopefully, a plan for the future to add more data on top of that, do that conflation, and build out that road network.

Daniel: Absolutely. Okay, so we've got building footprints, we've got the transportation theme. What else is going on in there?

Jennings: Yeah, I wanted to cover those two initially because those are the datasets that involve OpenStreetMap data. The next one that's exciting is a theme called places. This is a combination of public place information from Meta and Microsoft, and it comes to about 60 million points of interest and places globally that have never been released as open data before. So this is quite exciting as an open data offering: 60 million new places all over the world.

Daniel: Wow, yeah, that's pretty cool that they're opening up the data like that; that's great. So, three layers. I don't want to say "is that it," because it sounds like an incredible amount of work, but is that where we're at today?
Jennings: There is one more: the admins layer, which is administrative boundaries. Microsoft and TomTom have been working together on that layer, and it currently includes admin levels for countries and states, and hopefully it will continue to grow. We should add that this was our initial data release, just these four themes. We put this out because earlier we had really only released the schema, and I think that was exciting, but then we needed to get some data out there for people to see and play with. That's what we have initially released, and these themes will continue to grow and improve as future data releases happen.

Daniel: I've seen a little bit of chatter about this on Twitter, or X, or whatever people are calling it these days, and some people are really excited about the Overture Maps Foundation. Great: more open data, more better, kind of thing; I completely get that. But we've been talking about this idea of accessibility, consumability, access, that kind of thing, and I've heard a lot of people say you have to be a real data nerd to actually get your hands on this stuff, that it's actually difficult to access. My guess here is that there's probably a difference between being a map data provider and a map services provider, and I'm wondering which one of these roles Overture is trying to fill.
Jennings: Yeah, that's a fantastic point. Overture is a map data provider, looking to create this interoperable open map dataset; that data is then consumable by developers who want to build map services. And this is, I think, a key distinction, because there are a lot of opinionated decisions that go into how you take your raw dataset and produce, say, tiles: what data gets included, what gets excluded, and what it's going to look like. Overture is not trying to get into that space of saying what map services should look like. Overture wants to be upstream of that, providing data for any provider of map services to consume and, hopefully, build better products with.

So we've made the decision to release the data as Parquet files, which is a cloud-native format. That means that when you go to the Overture website there isn't a "download all" button, and I think people have become very used to "I just want to download the whole dataset and then do something with it." Well, it's a really big dataset, and that doesn't make a lot of sense. It's funny, because the same thing happens with the OpenStreetMap planet file, right? It's released as one big dataset and it keeps growing, and there are tools you can use to ingest the entire planet file; it just requires bigger and bigger machines, which gets into cloud computing as well. The very initial dataset we released was, as I talked about earlier, built from Daylight plus these sidecar files where you can add on the buildings. For that initial Overture data release, we added all those things together and made one giant PBF: okay, we've done this conflation of all these buildings and all this stuff, put it into one giant PBF (it was almost a hundred gigabytes), and now you can go download it, read it into your systems, and have all these buildings and all the data pre-conflated. And we heard back from people who said: well, now this file is too big; it's not actually that usable anymore, because we have to spin up a much bigger machine in the cloud to try to process it. So we took that feedback, and with this release we wanted to release each of these four themes as Parquet files, this cloud-native technology that allows you to point any number of big data systems at the files, poke and prod and inspect the data, and extract the data in the view that makes sense for you. So, one example: these are just datasets sitting out on S3, and on Azure Blob Storage as well, so you can pick your provider. You can point open-source tools such as DuckDB right at the dataset and run it locally, and it lets you write a query that says: I want to download the data in this bounding box, I want these columns, and I want to convert this string into something else. It allows you to have a lot more control over the data you're actually getting, which you can then feed into your services.
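Here is a minimal sketch of that kind of bounding-box pull using DuckDB's Python API. The S3 path and column names follow the pattern of Overture's 2023 releases but should be treated as assumptions; check the current release listing for exact paths and schema.

```python
import duckdb

con = duckdb.connect()
# httpfs lets DuckDB read Parquet straight out of S3.
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2';")

# Hypothetical release path and bbox struct, modeled on the 2023 alpha layout.
places = con.execute("""
    SELECT id, bbox, names
    FROM read_parquet(
        's3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=places/type=place/*',
        hive_partitioning=1)
    WHERE bbox.minx > -122.5 AND bbox.maxx < -122.3   -- rough San Francisco box
      AND bbox.miny >   37.7 AND bbox.maxy <   37.9
    LIMIT 10
""").fetchdf()  # fetchdf() returns a pandas DataFrame
print(places.head())
```

The point of the design choice Jennings describes is visible here: only the rows and columns matched by the query cross the network, rather than the whole planet-scale file.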
Jennings: What was really exciting when this came out is that, you know, it's fundamentally different from the way data has been released in the past, where people are more used to pushing that download button, so certainly there were some comments about that. But pretty quickly, while we were trying to put some examples out of how to get started with it, there were a number of blog posts coming out from people saying: okay, initially this was strange, but then I went and did this, and here's how I did it, and I was able to look at the data, get the data, and do this with it. There were blog posts showing that with your Amazon account you can query the data, download the CSV, and load it right into QGIS. That was really exciting to see. I think we have made the right decision in releasing it in this cloud-native format that doesn't make those opinionated decisions about what the data has to look like. We're trying to offer it one level above that and say: this is the schema; we've done this lift to bring the data together and make it interoperable, but now it's up to you to decide what pieces of the data you want to use and how you want to use them. And for developers who have access to their own cloud infrastructure and just want to go download all the Parquet files, load them locally, and do their own thing with them: absolutely, that is still an option. So I think we've tried to offer the most flexibility, but we've certainly heard a lot of feedback from folks who are looking for that download button, and I hope we can put more examples together to show how, while there isn't a download button, you can actually do a lot more with the data in the way it's being released.
420 | |
00:36:26,720 --> 00:36:31,600 | |
cloud native for that cloud native format that you're talking about that must rightly | |
421 | |
00:36:31,600 --> 00:36:37,919 | |
simplify the infrastructure you need to maintain the data set as well. My guess is there's no | |
422 | |
00:36:37,919 --> 00:36:42,640 | |
service running in the background it's blob storage just sitting there waiting so your job is | |
423 | |
00:36:42,640 --> 00:36:48,560 | |
then to keep updating that their blob storage with with new files. Also after hearing you say that | |
424 | |
00:36:48,560 --> 00:36:54,720 | |
so you're clearly on the map data provider side not the map services provider side and it sounds | |
425 | |
00:36:54,720 --> 00:36:59,839 | |
like your customers or the people that you're building this for on the map services side so you | |
426 | |
00:36:59,839 --> 00:37:04,319 | |
are essentially building this for the geeks for the developers you know for the people that | |
427 | |
00:37:04,319 --> 00:37:09,759 | |
they want to build on top of what you're providing precisely well we've covered a lot of | |
428 | |
00:37:09,759 --> 00:37:16,319 | |
ground so I guess again continuing with this theme of obvious questions the mix but it's like | |
429 | |
00:37:16,319 --> 00:37:20,799 | |
what's so what's next though we've got these four layers I don't mean to trivialize that it sounds | |
430 | |
00:37:20,800 --> 00:37:25,600 | |
like it's been a lot of work just getting this far and it sounds like you've got a lot of work to | |
431 | |
00:37:25,600 --> 00:37:30,000 | |
do just with these four layers but my guess is also that you've got plans for this right you're | |
432 | |
00:37:30,000 --> 00:37:35,680 | |
going to grow this and it's going to develop over time so again without like trivializing the | |
433 | |
00:37:35,680 --> 00:37:41,680 | |
work you've already done what is next where we're going from here. Yeah so the big thing we have | |
434 | |
00:37:41,680 --> 00:37:46,640 | |
released is four layers and I talked about this global entity reference system and how great | |
435 | |
00:37:46,640 --> 00:37:52,240 | |
it's going to be. So the next thing that we need to do is actually implement it on these four
436 | |
00:37:52,240 --> 00:37:58,400 | |
layers, so that, you know, in the coming data releases there will actually be a GERS ID that will be
437 | |
00:37:58,400 --> 00:38:06,160 | |
this stable ID for all of these roads and buildings and places and admin features that we're putting out in these themes.
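To make the value of a stable ID concrete, here is a hypothetical sketch of the kind of join it enables. The file names and the gers_id/id columns are illustrative assumptions, since the episode only says the IDs are coming in future releases.

    # Key your own data to Overture features via a stable ID, so the join
    # survives from one data release to the next without re-matching by
    # name or geometry.
    import duckdb

    con = duckdb.connect()
    # your_ratings.parquet: your own table with columns (gers_id, rating)
    # overture_places_2023_10.parquet: a local copy of the places theme
    rows = con.execute("""
        SELECT cur.id, cur.names, r.rating
        FROM 'overture_places_2023_10.parquet' AS cur
        JOIN 'your_ratings.parquet' AS r ON r.gers_id = cur.id
    """).fetchall()
    # Because the ID is stable, the same join works unchanged against next
    # month's release file.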
438 | |
00:38:06,160 --> 00:38:12,480 | |
And in terms of the next data layers, I think
439 | |
00:38:12,480 --> 00:38:18,240 | |
that's going to be a decision that's, you know, going to be driven by what people are after,
440 | |
00:38:18,240 --> 00:38:24,320 | |
what consumers are looking for, and what companies and what data sets join Overture and
441 | |
00:38:24,320 --> 00:38:30,160 | |
become interested in the project and want to participate and want to bring their open data to the | |
442 | |
00:38:30,160 --> 00:38:36,080 | |
table and add it to the system. Also,
443 | |
00:38:36,640 --> 00:38:42,160 | |
one theme that we're looking at next is what it takes to actually make a map from the
444 | |
00:38:42,160 --> 00:38:47,520 | |
Overture data. I think that's a big question. You know, if you want to build a base map
445 | |
00:38:47,520 --> 00:38:55,040 | |
from Overture data right now, there's a transportation layer, admins, buildings, and places,
446 | |
00:38:55,040 --> 00:39:00,319 | |
so you're off to a great start, but you also need to know where the land is, where the water is,
447 | |
00:39:01,120 --> 00:39:08,240 | |
other types of land use and so we're looking to put together kind of a context theme that's going | |
448 | |
00:39:08,240 --> 00:39:12,879 | |
to help us create something that you can actually make a full base map out of. So that's something
449 | |
00:39:12,879 --> 00:39:17,919 | |
we're also looking at. OpenStreetMap is a great source of data for that, from
450 | |
00:39:17,919 --> 00:39:23,359 | |
the Daylight distribution, and so, how can we bring some more of that data in, in these separate,
451 | |
00:39:23,359 --> 00:39:29,759 | |
discrete layers that give consumers the option to turn those data themes on as well,
452 | |
00:39:29,759 --> 00:39:35,040 | |
those data layers on. So those are probably the two big things that are immediately
453 | |
00:39:35,040 --> 00:39:39,920 | |
on our to-do list. Wow, also big things. It's going to be really interesting to follow this project
454 | |
00:39:39,920 --> 00:39:47,040 | |
and see where we end up, especially also watching to see who joins the foundation, right,
455 | |
00:39:47,040 --> 00:39:52,000 | |
and see what they bring to it. That's going to be pretty interesting, yeah, just pretty
456 | |
00:39:52,000 --> 00:39:57,840 | |
interesting to follow along. So, OpenStreetMap has a really passionate community behind it,
457 | |
00:39:57,840 --> 00:40:03,040 | |
around it, it's been going for a while now, and it's growing like you mentioned at the start of the
458 | |
00:40:03,040 --> 00:40:10,080 | |
episode. Has there been any pushback, people worried that you are trying to steal OpenStreetMap's thunder,
459 | |
00:40:10,080 --> 00:40:17,120 | |
that you're going to somehow make them irrelevant, or that you're stealing the data, anything like
460 | |
00:40:17,120 --> 00:40:22,400 | |
that? I'm not doing a great job of formulating this question, but I hope you know where I'm going.
461 | |
00:40:22,400 --> 00:40:28,800 | |
No, I totally understand. I think that this is an important distinction to get right,
462 | |
00:40:28,800 --> 00:40:36,320 | |
and Overture Maps is consuming OpenStreetMap data, and I think that the two together are very
463 | |
00:40:36,320 --> 00:40:44,640 | |
complementary, in that OpenStreetMap continues to be the source for community-maintained data
464 | |
00:40:44,640 --> 00:40:50,800 | |
and a vibrant, evolving community maintaining open map data. Overture is, as you said earlier, you
465 | |
00:40:50,800 --> 00:40:56,960 | |
know, downstream of OpenStreetMap and continuing to consume that. And so when there's OpenStreetMap
466 | |
00:40:56,960 --> 00:41:03,360 | |
data inside of Overture, if that data needs to be adjusted, well, all the data in Overture
467 | |
00:41:03,360 --> 00:41:08,000 | |
is coming from somewhere, and it's not going to be fixed inside of Overture; it needs
468 | |
00:41:08,000 --> 00:41:14,800 | |
to go be fixed or updated or augmented at the source. So this is something where, if there's
469 | |
00:41:14,800 --> 00:41:18,720 | |
something in Overture that's coming from OpenStreetMap, there's still got to be
470 | |
00:41:18,720 --> 00:41:25,280 | |
the feedback loop that goes all the way back to OSM. So OSM remains, like, the original data
471 | |
00:41:25,280 --> 00:41:30,320 | |
source, the original community, and I think we can think of Overture as this downstream part of it.
472 | |
00:41:30,320 --> 00:41:37,680 | |
And as a result, companies involved in Overture, such as Meta, will still very much stay involved
473 | |
00:41:37,680 --> 00:41:43,200 | |
in OpenStreetMap, and one example is, you know, supporting the Rapid editor for OpenStreetMap
474 | |
00:41:43,200 --> 00:41:50,240 | |
data, in order to keep the high-quality data coming into Overture from OpenStreetMap.
475 | |
00:41:50,240 --> 00:41:56,160 | |
All of that data validation, data editing, still is happening in OpenStreetMap; it's not
476 | |
00:41:56,160 --> 00:42:01,279 | |
happening somewhere else. And so, to that degree, I don't think it's stealing OpenStreetMap's
477 | |
00:42:01,279 --> 00:42:06,399 | |
thunder in any way, but rather maintaining a presence within the OpenStreetMap community
478 | |
00:42:06,399 --> 00:42:12,000 | |
and helping ensure that OpenStreetMap continues to do what OpenStreetMap does,
479 | |
00:42:12,000 --> 00:42:19,279 | |
which is being an open, vibrant community supporting high-quality open geospatial data.
480 | |
00:42:19,280 --> 00:42:26,320 | |
At the same time, Overture provides this kind of place where other data sets can get merged into
481 | |
00:42:26,320 --> 00:42:32,320 | |
the Overture Maps data set, such as these, you know, AI-derived buildings or roads, where
482 | |
00:42:32,960 --> 00:42:39,600 | |
maybe OpenStreetMap is not where they belong, right? Historically,
483 | |
00:42:39,600 --> 00:42:44,640 | |
there have been a lot of questions around whether AI data should be
484 | |
00:42:44,640 --> 00:42:49,359 | |
imported into OSM, and via Rapid we're saying we're not just importing it, but you're actually
485 | |
00:42:49,359 --> 00:42:53,440 | |
looking at it and doing this human-in-the-loop validation of the data, and then it becomes part
486 | |
00:42:53,440 --> 00:42:57,839 | |
of OpenStreetMap because it has gone through that human-in-the-loop validation. But if somebody
487 | |
00:42:57,839 --> 00:43:03,759 | |
wants to add those two data sets together, Overture is the place they can go to get that full,
488 | |
00:43:03,759 --> 00:43:09,279 | |
kind of complete data set. Yeah, it makes a lot of sense. At the start, like when we first
489 | |
00:43:09,280 --> 00:43:14,560 | |
started talking about this, it wasn't clear to me that these two projects weren't
490 | |
00:43:14,560 --> 00:43:19,920 | |
in competition with each other. But when you put it like that, when you described the process
491 | |
00:43:19,920 --> 00:43:25,760 | |
of conflating these different data sets, especially these AI-generated data sets,
492 | |
00:43:25,760 --> 00:43:30,720 | |
I can see where you're going, and you can see why people want this, I really can. I can also
493 | |
00:43:30,720 --> 00:43:36,960 | |
see that you're simplifying the schema, making it easier to search. This idea of the global
494 | |
00:43:36,960 --> 00:43:42,400 | |
features ID doesn't sound easy to implement, but man, if you do that, it'll be incredibly powerful.
495 | |
00:43:42,400 --> 00:43:48,000 | |
You're adding these extra attributes, elevation, we talked about that before, and my guess is
496 | |
00:43:48,000 --> 00:43:53,040 | |
there'll be other things along the way. And giving people a place to put the data, which
497 | |
00:43:53,040 --> 00:43:57,280 | |
they can't just go and dump into the community, right, and overwrite all the work that's been
498 | |
00:43:57,280 --> 00:44:03,120 | |
done. Yeah. So I can totally see it from that perspective, and again, I think this
499 | |
00:44:03,120 --> 00:44:07,279 | |
is going to be a super interesting project to follow along with but we've been talking about | |
500 | |
00:44:07,279 --> 00:44:12,319 | |
this, Overture, for a while now. Personally, I think you've done a great job of explaining what
501 | |
00:44:12,319 --> 00:44:16,880 | |
it is and walking us through the process how it works in terms of these different data themes | |
502 | |
00:44:16,880 --> 00:44:21,040 | |
that you've got in there, and a little bit about what the future looks like. What is the biggest
503 | |
00:44:21,040 --> 00:44:25,759 | |
misunderstanding about this project? I mean, you clearly talk to a lot of different people in the
504 | |
00:44:25,759 --> 00:44:31,200 | |
mapping world. What is the bit that they don't immediately get when you start talking
505 | |
00:44:31,200 --> 00:44:36,560 | |
to them about Overture? What is the question that a lot of people will ask you about this?
506 | |
00:44:36,560 --> 00:44:42,319 | |
That's a good question. I think it was really enlightening to be at State of the Map US, for
507 | |
00:44:42,319 --> 00:44:47,040 | |
example, in Richmond most recently, and talk to people about Overture, because I think, you know, there's
508 | |
00:44:47,040 --> 00:44:52,879 | |
a lot of excitement and there's been a lot of, I think, speculation. Once we get to talking about it
509 | |
00:44:52,879 --> 00:44:57,759 | |
and explaining, I think, again, that relationship between OpenStreetMap and
510 | |
00:44:57,760 --> 00:45:03,040 | |
Overture, and showing that, you know, these two aren't existing in competition. You're
511 | |
00:45:03,040 --> 00:45:09,040 | |
not going to go to Overture Maps and click edit and edit this, like, separate version. That's not
512 | |
00:45:09,040 --> 00:45:15,600 | |
what Overture Maps is. Overture Maps is this combined open data set that is, as we said,
513 | |
00:45:15,600 --> 00:45:20,480 | |
kind of downstream of OpenStreetMap, and OpenStreetMap is going to continue to remain
514 | |
00:45:20,480 --> 00:45:27,120 | |
the vibrant community supporting open geospatial data, community-maintained data, and so
515 | |
00:45:27,120 --> 00:45:33,600 | |
it's just one of the many inputs into Overture. And I think that distinction is very important
516 | |
00:45:33,600 --> 00:45:38,560 | |
to make. And I think that as Overture keeps producing data sets, it will become more and
517 | |
00:45:38,560 --> 00:45:45,920 | |
more obvious as to what exactly can be done with Overture Maps data, and show that it is
518 | |
00:45:46,480 --> 00:45:52,000 | |
another distribution of open map data that includes high-quality data from OpenStreetMap,
519 | |
00:45:52,000 --> 00:45:58,160 | |
and that's really exciting, and is hopefully going to give more credibility to the value of
520 | |
00:45:58,160 --> 00:46:04,400 | |
OpenStreetMap, by virtue of being included in this one, you know, larger project, and combining data
521 | |
00:46:04,400 --> 00:46:09,280 | |
sets from other sources as well into that. I think that's really exciting. At the end of
522 | |
00:46:09,280 --> 00:46:16,480 | |
the day, this is all happening in the open, this is a Linux Foundation project, and the goal here is to
523 | |
00:46:16,480 --> 00:46:24,000 | |
build interoperable open map data for anyone who is building massive map services and needs enterprise-
524 | |
00:46:24,000 --> 00:46:28,880 | |
quality map data. Well, Jennings, I think this is probably a great place to round off the conversation,
525 | |
00:46:28,880 --> 00:46:33,200 | |
thank you very much for your time, thank you for showing up on the podcast again, I really appreciate it.
526 | |
00:46:33,200 --> 00:46:38,240 | |
I always enjoy talking with you, you have this sort of enthusiasm, you know, about these kinds of projects
527 | |
00:46:38,240 --> 00:46:42,560 | |
that I really appreciate, so it's been a pleasure. We've mentioned the name a bunch of times,
528 | |
00:46:42,560 --> 00:46:46,880 | |
the Overture Maps Foundation, and there'll be a link to that in the show notes. Is there anywhere else you want to
529 | |
00:46:46,880 --> 00:46:52,320 | |
direct people to, send them to, if they want to learn more? Yeah, I think that the Overture Maps
530 | |
00:46:52,320 --> 00:46:59,440 | |
website overturemaps.org has an FAQ and a button to learn more about how to become a member | |
531 | |
00:46:59,440 --> 00:47:04,799 | |
and how to get involved in shaping, you know, what the future of the foundation looks like.
532 | |
00:47:04,799 --> 00:47:09,200 | |
And, you know, the way we're making these decisions around what the schema looks like
533 | |
00:47:09,200 --> 00:47:14,720 | |
is by all these member companies coming together and developing the schema
534 | |
00:47:14,720 --> 00:47:21,120 | |
and releasing this data. So I think you can learn more there, and links to the documentation and
535 | |
00:47:21,120 --> 00:47:26,799 | |
to the schema are all available there on the website. Well, thanks again, Jennings, I'll include
536 | |
00:47:26,799 --> 00:47:30,799 | |
those links in the show notes, and I hope people take the time to check it out. Really appreciate
537 | |
00:47:30,799 --> 00:47:37,359 | |
your time, thanks for showing up. Fantastic, thank you so much. So, I really hope you enjoyed that episode
538 | |
00:47:37,360 --> 00:47:44,000 | |
with Jennings Anderson, research scientist at Meta, talking about the Overture Maps Foundation.
539 | |
00:47:44,000 --> 00:47:48,640 | |
I mentioned right at the start of this episode that Jennings has been on the podcast before;
540 | |
00:47:48,640 --> 00:47:53,680 | |
that episode, I believe, was called OpenStreetMap: A Community of Communities. It's worth checking
541 | |
00:47:53,680 --> 00:47:57,120 | |
out, there'll be a link in the show notes. And I'm also going to put a link in the show notes to
542 | |
00:47:57,120 --> 00:48:02,960 | |
another topic that Jennings mentioned, and that was the Rapid editor. So again, this is
543 | |
00:48:02,960 --> 00:48:09,200 | |
an open-source tool developed and maintained by Meta, which helps people rapidly edit OpenStreetMap.
544 | |
00:48:09,200 --> 00:48:16,160 | |
It's designed primarily as a way of getting AI-generated data sets validated by humans
545 | |
00:48:16,160 --> 00:48:21,440 | |
and into OpenStreetMap. Anyway, it's worth checking out. It's also worth mentioning that
546 | |
00:48:21,440 --> 00:48:25,920 | |
we've published a few episodes now around this idea of cloud-native geospatial, and this is
547 | |
00:48:25,920 --> 00:48:31,840 | |
important because the Overture Maps Foundation is providing data in this cloud-native format,
548 | |
00:48:31,840 --> 00:48:38,720 | |
and in this particular case it's as Parquet files. We've published a few episodes now around this idea of
549 | |
00:48:38,720 --> 00:48:43,520 | |
cloud-native geospatial formats, and if you're not entirely sure what they are, they're going to
550 | |
00:48:43,520 --> 00:48:48,480 | |
become increasingly important going forward, so it's well worth taking the time to
551 | |
00:48:48,480 --> 00:48:53,840 | |
understand them, so I'll put links to a couple of relevant episodes in the show notes of this episode.
552 | |
00:48:53,840 --> 00:48:57,760 | |
Okay, that's it for me, that's it for this week's episode. Thank you very much for tuning in all the
553 | |
00:48:57,760 --> 00:49:02,080 | |
way to the end, and I'll be back again soon. I hope that you take the time to join me then. Bye!