Listed on the registry of Open Data on AWS, the Daylight OpenStreetMap Parquet files contain the latest Daylight Map Distribution of OpenStreetMap in an analysis-ready format. This dataset is optimized for cloud-based queries with Amazon Athena, meaning anyone can access the entire dataset with SQL queries in the browser, without the need to download or access the files directly.
The Daylight Map Distribution of OpenStreetMap is always openly available for download in the standard OSM PBF format (find it at daylightmap.org). The parquet files, however, were first made available alongside Daylight release v1.9. They contain fully resolved geometries and additional metadata including areas, lengths, and quadkeys, not present in the PBF.
In total, the OSM features files contain nearly 1B features including 178M+ nodes, 816M+ ways, and 5M+ relations. These are all renderable features, meaning they have geometries that can be rendered on a map. The OSM features files therefore do not include untagged nodes or relations without basic geometries (such as turn restrictions). Untagged objects can also be found in this analysis-ready format in the OSM elements files, described in the comments.
The following CREATE TABLE
command will add the daylight_osm_features
table to your AWS data (i.e. Glue) catalog. For best performance, make sure your Athena console is running in the same region as the s3://daylight-openstreetmap
bucket: us-west-2
.
CREATE EXTERNAL TABLE `daylight_osm_features`(
`id` bigint,
`version` int,
`changeset` bigint,
`created_at` timestamp,
`tags` map<string,string>,
`wkt` string,
`min_lon` double,
`max_lon` double,
`min_lat` double,
`max_lat` double,
`quadkey` string,
`linear_meters` double,
`square_meters` double)
PARTITIONED BY (
`release` string,
`type` string)
STORED AS PARQUET
LOCATION
's3://daylight-openstreetmap/parquet/osm_features/'
Next, load the partitions by running: MSCK REPAIR TABLE daylight_osm_features
.
Remember to use WHERE release =
to query only one dataset (such as v1.10
or v1.9
). These are the release numbers of the Daylight Map Distribution. Omitting this WHERE
clause will result in querying duplicate features.
SELECT tags [ 'highway' ] AS highway_tag,
count(id) AS osm_features,
count(distinct(changeset)) AS total_changesets,
sum(linear_meters) / 1000 AS total_km
FROM daylight_osm_features
WHERE linear_meters > 0 AND tags['highway'] IS NOT NULL
AND release='v1.9'
GROUP BY tags [ 'highway' ]
ORDER BY total_km DESC
SELECT
tags['name'] as name,
ST_CENTROID(ST_GEOMETRYFROMTEXT(wkt)) as center,
tags['amenity'] as amenity
FROM daylight_osm_features
WHERE tags['amenity'] LIKE 'hospital'
AND release='v1.9'
Extrapolating on the Hospital Features query, here is a comaprison of all
amenity=hospital
(red) andamenity=ice_cream
(blue) features in OpenStreetMap. You can see that ice cream shops are far more prevalent on coasts around the world (shown almost outlining Western Europe), with high density of both features (shown in white) in major cities globally.Data (c) OpenStreetMap Contributors
Albeit a silly example, this query takes ~25 seconds and generated 16MB of data, demonstrating both the power and simplicity (fewer than 5 lines) of this approach to OSM data analysis.