Over the last two months, the OSM data sprint has turned into the OSM data 5K, and we've accomplished some pretty awesome stuff. I'd like to share some relevant links and talk a little about what we've accomplished and learned along the way.
Our major focus on OSM data quality stems from using OSM in MapBox Streets. In the early days of MB Streets, we noticed that major national and subnational boundaries were missing at certain zoom levels. In OSM these borders are tagged inconsistently, which makes rendering them in TileMill difficult. We tried to standardize this by adding the appropriate boundary and admin_level tags wherever a boundary existed but was tagged incorrectly.
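A minimal sketch of that kind of consistency check, assuming a boundary relation's tags are available as a plain Python dict. The function name `boundary_tag_fixes` is hypothetical, not a tool we actually used, and because the right subnational admin_level varies by country it has to be supplied by hand (national borders are admin_level=2). It only reports which tags differ from the expected set; it does not edit anything in OSM.

```python
"""Hypothetical check for an inconsistently tagged boundary relation."""


def boundary_tag_fixes(tags: dict[str, str], expected_level: int) -> dict[str, str]:
    """Return the tags that would need to be added or changed (empty if none).

    `expected_level` is an assumption supplied by the mapper, since the
    subnational admin_level differs from country to country.
    """
    wanted = {
        "type": "boundary",
        "boundary": "administrative",
        "admin_level": str(expected_level),
    }
    # Keep only the expected tags that are missing or have a different value.
    return {k: v for k, v in wanted.items() if tags.get(k) != v}


if __name__ == "__main__":
    # A national border relation with a name but no boundary/admin_level tags.
    relation_tags = {"type": "boundary", "name": "Example"}
    print(boundary_tag_fixes(relation_tags, expected_level=2))
    # → {'boundary': 'administrative', 'admin_level': '2'}
```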
We used Ian's map to identify places where boundaries were missing from MB Streets (yellow boundaries without green ones for subnational borders, purple without black for national borders).
A team of four or five of us went to town, trying to correct borders anywhere in the world where they existed in OSM but were tagged incorrectly.
While most of this retagging went over fine with the OSM community, there was major backlash in Russia and a few other countries about the added tags. Although we consulted the OSM wiki for the appropriate tags, some communities are fiercely independent and don't like it when outsiders change entire tagging schemes without first consulting their listservs and forums. So the first takeaway is that you should always consult the local community before changing existing nodes, ways, and relations.
The other major lesson: when editing relations, be certain to edit only the members of the relation that actually need it. Some of us were using a tool that downloads the remaining members of a boundary relation. Unfortunately, in some states and departments, mappers have added nodes and ways as members of the border relation. If you change all members of that relation, you add admin_level and boundary tags to features that shouldn't have them.
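The safer approach can be sketched as a filter over a relation's members, assuming each member is represented by the type, ref, and role that OSM lists for it. The `Member` class and `members_to_retag` function are hypothetical, and treating only `outer`, `inner`, and empty-role ways as the border line itself is an assumption; nodes with roles such as `admin_centre` or `label`, and sub-relations, are skipped.

```python
"""Hypothetical filter for choosing which boundary-relation members to retag."""

from dataclasses import dataclass


@dataclass
class Member:
    osm_type: str  # "node", "way", or "relation"
    ref: int       # OSM element id
    role: str      # e.g. "outer", "inner", "admin_centre", "label", or ""


# Assumption: only ways with these roles make up the drawn border.
BORDER_ROLES = {"outer", "inner", ""}


def members_to_retag(members: list[Member]) -> list[int]:
    """Return ids of member ways that form the border line.

    Nodes (capitals, label points) and sub-relations are skipped so they
    don't pick up boundary/admin_level tags they shouldn't have.
    """
    return [m.ref for m in members if m.osm_type == "way" and m.role in BORDER_ROLES]
```

Whatever the filter returns is still a candidate list to review by hand, not something to upload blindly.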
Takeaway: We are going to step on lots of toes if we manipulate the world's boundary tagging to fit our MB Streets styling. Ideally, boundary tagging would be standardized worldwide, but for now we need workarounds that accommodate its variability. AJ is working on ways to style boundaries more robustly without having to edit them in OSM.
After the Foursquare launch, it became clear that places like Brazil, Mexico, Turkey, Indonesia, Malaysia, and Thailand were incredibly popular on Foursquare and severely undermapped in OSM. College campuses, major international malls, and airports are also especially popular on Foursquare and ought to be available in MB Streets.
After the launch, we began mapping the cities that users identified as missing in MB Streets. Tom and others also built some really fantastic tools to nip the problem in the bud by finding high-4sq, low-OSM cities:
- OSM vs. Google comparison
- mapshot: before/after images of fixed areas
- fourbot: MB Streets with 4sq check-ins
- fourboss: OSM street and 4sq check-in density maps (this seems to have broken -____-)
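These tools aren't documented here, so the following is only a guess at the core idea behind finding high-4sq, low-OSM cities: rank cities by check-ins per OSM node. The function name `rank_undermapped` and the sample numbers below are made up for illustration.

```python
"""Hypothetical ranking of cities by Foursquare activity relative to OSM detail."""


def rank_undermapped(cities: dict[str, tuple[int, int]], top: int = 3) -> list[str]:
    """Map city name -> (checkins, osm_nodes); return names, most undermapped first."""

    def ratio(item: tuple[str, tuple[int, int]]) -> float:
        checkins, nodes = item[1]
        # max(..., 1) avoids dividing by zero for cities with no OSM nodes yet.
        return checkins / max(nodes, 1)

    ranked = sorted(cities.items(), key=ratio, reverse=True)
    return [name for name, _ in ranked[:top]]
```

With made-up inputs, `rank_undermapped({"A": (1000, 10), "B": (1000, 1000), "C": (500, 0)})` returns `['C', 'A', 'B']`: a city with check-ins but no streets at all rises to the top.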
Over the last two months we've mapped around 30 cities worldwide and 20 college campuses, and logged 394,743 nodes and 98,151 ways. This sounds like a lot, but in reality it's only about 0.03% of all the nodes in OSM. Ruben alone has done almost 220,000 nodes, about 54% of all the nodes we logged. Someone give him a high five.
Check out some of our backlog wikis.
Throughout this process we've tried to find partners around the globe who might have open data containing streets to import and/or street names we could copy. This proved very challenging, since it generally required Googling and networking in other languages. We worked on finding NYC building data and road names for Brazil and Mexico, with mixed results.
Most of the issues involved licensing. This was true both for the road-name data we found for São Paulo and for the building data we found for NYC.
We did find open data for the state of Jalisco and are moving forward with an import using JOSM. It will be blogged about shortly and can be tracked through the public repo.
An easier and potentially better route forward may be to contact users in the area directly and blast regional listservs to find people with local knowledge who can add road names accurately, without consuming any of our resources. This has worked in Caracas, Istanbul, and Monterrey, Mexico, where a local taxi company added road names itself.
We've also experimented with Walking Papers and hosting remote mapping parties, but I haven't been involved with this. Ian V, Alex, I welcome your feedback here.
Throughout this project we've had difficulty settling on good ways to visualize our progress in large cities, especially those that already have a lot of roads and features. We eventually settled on a node visualization that Tom created and ran with it.
Going forward, for a project like this, it would be key to have a style guide or standard format up front for visualizations that may be featured in a blog post. This would save people a lot of time down the road spent taking unnecessary screenshots.
This has been a definite win for us. In the past couple of weeks we've seen folks start editing in JOSM and learn how to make great node visualizations. Some people have participated through GitHub, and while it's only a handful, it does take work off our plates. It's also a great comms play, making us more accessible and increasing our visibility. Huzzah for crowdsourcing.
We're doing this again with the new Jalisco data import.
A major push was adding buildings and paths to OSM for US college campuses. This is easy work with reliable imagery. Students love Foursquare, and they love checking in at specific buildings and quads around their campuses. It's easy to trace these features but difficult to name them without help. To name them, we contacted each university to ask permission either to take building names from their online map or to get a shapefile from them for import. The former was very successful and helpful in this process.
The wiki backlog of universities can be found here. The blog post on mapbox.com can be found here.
We had great success getting campuses to give us permission to use their online maps as a cross-reference when adding names to OSM. This is a big win for us and suggests that many people in geography departments, school administrations, and facilities departments are willing to part with data and names. In most cases they don't view this as proprietary information, and they are even surprised that we ask before copying the names.
Probably a third of the schools never responded to our emails. Those that offered data (shapefiles, DBFs?!!?) took a long time to deliver it. As we learned above with data imports, getting data from shapefiles and ESRI geodatabases into JOSM can be a real headache.
Takeaway: It's better to trace a campus in 4 hours than to spend 4 weeks on back-and-forth emails securing licenses and running comms to get SHP files that then need to be converted to .osm files with GDAL-based tools.
So we should probably work this into each post.
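For reference, the last step of that shapefile conversion, turning polygons that have already been read out of a shapefile (for example with OGR) into an .osm file JOSM can open, can be sketched with the standard library alone. GDAL's own OSM driver reads .osm but does not write it, which is why a separate step like this is needed. This assumes each feature is a `(name, ring)` pair, with the ring given as closed `(lon, lat)` coordinates, and tags every feature `building=university`; the function name `features_to_osm` is hypothetical and this is not the converter we used.

```python
"""Hypothetical writer for JOSM-loadable .osm XML from already-extracted polygons."""

import xml.etree.ElementTree as ET


def features_to_osm(features: list[tuple[str, list[tuple[float, float]]]]) -> str:
    """Return an OSM XML string with one closed building way per feature."""
    nodes, ways = [], []
    next_id = -1  # negative ids mark new objects that haven't been uploaded yet
    for name, ring in features:
        # Skip the repeated closing vertex; the way is closed by reusing node 0.
        node_ids = []
        for lon, lat in ring[:-1]:
            nodes.append(ET.Element("node", id=str(next_id), lat=str(lat), lon=str(lon)))
            node_ids.append(next_id)
            next_id -= 1
        way = ET.Element("way", id=str(next_id))
        next_id -= 1
        for ref in node_ids + node_ids[:1]:
            ET.SubElement(way, "nd", ref=str(ref))
        ET.SubElement(way, "tag", k="building", v="university")
        ET.SubElement(way, "tag", k="name", v=name)
        ways.append(way)
    root = ET.Element("osm", version="0.6", generator="campus-sketch")
    root.extend(nodes + ways)  # conventional order: all nodes, then all ways
    return ET.tostring(root, encoding="unicode")
```

Writing the string to a file with an .osm extension gives JOSM a layer of new, unuploaded buildings to review before anything touches the live database.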
In conjunction with building sahelresponse.org, we attempted to put a number of small, unmapped cities on the map for a blog post ahead of the website launch.
The cities that we mapped are listed here.
While the idea of taking a part of the world in crisis and mapping it from your armchair sounds great, it's actually infeasible at this point for much of the developing world. Even when coordinating with our many partners (HOT, GeoEye, etc.), it is clear that acquiring imagery is a bit of a boondoggle and can't be relied on in a crunch. As a result, only the capital cities of each Sahel country were added to the map. That is still a win, but not as cool as adding these towns (populations of 50,000 to 100,000) to OSM.
Some additional feedback from someone who has worked on OSM in Burkina Faso is located [here](https://github.com/mapbox/mapping/wiki/Feedback).
The other lesson learned here is that OSM doesn't have a great tagging scheme for residential streets, footways, dirt roads, and other characteristics of slums. This is more of a styling issue, but there could be a more robust scheme for tagging features in slums. It isn't surprising that these areas don't fit a Western road-naming scheme that expects roads to be well planned and evenly spaced. Also, building=shack and building=hut don't exist :( ......... yet.
I welcome any thoughts from the others who were involved in this project. In particular, I wasn't involved in anything concerning the license change, data imports, or research. If anyone has thoughts on those issues, feel free to add them.