This ended up taking a bit longer than I'd hoped. I started out by exploring the data with some JavaScript mapping libraries, but soon realized I was going to need something with a little more heft. I turned to Ruby and decided the RGeo gem was the right tool for the job. It's been a while since I've used Ruby regularly, so I spent some time reacquainting myself with the language, and more getting up to speed with a library I had never used before.
At the core of the problem is this: how close is each address point to a street line? I looked up the math involved and decided that implementing the calculation by hand was going to be too much trouble. Then I discovered a very relevant message on the RGeo-users mailing list, and the rest of the solution was a matter of structuring the data properly.
I created a helper function, point_line_dist(point, line), that returns the distance from a given point to a specific line (as defined by two end points). This function uses two low-level methods, project and interpolate, provided by the ffi-geos gem. The first method gives the distance along the line to the point of intersection, where a perpendicular drawn from the point meets the line. The second method converts that distance along the line into the coordinates of that point of intersection. Finally, the function returns the distance from that intersection point on the line to the original point.
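def point_line_dist(point, line)
  # line is a feature; take the ffi-geos object behind its geometry.
  line = line.geometry.fg_geom
  # Distance along the line to the spot nearest the point.
  line_dist = line.project(point.fg_geom)
  # Convert that distance back into a geometry on the line.
  line_point_geom = line.interpolate(line_dist)
  # Wrap the raw geometry so RGeo's distance method can be used on it.
  line_point = @factory.wrap_fg_geom(line_point_geom)
  return line_point.distance(point)
end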
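For anyone curious what project and interpolate are doing, here is a minimal plain-Ruby sketch of the same three steps for a single straight segment. It is not the ffi-geos implementation: it assumes flat (planar) coordinates given as [x, y] arrays, it assumes the nearest point is clamped to the ends of the segment, and the function names are made up for illustration.

```ruby
# Plain-Ruby sketch of "project" then "interpolate" for one straight segment
# in planar coordinates. All names are illustrative, not RGeo or ffi-geos API.

# Distance along the segment a->b to the spot nearest to point p,
# clamped so it never runs past either end of the segment.
def project_onto_segment(p, a, b)
  dx = b[0] - a[0]
  dy = b[1] - a[1]
  length = Math.hypot(dx, dy)
  return 0.0 if length.zero? # degenerate segment: both ends coincide

  # Scalar projection of (p - a) onto the unit vector along the segment.
  along = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length
  along.clamp(0.0, length)
end

# Coordinates of the spot that lies `dist` units along the segment a->b.
def interpolate_along_segment(dist, a, b)
  length = Math.hypot(b[0] - a[0], b[1] - a[1])
  return a.dup if length.zero?

  t = dist / length
  [a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])]
end

# Same three steps as point_line_dist: project, interpolate, measure.
def segment_point_dist(p, a, b)
  along = project_onto_segment(p, a, b)
  nearest = interpolate_along_segment(along, a, b)
  Math.hypot(p[0] - nearest[0], p[1] - nearest[1])
end

segment_point_dist([3.0, 4.0], [0.0, 0.0], [10.0, 0.0])  # => 4.0 (perpendicular case)
segment_point_dist([-3.0, 4.0], [0.0, 0.0], [10.0, 0.0]) # => 5.0 (clamped to the start)
```

Note that with raw longitude/latitude values the result of a calculation like this is in degrees rather than meters, which is fine for ranking candidates but not for reporting real distances.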
This method was used to sort the centerlines from the GeoJSON file in order to find the closest one for each point in the addresses JSON file. The first item in the sorted list is the closest line to the address point, and presto: we have a good way to pick a centerline for each point!
# Sort the centerlines by their distance to the address point, nearest first.
sortedlines = centerlines.sort_by { |line| point_line_dist(point, line) }
centerline = sortedlines.first
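Since only the nearest line is actually used, a full sort isn't strictly necessary. As a rough sketch of an alternative, Ruby's Enumerable#min_by picks the smallest element directly. The Centerline struct, the distances, and "Example Street" are hypothetical stand-ins so the snippet runs without the real data or point_line_dist.

```ruby
# Hypothetical stand-in data: each centerline is a name plus a precomputed
# distance to the address point, in place of calling point_line_dist.
Centerline = Struct.new(:name, :dist)

centerlines = [
  Centerline.new("West 12 Street", 14.2),
  Centerline.new("West Street", 9.7),
  Centerline.new("Example Street", 80.5)
]

# min_by evaluates the block once per element and returns the smallest.
closest = centerlines.min_by { |line| line.dist }

# min_by(2) returns the two nearest as an array, nearest first.
closest_two = centerlines.min_by(2) { |line| line.dist }

puts closest.name                # prints "West Street"
puts closest_two.map(&:name).inspect
```

In the real script the block would call point_line_dist(point, line) instead of reading a stored distance. Keeping the runner-up around with min_by(2) is handy if you later want a fallback for ambiguous corner addresses.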
The rest of the script simply churns out the data structures with the associated properties appended to the address points, using RGeo's EntityFactory. I found juggling two different factories (one for regular RGeo, and another for GeoJSON) to be slightly unwieldy. I also wasn't sure which ID property was the "canonical" one for the centerlines, so I just chose the PhysicalID number arbitrarily. It should be easy enough to use a different one (perhaps GenericID?).
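To give a feel for the shape of the output, here is a small sketch that builds one address feature by hand with Ruby's standard json library. This is not how the script does it (the script goes through RGeo's factories); the coordinates, ID values, and the "number" and "centerline_id" property names are all made up for illustration.

```ruby
require "json"

# Hypothetical inputs: coordinates and property values are invented for
# illustration and do not come from the real data set.
address = { "number" => "499", "coordinates" => [-74.0100, 40.7370] }
centerline_props = { "PhysicalID" => 12345, "GenericID" => 67890 }

# Which centerline property to copy onto the address; swap in "GenericID"
# to use the other identifier.
ID_KEY = "PhysicalID"

feature = {
  "type" => "Feature",
  "geometry" => { "type" => "Point", "coordinates" => address["coordinates"] },
  "properties" => {
    "number" => address["number"],
    "centerline_id" => centerline_props[ID_KEY]
  }
}

collection = { "type" => "FeatureCollection", "features" => [feature] }
puts JSON.pretty_generate(collection)
```

Keeping the choice of ID in one constant means switching from PhysicalID to GenericID is a one-line change.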
One problem I see in the resulting solution is that addresses right at the corner of two streets can easily get associated with the wrong one. For example, "499 West 12 Street" should probably be "499 West Street", given the patterns of the address numbers. This seems like a limitation that would be hard to fix programmatically. (EDIT: I just thought of how this could work: keep track of the address number sequence for the chosen street, and if the number doesn't fit, go with the second-closest street.) The approach isn't perfect, but it's a good starting point for a bit more manual polishing.
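The idea in that EDIT could be sketched roughly as follows. This is untested against the real data and rests on several assumptions: "fits" is taken to mean "falls inside the range of house numbers already seen on that street", the ranges shown are invented, and pick_street is a hypothetical helper. It also generalizes slightly, walking down the candidate list to the first street that fits rather than stopping at the second.

```ruby
# Hypothetical sketch: candidates are street names ordered nearest first,
# and known_ranges maps a street to the house numbers already seen on it.
def pick_street(number, candidates, known_ranges)
  match = candidates.find do |street|
    range = known_ranges[street]
    # A street with no numbers recorded yet cannot rule anything out.
    range.nil? || range.cover?(number)
  end
  # Nothing fits: fall back to the nearest street.
  match || candidates.first
end

# Invented ranges, for illustration only.
known_ranges = {
  "West 12 Street" => (300..420),
  "West Street"    => (450..520)
}

pick_street(499, ["West 12 Street", "West Street"], known_ranges) # => "West Street"
pick_street(350, ["West 12 Street", "West Street"], known_ranges) # => "West 12 Street"
```

A real version would probably want to account for odd/even sides of the street and process the unambiguous mid-block addresses first, so the ranges are in place before the corner cases are decided.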
In addition to the resulting GeoJSON file, I've also included a quick and dirty Leaflet mapping interface that I used to confirm my results. This file is a modified version of the leaflet.html file output from MapTiler.app, which I used to turn the GeoTIFF file into tiled PNG files. I also ran the GeoJSON through a validator to make sure things looked okay.