
@dphiffer
Last active January 22, 2023 05:47
NYPL Labs Space/Time Engineer code test (I did not get an offer)

Dan Phiffer's Space/Time Exercise

This ended up taking a bit longer than I'd hoped. I started by exploring the data with some JavaScript mapping libraries, but soon realized I needed something with a little more heft. I turned to Ruby and decided the RGeo gem was the right tool for the job. It's been a while since I used Ruby regularly, so I spent some time reacquainting myself with the language, plus getting up to speed with a library I had never used before.

At the core of the problem is this: how close is each address point to a street line? I looked up the math involved and decided that implementing the calculations myself was going to be too much trouble. Then I found a very relevant message on the RGeo-users mailing list, and the rest of the solution was a matter of structuring the data properly.

I created a helper function, point_line_dist(point, line), that returns the distance from a given point to a specific street centerline. It uses two low-level methods, project and interpolate, provided by the ffi-geos gem. The first returns the distance along the line to the point on the line nearest the given point (where a perpendicular dropped from the point meets the line, or the nearer end of the line if the perpendicular misses it). The second converts that distance along the line back into the coordinates of that nearest point. Finally, the function returns the distance from that point on the line to the original point.

def point_line_dist(point, line)
	line = line.geometry.fg_geom
	line_dist = line.project(point.fg_geom)
	line_point_geom = line.interpolate(line_dist)
	line_point = @factory.wrap_fg_geom(line_point_geom)
	return line_point.distance(point)
end
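For readers without GEOS handy, here is a gem-free sketch of roughly what project, interpolate and the final distance step compute, using plain [x, y] arrays instead of RGeo objects. The method names (planar_dist, project_along, interpolate_along, point_line_distance) are hypothetical, and the math is plain planar geometry applied to whatever coordinates are passed in, which is also how GEOS treats longitude/latitude values.

```ruby
# Gem-free sketch of project / interpolate on a polyline (hypothetical names).
# Points are [x, y] arrays; a line is an array of two or more points.
# All math is planar (Cartesian), applied to whatever coordinates are given.

# Straight-line distance between two points.
def planar_dist(a, b)
  Math.hypot(a[0] - b[0], a[1] - b[1])
end

# Distance along the line to the point on it nearest to pt
# (the job project does in point_line_dist).
def project_along(line, pt)
  best_along = 0.0
  best_dist = Float::INFINITY
  travelled = 0.0
  line.each_cons(2) do |a, b|
    dx = b[0] - a[0]
    dy = b[1] - a[1]
    len2 = dx * dx + dy * dy
    # Position of the perpendicular foot along this segment, clamped to [0, 1]
    # so a foot beyond either end snaps to that end.
    t = len2.zero? ? 0.0 : ((pt[0] - a[0]) * dx + (pt[1] - a[1]) * dy).fdiv(len2)
    t = t.clamp(0.0, 1.0)
    d = planar_dist(pt, [a[0] + t * dx, a[1] + t * dy])
    seg_len = planar_dist(a, b)
    if d < best_dist
      best_dist = d
      best_along = travelled + t * seg_len
    end
    travelled += seg_len
  end
  best_along
end

# Coordinates of the point a given distance along the line
# (the job interpolate does).
def interpolate_along(line, along)
  travelled = 0.0
  line.each_cons(2) do |a, b|
    seg_len = planar_dist(a, b)
    if travelled + seg_len >= along
      t = seg_len.zero? ? 0.0 : (along - travelled) / seg_len
      return [a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])]
    end
    travelled += seg_len
  end
  line.last.map(&:to_f) # past the end: clamp to the last vertex
end

# Same three steps as point_line_dist: project, interpolate, measure.
def point_line_distance(pt, line)
  planar_dist(interpolate_along(line, project_along(line, pt)), pt)
end
```

For the line [[0, 0], [10, 0], [10, 10]] and the point [5, 3], project_along returns 5.0, interpolate_along maps that back to [5.0, 0.0], and the resulting distance is 3.0.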

I used this function to sort the centerlines from the GeoJSON file and find the one closest to each point in the addresses JSON file. The first item in the sorted list is the closest line to the address point, and presto: we have a good way to pick a centerline for each point!

sortedlines = centerlines.sort { |a, b|
	a_dist = point_line_dist(point, a)
	b_dist = point_line_dist(point, b)
	a_dist <=> b_dist
}
centerline = sortedlines.first
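Because a sort block recomputes both distances on every comparison, Enumerable#min_by (or sort_by, when the runner-up matters too) is a more economical way to express the same selection: each computes the distance once per centerline. This is a self-contained sketch that assumes a hypothetical Centerline struct holding a single straight segment in place of an RGeo feature, and a hypothetical seg_dist helper in place of point_line_dist.

```ruby
# Hypothetical stand-in for a centerline feature: a name plus one straight segment.
Centerline = Struct.new(:name, :a, :b)

# Planar distance from pt to the segment a-b (perpendicular foot, clamped to the ends).
def seg_dist(pt, a, b)
  dx = b[0] - a[0]
  dy = b[1] - a[1]
  len2 = dx * dx + dy * dy
  t = len2.zero? ? 0.0 : ((pt[0] - a[0]) * dx + (pt[1] - a[1]) * dy).fdiv(len2)
  t = t.clamp(0.0, 1.0)
  Math.hypot(pt[0] - (a[0] + t * dx), pt[1] - (a[1] + t * dy))
end

# min_by evaluates the block once per centerline, whereas the sort block
# recomputes both distances on every comparison.
def closest_centerline(pt, centerlines)
  centerlines.min_by { |c| seg_dist(pt, c.a, c.b) }
end

# When the second-closest street is also needed, sort_by likewise
# computes each distance only once.
def ranked_centerlines(pt, centerlines)
  centerlines.sort_by { |c| seg_dist(pt, c.a, c.b) }
end
```

In the script itself the equivalent should be centerlines.min_by { |line| point_line_dist(point, line) }, since the decoded collection is already being enumerated with sort.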

The rest of the script churns out the data structures, appending the associated properties to the address points with RGeo's EntityFactory. I found juggling two different factories (one for regular RGeo geometries and another for GeoJSON entities) slightly unwieldy. I also wasn't sure which ID property was the "canonical" one for the centerlines, so I chose PhysicalID arbitrarily. It should be easy enough to switch to a different one (perhaps GenericID?).
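Since the output is just GeoJSON, one way around juggling two factories would be to build the address features as plain hashes. The sketch below uses hypothetical helper names and placeholder values, and the same property names the script sets (centerline, number, street, address); GeoJSON positions are [longitude, latitude]. It illustrates the format only and makes no claim about the exact output of RGeo::GeoJSON.encode.

```ruby
require 'json'

# Hypothetical helper: one address as a GeoJSON Feature hash.
# Positions are [longitude, latitude], as the GeoJSON format requires.
def address_feature(id, lon, lat, number, street, centerline_id)
  {
    "type" => "Feature",
    "id" => id,
    "geometry" => { "type" => "Point", "coordinates" => [lon, lat] },
    "properties" => {
      "centerline" => centerline_id,
      "number" => number,
      "street" => street,
      "address" => "#{number} #{street}"
    }
  }
end

# Hypothetical helper: wrap features in a FeatureCollection hash.
def feature_collection(features)
  { "type" => "FeatureCollection", "features" => features }
end
```

Calling JSON.generate(feature_collection([...])) on the result would produce a GeoJSON document directly, with no entity factory involved.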

One problem I see in the resulting solution is that addresses right at the corner of two streets can easily be associated with the wrong one. For example, "499 West 12 Street" should probably be "499 West Street", given the pattern of the address numbers. This seemed like a limitation that would be hard to address programmatically. (EDIT: I just thought of how this could work: keep track of the address number sequence for the chosen street, and if the number doesn't fit, go with the second-closest street.) The approach isn't perfect, but it's a good starting point for a bit more manual polishing.
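The EDIT idea above could look something like this sketch. Everything in it is hypothetical: the pick_street name, the slack window of 20, and the assumption that house numbers are already integers (the address values in the source data may need parsing first, and ranges or letter suffixes would need extra handling). It walks the candidate streets nearest-first, accepts the first one whose already-assigned numbers the new number sits near, and falls back to the nearest street if none fit.

```ruby
# Hypothetical sketch of the address-sequence check.
# ranked_streets: street names for one address point, nearest first.
# known_numbers:  street name => house numbers already assigned to it.
# slack:          how far outside the known range a number may fall.
def pick_street(number, ranked_streets, known_numbers, slack: 20)
  ranked_streets.each do |street|
    nums = known_numbers.fetch(street, [])
    # No evidence for this street yet: accept it rather than guess.
    return street if nums.empty?
    return street if number.between?(nums.min - slack, nums.max + slack)
  end
  ranked_streets.first # nothing fits: keep the nearest street
end
```

With one street holding numbers 10 to 16 and another holding 480 to 495, the number 499 skips the nearer street and lands on the second one.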

In addition to the resulting GeoJSON file, I've also included a quick and dirty Leaflet mapping interface that I used to confirm my results. This file is a modified version of the leaflet.html file output from MapTiler.app, which I used to turn the GeoTIFF file into tiled PNG files. I also ran the GeoJSON through a validator to make sure things looked okay.
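Alongside a validator, a quick programmatic check can catch the lat/lng mix-up that the Leaflet page has to work around. This hypothetical sketch flags Point features whose [longitude, latitude] pair falls outside the box passed to fitBounds in the Leaflet page. That box is only the map view, so a flagged point is worth a look rather than necessarily wrong.

```ruby
require 'json'

# Bounding box from the Leaflet page's fitBounds call
# (south-west 40.735298, -74.011990; north-east 40.741829, -74.004839).
VIEW_BOUNDS = { min_lat: 40.735298, max_lat: 40.741829,
                min_lon: -74.011990, max_lon: -74.004839 }

# Returns Point features whose [lon, lat] lies outside the box. A point
# written as [lat, lon] by mistake will land far outside it.
def out_of_bounds_points(collection, bounds = VIEW_BOUNDS)
  collection.fetch("features").select do |f|
    next false unless f.dig("geometry", "type") == "Point"
    lon, lat = f["geometry"]["coordinates"]
    !(lat.between?(bounds[:min_lat], bounds[:max_lat]) &&
      lon.between?(bounds[:min_lon], bounds[:max_lon]))
  end
end
```

Something like out_of_bounds_points(JSON.parse(File.read('conflated.geojson'))) would run it over the script's output.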

require 'json'
require 'rgeo/geo_json'
require 'ffi-geos'

# Usage: ruby conflate.rb > conflated.geojson

# GEOS factory using the FFI interface, so fg_geom / wrap_fg_geom are available
@factory = RGeo::Geos.factory(:native_interface => :ffi)
entity_factory = RGeo::GeoJSON::EntityFactory.instance

# Distance from a point to the nearest point on a centerline's geometry.
# This was helpful: https://groups.google.com/forum/#!topic/rgeo-users/e1FgzpPISs8
def point_line_dist(point, line)
	line = line.geometry.fg_geom
	# Distance along the line to the point on it nearest `point`
	line_dist = line.project(point.fg_geom)
	# Coordinates of that nearest point, wrapped back into an RGeo geometry
	line_point_geom = line.interpolate(line_dist)
	line_point = @factory.wrap_fg_geom(line_point_geom)
	return line_point.distance(point)
end

# Load the street centerlines as a GeoJSON feature collection
centerlines_geojson = File.read('west_village_centerlines.geojson')
centerlines = RGeo::GeoJSON.decode(centerlines_geojson, {
	:json_parser => :json,
	:geo_factory => @factory
})

addresses_json = File.read('data/addresses.json')
addresses = JSON.parse(addresses_json)
features = []

addresses.each { |address|
	data = address["addresses"].first
	# x (longitude) comes first, then y (latitude)
	point = @factory.point(
		data["longitude"],
		data["latitude"]
	)
	# Sort the centerlines by distance; the first one is the closest
	sortedlines = centerlines.sort { |a, b|
		a_dist = point_line_dist(point, a)
		b_dist = point_line_dist(point, b)
		a_dist <=> b_dist
	}
	centerline = sortedlines.first
	centerline_id = centerline.properties["PhysicalID"]
	street = centerline.properties["Street"]
	features.push(entity_factory.feature(point, address["id"], {
		"centerline" => centerline_id,
		"number" => data["address"],
		"street" => street,
		"address" => "#{data["address"]} #{street}"
	}))
}

# Include the centerlines themselves in the output
centerlines.each { |centerline|
	centerline_id = centerline.properties["PhysicalID"]
	feature = entity_factory.feature(centerline.geometry, centerline_id, centerline.properties)
	features.push(feature)
}

collection = entity_factory.feature_collection(features)
geojson = RGeo::GeoJSON.encode(collection)
puts(geojson.to_json)
<!DOCTYPE html>
<html>
<head>
<title>Verification</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet-0.6/leaflet.css" />
<script src="//code.jquery.com/jquery-1.11.3.min.js"></script>
<script src="http://cdn.leafletjs.com/leaflet-0.6/leaflet.js"></script>
<script>
function init() {
	var bounds = new L.LatLngBounds(
		new L.LatLng(40.735298, -74.011990),
		new L.LatLng(40.741829, -74.004839));
	var map = L.map('map').fitBounds(bounds);
	// Base map tiles
	L.tileLayer('http://api.tiles.mapbox.com/v3/examples.map-zr0njcqy/{z}/{x}/{y}.png').addTo(map);
	// Local tiles generated by MapTiler from the GeoTIFF
	var options = {
		minZoom: 15,
		maxZoom: 19,
		opacity: 1.0,
		tms: false
	};
	L.tileLayer('{z}/{x}/{y}.png', options).addTo(map);
	// Layer for the street centerlines
	var west_village = L.geoJson();
	west_village.addTo(map);
	$.ajax({
		dataType: "json",
		url: "conflated.geojson",
		success: function(data) {
			$.each(data.features, function(index, feature) {
				if (feature.geometry.type == 'LineString') {
					west_village.addData(feature);
				} else {
					// GeoJSON is [lng, lat] but L.marker wants [lat, lng]; why can't we just agree on one!
					L.marker([feature.geometry.coordinates[1], feature.geometry.coordinates[0]], {
						title: feature.properties.address
					}).addTo(map);
				}
			});
		}
	}).fail(function() {
		console.error('Error loading geojson');
	});
}
</script>
<style>
body { margin:0; padding:0; }
#map { position:absolute; top:0; bottom:0; width:100%; }
</style>
</head>
<body onload="init()">
<div id="map"></div>
</body>
</html>