-
Nielsen Designated Market Area (DMA) is a geographic region where the population can receive the same broadcasts (TV/Radio, Cable/Internet).
-
GeoHex is a world coordinate system expressed by hexagon of various sizes. It's often used in games due to its simple matching nature. It's also useful for proximity/coverage since a hexagon approximates a circular area.
This gist is an exercise in figuring out DMA from a geohex code using opensource tools and data.
- First, the GeoHex center Lat/Lon coordinate is obtained.
- This coordinate can then be used by GIS spatial query to see which DMA contains this Lat/Lon.
Opensource version
https://raw.githubusercontent.com/akiatoji/nielsen-dma/master/nielsentopo.json
- Be sure to install TopoJSON tools.
- Then convert to GeoJSON
npm -g install topojson
topojson-geojson -o /Users/Aki/Projs/metro/geo --precision 6 nielsentopo.json
- (optional) Check GeoJSON has DMA features. This command shows all the DMAs in generated GeoJSON.
ogrinfo -ro geo/nielsen_dma.json OGRGeoJSON
To load into PostGIS, we convert GeoJSON data to ESRI Shapefile first.
ogr2ogr -f "ESRI Shapefile" nielsen_dma.shp geo/nielsen_dma.json OGRGeoJSON
We now have shape files. On OS X, you can verify the shapes using tools like GISLook at http://cartography.oregonstate.edu/GISLook.html
Next, turn ESRI shape into SQL command, then run against PostGIS. Database metros
needs to exist first, of course.
shp2pgsql -I -D nielsen_dma > geo.sql
psql -d metros -f geo.sql
DMA shapes along with medata data are loaded into PostGIS at this point in a table called nielsen_dma
.
I renamed this to dma
for subsequent query.
Now you can query to see which DMA a lat/lon belongs to like this:
select dma from dma where st_contains(geom, ST_GeomFromText( 'POINT(-84.333513 33.9254)')) = 't';
This returns DMA like so
dma
-----
524
Voila!
If you have more than a few points to check, it's best to:
- Load points data into a table with point already stored in geometry column
- Then run query by joining with DMA table
Here is one example. This ruby code reads from STDIN and looks for GeoHex info in a string like gh=XX1234567
. It then decodes the GeoHex into latlon, then stores into PostGIS table along with met value.
#!/usr/bin/env ruby
require 'pg'
require 'geohex'
conn=PGconn.connect(hostaddr: '127.0.0.1', port: 5432, dbname: 'metros', user: 'Aki')
conn.exec('drop table latlons') rescue nil
conn.exec('create table latlons( id serial, gh varchar(20), lat numeric(24, 20), lon numeric(24, 20) )')
conn.exec("select AddGeometryColumn('latlons', 'pt', 0, 'POINT', 2)")
ARGF.each do |line|
met, gh = line.match(/gh=(.*?)&/).captures
if gh and gh != 'null' then
zone = Geohex::V3::Zone.decode gh
begin
conn.exec("insert into latlons(gh, lat, lon, pt)
values( '#{gh}', '#{zone.latitude}', '#{zone.longitude}', ST_MakePoint(#{zone.longitude}, #{zone.latitude})
)")
rescue => e
puts e.message
end
end
end
Once data is loaded in latlons table, querying is fairly easy.
select l.gh, dma.dma from latlons l, dma d, where ST_Contains(d.geom, l.pt) = 't';
Above query lists all GH's from latlons table acoompanied by its DMA. Note, since the query is dynamic using geometry, if we were to change DMA geometry (i.e. cleanup bad shapes), it'll be reflected in the query right away.
- DMA TopoJSON is not super accurate, with small gaps and overlaps in borders. Theoretically if latlon is right around the border, it can end up in different/wrong DMA or in no DMA at all. It's best to validate decoded DMA.
- Along that line, you can probably filter out latlons that are close to the edges of DMA in PostGIS.
- Probably should use SRID's on geometries if I can just figure out what SRID is.
- GIS visualization using this data might be very useful.
- Ruby code can run a lot faster with bulk transaction (~3800 rows/s on 2013 Retina MBP15).
- Having said that, about 50M latlons was
all I could do before query got horribly slowno problem once I put right gist indexes in PostGIS. - MySQL 5.6 spatial extension is supposed to do 5x performance for the type of query here, opening up the possibility for querying 100's of megs of rows.
- MongoDB has had within() function since 2.4, but apparently this is not as fast as PostGIS. MongoDB, however, does 'Radius from point' searches really well (i.e. FourSquare), so if additional nearby queries are needed, MongoDB probably will make a good platform.
- We can store actual geohex shapes and run intersect queries against DMA shapes. This might be useful if you are dealing with large enough geohexes.