Skip to content

Instantly share code, notes, and snippets.

@akiatoji
Last active August 25, 2016 15:35
Show Gist options
  • Save akiatoji/1223678a00b591aeac97 to your computer and use it in GitHub Desktop.
Save akiatoji/1223678a00b591aeac97 to your computer and use it in GitHub Desktop.
Using PostGIS to determine DMA for a GeoHex

Decoding Nielsen DMA from GeoHex

  • Nielsen Designated Market Area (DMA) is a geographic region where the population can receive the same broadcasts (TV/Radio, Cable/Internet).

  • GeoHex is a world coordinate system expressed by hexagon of various sizes. It's often used in games due to its simple matching nature. It's also useful for proximity/coverage since a hexagon approximates a circular area.

This gist is an exercise in figuring out DMA from a geohex code using opensource tools and data.

How it works

  • First, the GeoHex center Lat/Lon coordinate is obtained.
  • This coordinate can then be used by GIS spatial query to see which DMA contains this Lat/Lon.

Get Nielsen DMA TopoJSON

Opensource version

https://raw.githubusercontent.com/akiatoji/nielsen-dma/master/nielsentopo.json

Convert to GeoJSON

  • Be sure to install TopoJSON tools.
  • Then convert to GeoJSON
npm -g install topojson
topojson-geojson -o /Users/Aki/Projs/metro/geo --precision 6 nielsentopo.json
  • (optional) Check GeoJSON has DMA features. This command shows all the DMAs in generated GeoJSON.
ogrinfo -ro geo/nielsen_dma.json OGRGeoJSON

Convert to shape and load into PostGIS

To load into PostGIS, we convert GeoJSON data to ESRI Shapefile first.

ogr2ogr -f "ESRI Shapefile" nielsen_dma.shp geo/nielsen_dma.json OGRGeoJSON

We now have shape files. On OS X, you can verify the shapes using tools like GISLook at http://cartography.oregonstate.edu/GISLook.html

Next, turn ESRI shape into SQL command, then run against PostGIS. Database metros needs to exist first, of course.

shp2pgsql -I -D nielsen_dma > geo.sql
psql -d metros -f geo.sql 

DMA shapes along with medata data are loaded into PostGIS at this point in a table called nielsen_dma. I renamed this to dma for subsequent query.

Query against DMA

Now you can query to see which DMA a lat/lon belongs to like this:

select dma from dma where st_contains(geom, ST_GeomFromText( 'POINT(-84.333513 33.9254)')) = 't';

This returns DMA like so

 dma 
-----
 524

Voila!

Bulk query against a set of LatLons from GeoHex

If you have more than a few points to check, it's best to:

  • Load points data into a table with point already stored in geometry column
  • Then run query by joining with DMA table

Here is one example. This ruby code reads from STDIN and looks for GeoHex info in a string like gh=XX1234567. It then decodes the GeoHex into latlon, then stores into PostGIS table along with met value.

#!/usr/bin/env ruby

require 'pg'
require 'geohex'

conn=PGconn.connect(hostaddr: '127.0.0.1', port: 5432, dbname: 'metros', user: 'Aki')

conn.exec('drop table latlons') rescue nil
conn.exec('create table latlons( id serial, gh varchar(20), lat numeric(24, 20), lon numeric(24, 20) )')
conn.exec("select AddGeometryColumn('latlons', 'pt', 0, 'POINT', 2)")

ARGF.each do |line|

  met, gh = line.match(/gh=(.*?)&/).captures

  if gh and gh != 'null' then
    zone = Geohex::V3::Zone.decode gh

    begin
    conn.exec("insert into latlons(gh, lat, lon, pt)
            values( '#{gh}', '#{zone.latitude}', '#{zone.longitude}', ST_MakePoint(#{zone.longitude}, #{zone.latitude})
                  )")
    rescue => e
      puts e.message
    end
  end
end

Once data is loaded in latlons table, querying is fairly easy.

select l.gh, dma.dma from latlons l, dma d, where ST_Contains(d.geom, l.pt) = 't';

Above query lists all GH's from latlons table acoompanied by its DMA. Note, since the query is dynamic using geometry, if we were to change DMA geometry (i.e. cleanup bad shapes), it'll be reflected in the query right away.

Notes and random thoughts

  • DMA TopoJSON is not super accurate, with small gaps and overlaps in borders. Theoretically if latlon is right around the border, it can end up in different/wrong DMA or in no DMA at all. It's best to validate decoded DMA.
  • Along that line, you can probably filter out latlons that are close to the edges of DMA in PostGIS.
  • Probably should use SRID's on geometries if I can just figure out what SRID is.
  • GIS visualization using this data might be very useful.
  • Ruby code can run a lot faster with bulk transaction (~3800 rows/s on 2013 Retina MBP15).
  • Having said that, about 50M latlons was all I could do before query got horribly slow no problem once I put right gist indexes in PostGIS.
  • MySQL 5.6 spatial extension is supposed to do 5x performance for the type of query here, opening up the possibility for querying 100's of megs of rows.
  • MongoDB has had within() function since 2.4, but apparently this is not as fast as PostGIS. MongoDB, however, does 'Radius from point' searches really well (i.e. FourSquare), so if additional nearby queries are needed, MongoDB probably will make a good platform.
  • We can store actual geohex shapes and run intersect queries against DMA shapes. This might be useful if you are dealing with large enough geohexes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment