@cloventt
Created March 15, 2025 00:00
OpenRefine convert geoJSON polygons to coordinates

OpenRefine geoJSON Column to Coordinates

I had a dataset in OpenRefine that included a column of geoJSON objects as strings. Each geoJSON object represented a public park. I wanted to import the collection of parks to Wikidata, so I needed to convert the geoJSON polygon data into a central "point" that could be put into the P625 coordinate location field.

NB: your geoJSON should have coordinates already in the WGS84 coordinate system (this is the standard for geoJSON).
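
Note that geoJSON positions are ordered [longitude, latitude], which is easy to mix up with the latitude/longitude pairs this recipe produces. As a minimal sketch, assuming a hypothetical helper `looks_like_wgs84` (not part of script.py), you could sanity-check a ring before converting it:

```python
def looks_like_wgs84(ring):
    """Return True if every [lon, lat] position is within valid WGS84 ranges."""
    for position in ring:
        lon, lat = position[0], position[1]
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            return False
    return True


# Illustrative, made-up ring in [lon, lat] order
ring = [[172.60, -43.53], [172.62, -43.53], [172.62, -43.51], [172.60, -43.53]]
print(looks_like_wgs84(ring))

# Projected (metre-based) coordinates fall far outside the valid ranges
print(looks_like_wgs84([[1570000.0, 5180000.0]]))
```

This only catches values that are out of range. A swapped pair is detected only when the swap pushes the latitude beyond ±90, so it is a rough check rather than a guarantee.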

To use this in OpenRefine:

  1. Click the down-arrow at the top of your raw geoJSON column.
  2. Select Edit Column -> Add column based on this column...
  3. Choose a name for your new column.
  4. Change the Expression language dropdown to Python / Jython.
  5. Paste in the code from script.py.
  6. You should see pairs of latitude/longitude in the preview.
  7. Click OK to create the new column.

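If your object is a MultiPolygon, the center of the polygon with the largest bounding box is used. This keeps the resulting coordinate within the extent of a single polygon instead of landing in the gap between separate ones. Note that the point is the center of the bounding box, not a true centroid, so for concave or irregular shapes it can still fall outside the polygon itself.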
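
The selection rule above can be tried outside OpenRefine. The following is a rough sketch, not the script itself: the helper names `bounding_box` and `largest_box_center` and the sample geometry are made up for illustration, and it works on a bare geometry object, not a full FeatureCollection.

```python
import json


def bounding_box(ring):
    """Return (min_lon, min_lat, max_lon, max_lat) for a ring of [lon, lat] positions."""
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    return (min(lons), min(lats), max(lons), max(lats))


def largest_box_center(geometry):
    """Return (lat, lon) at the center of the largest bounding box."""
    if geometry["type"] == "Polygon":
        rings = [geometry["coordinates"][0]]
    else:
        # Assumed to be a MultiPolygon: take the outer ring of each polygon
        rings = [polygon[0] for polygon in geometry["coordinates"]]
    boxes = [bounding_box(ring) for ring in rings]
    # Pick the box with the greatest extent, measured in square degrees
    min_lon, min_lat, max_lon, max_lat = max(
        boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
    return ((min_lat + max_lat) / 2.0, (min_lon + max_lon) / 2.0)


# Made-up MultiPolygon: a 1x1 square and a larger 4x2 rectangle
sample = json.loads('''{"type": "MultiPolygon", "coordinates": [
  [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
  [[[10, 10], [14, 10], [14, 12], [10, 12], [10, 10]]]
]}''')
print(largest_box_center(sample))  # (11.0, 12.0)
```

The larger rectangle wins, so the result is its center at latitude 11, longitude 12.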

(There's probably a better way to do this but this worked well).
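
One candidate for a "better way", offered here as an assumption and not as the method the script uses, is the area-weighted centroid computed with the shoelace formula. The sketch below uses a hypothetical `ring_centroid` helper and treats degrees as flat planar coordinates, which is only an approximation but is usually acceptable for shapes as small as a park.

```python
def ring_centroid(ring):
    """Area-weighted centroid of a closed ring of [lon, lat] positions.

    Returns (lat, lon). The ring must repeat its first position at the end,
    as geoJSON linear rings do.
    """
    twice_area = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for start, end in zip(ring[:-1], ring[1:]):
        x0, y0 = start[0], start[1]
        x1, y1 = end[0], end[1]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate ring with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return (sum_y / (3.0 * twice_area), sum_x / (3.0 * twice_area))


# Made-up right triangle; its bounding box center would be (1.5, 1.5)
triangle = [[0, 0], [3, 0], [0, 3], [0, 0]]
print(ring_centroid(triangle))
```

Like the bounding box center, this point is not guaranteed to lie inside a concave polygon, so it is a refinement, not a complete fix.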

import json

data = json.loads(value)

def compute_min_max(ring):
    '''Quick way to compute the min/max of each dimension in a polygon'''
    min_lat = 300
    max_lat = -300
    min_lon = 300
    max_lon = -300
    for coord in ring:
        # geoJSON positions are ordered [longitude, latitude]
        lon, lat = coord[0], coord[1]
        if lat > max_lat:
            max_lat = lat
        if lat < min_lat:
            min_lat = lat
        if lon > max_lon:
            max_lon = lon
        if lon < min_lon:
            min_lon = lon
    return ((min_lat, max_lat), (min_lon, max_lon))

def compute_centroid(min_max):
    '''Compute the center of the polygon's bounding box as (lat, lon)'''
    (lats, lons) = min_max
    (min_lat, max_lat) = lats
    (min_lon, max_lon) = lons
    # Divide by 2.0 so integer coordinates are not truncated under Jython (Python 2)
    return round((min_lat + max_lat) / 2.0, 4), round((min_lon + max_lon) / 2.0, 4)

def compute_bounding_box_area(min_max):
    '''Quick way to compute the bounding box area.
    This doesn't get you the coordinates of the bounding box, just the area.
    Used for ordering segments of multipolygons by their relative extent.'''
    (lats, lons) = min_max
    (min_lat, max_lat) = lats
    (min_lon, max_lon) = lons
    return (max_lat - min_lat) * (max_lon - min_lon)

# Only the first feature of the FeatureCollection is used
geometry = data["features"][0]["geometry"]
if geometry["type"] == "Polygon":
    rings = [geometry["coordinates"][0]]
else:
    # MultiPolygon: take the outer ring of each polygon
    rings = [polygon[0] for polygon in geometry["coordinates"]]

min_maxs = list(map(compute_min_max, rings))
centroids = list(map(compute_centroid, min_maxs))
areas = list(map(compute_bounding_box_area, min_maxs))
# OpenRefine wraps the expression in a function, so a top-level return is expected
return ','.join(map(str, centroids[areas.index(max(areas))]))