Skip to content

Instantly share code, notes, and snippets.

@sgillies
Last active October 30, 2024 10:58
Show Gist options
  • Save sgillies/2217756 to your computer and use it in GitHub Desktop.
Save sgillies/2217756 to your computer and use it in GitHub Desktop.
A Python Protocol for Geospatial Data

Author: Sean Gillies Version: 1.0

Abstract

This document describes a GeoJSON-like protocol for geo-spatial (GIS) vector data.

Introduction

Python has a number of built-in protocols (descriptors, iterators, etc). A very simple and familiar one involves string representations of objects. The built-in str() function calls the __str__() method of its single argument. By implementing __str__(), instances of any class can be printed by any other Python program.

>>> class A(object):
...     def __str__(self):
...         return "Eh!"
...
>>> a = A()
>>> str(a)
'Eh!'
>>> "%s" % a
'Eh!'

What if we could do something like this for geo-spatial objects? It might, for example, let any object be analyzed using any other hypothetical software package like this:

>>> from some_analytic_module import as_geometry
>>> as_geometry(obj).buffer(1.0).area   # obj is a "point" of some kind
3.1365484905459389

The hypothetical as_geometry() function of the hypothetical some_analytic_module module would access relevant data of its single argument using an agreed upon method or attribute.

__geo_interface__

Following the lead of numpy's Array Interface [1], let's agree on a __geo_interface__ property. To avoid creating even more protocols, let's make the value of this attribute a Python mapping. To further minimize invention, let's borrow from the GeoJSON format [2] for the structure of this mapping.

The keys are:

type (required)
A string indicating the geospatial type. Possible values are "Feature" or a geometry type: "Point", "LineString", "Polygon", etc.
bbox (optional)
A tuple of floats that describes the geo-spatial bounds of the object: (left, bottom, right, top) or (west, south, east, north).
properties (optional)
A mapping of feature properties (labels, populations ... you name it. Dependent on the data). Valid for "Feature" types only.
geometry (optional)
The geometric object of a "Feature" type, also as a mapping.
coordinates (required)
Valid only for geometry types. This is an (x, y) or (longitude, latitude) tuple in the case of a "Point", a list of such tuples in the "LineString" case, or a list of lists in the "Polygon" case. See the GeoJSON spec for details.

Examples

First, a toy class with a point representation:

>>> class Pointy(object):
...     __geo_interface__ = {'type': 'Point', 'coordinates': (0.0, 0.0)}
...
>>> as_geometry(Pointy()).buffer(1.0).area
3.1365484905459389

Next, a toy class with a feature representation:

>>> class Placemark(object):
...     __geo_interface__ = {
...         'type': 'Feature',
...         'properties': {'name': 'Phoo'},
...         'geometry': Pointy.__geo_interface__ }
>>> from my_analytic_module import as_feature
>>> as_feature(Placemark())['properties']['name']
'Phoo'

Implementations

Python programs and packages that you have heard of – and made be a frequent user of – already implement this protocol:

Shapely

Shapely [7] provides a shape() function that makes Shapely geometries from objects that provide __geo_interface__ and a mapping() function that writes geometries out as dictionaries:

>>> from shapely.geometry import Point
>>> from shapely.geometry import mapping, shape
>>> Point(0.0, 0.0).__geo_interface__
{'type': 'Point', 'coordinates': (0.0, 0.0)}
>>> shape(Point(0.0, 0.0))
<shapely.geometry.point.Point object at 0x...>
>>> mapping(Point(0.0, 0.0))
{'type': 'Point', 'coordinates': (0.0, 0.0)}

The Shapely version of the example in the introduction is:

>>> from shapely.geometry import shape
>>> shape(obj).buffer(1.0).area
3.1365484905459389

where obj could be a geometry object from ArcPy or PySAL, or even a mapping directly:

>>> shape({'type': 'Point', 'coordinates': (0.0, 0.0)}).buffer(1.0).area
3.1365484905459389

References

[1]http://docs.scipy.org/doc/numpy/reference/arrays.interface.html
[2]https://tools.ietf.org/html/rfc7946
[3]https://desktop.arcgis.com/en/arcmap/latest/analyze/arcpy-functions/asshape.htm
[4]https://bitbucket.org/sgillies/descartes/src/f97e54f3b8d4/descartes/patch.py#cl-14
[5]http://pypi.python.org/pypi/geojson/
[6]https://pysal.readthedocs.io/en/latest/users/tutorials/shapely.html
[7]https://github.com/Toblerity/Shapely
@micahcochran
Copy link

@bixb0012

From the spec text:

coordinates (required)
Valid only for geometry types. This is an (x, y) or (longitude, latitude) tuple in the case of a "Point", a list of such tuples in the "LineString" case, or a list of lists in the "Polygon" case. See the GeoJSON spec for details.

This reads as if you should use tuples for a coordinates only and lists for any more complicated geometry. So the first example seems like it is correct:

{'type': 'MultiLineString', 'coordinates': [[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]]}

Implementations are a whole different matter.
python-geojson seems to use lists all the way down:

>>> import geojson
In [5]: geojson.MultiLineString(coordinates=[[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]])

Out[5]: {"coordinates": [[[35.0, 35.0], [45.0, 45.0]], [[5.0, 15.0], [15.0, 25.0]]], "type": "MultiLineString"}

pygeoif uses tuples all the way down.

>>> import pygeoif
In [12]: g=pygeoif.MultiLineString([[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]])
In [15]: g.__geo_interface__
Out[15]: {'coordinates': (((35.0, 35.0), (45.0, 45.0)), ((5.0, 15.0), (15.0, 25.0))), 'type': 'MultiLineString'}

@jmoujaes
Copy link

jmoujaes commented Sep 8, 2018

@sgillies thanks for this work!

Are coordinates purposely represented as tuples or should they be lists? There has been discussion above for both cases.
The RFC (as @aronbierbaum mentioned) says arrays, but as @aolieman mentioned above, it could be tuples.

Thanks!

@kannes
Copy link

kannes commented Nov 25, 2018

QGIS supports it as well!

from shapely.geometry import shape, mapping
layer = iface.activeLayer()
feature = layer.selectedFeatures()[0]
shape(feature.__geo_interface__['geometry'])
<shapely.geometry.point.Point object at 0x7f21dc194208>
shape(feature.geometry().__geo_interface__)
<shapely.geometry.point.Point object at 0x7f21dc39e780>

@shankari
Copy link

@sgillies although your examples include Feature and Feature is a geojson supported type, it doesn't look like shapely currently supports it.

loc = {
        "type": "Feature",
        "properties": {
            "name": "Mountain View Library"
        },
        "geometry": {
            "type": "Polygon",
            "coordinates": [
                [
                    [ -122.08355963230133, 37.39091642895306 ],
                    [ -122.08428382873535, 37.38975713188671 ],
                    [ -122.08383858203888, 37.389573859018185 ],
                    [ -122.08340406417847, 37.390196132517175 ],
                    [ -122.08311975002289, 37.39012793841312 ],
                    [ -122.08280861377716, 37.390656441096894 ],
                    [ -122.08355426788331, 37.39096757399895 ],
                    [ -122.08355963230133, 37.39091642895306 ]
                  ]
                ]
            }
        }

geo.asShape(loc)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-024f8471fc95> in <module>()
     21         }
     22 
---> 23 geo.asShape(loc)

/Users/shankari/OSS/anaconda/envs/emissioneval/lib/python3.6/site-packages/shapely/geometry/geo.py in asShape(context)
     76         return GeometryCollection(geoms)
     77     else:
---> 78         raise ValueError("Unknown geometry type: %s" % geom_type)
     79 
     80 def mapping(ob):

ValueError: Unknown geometry type: feature

Of course, I can work around this by copying the coordinates from the geojson object to shapely but that (sort of) defeats the purpose of asShape

@kannes
Copy link

kannes commented Sep 5, 2019

@shankari, please file that at https://github.com/Toblerity/Shapely/issues

@perrygeo, could you please document GeoPandas' approach here? Then others who want to provide a geo interface for a whole set of features will hopefully use the same approach. I am not sure if the spec was intended to support collections but it seems reasonable.

@mapadofu
Copy link

I've got a couple of questions on the design here:

  • Why use dunders? couldn't one just have said that the interface is geo_interface instead of __geo_interface__
  • Why an attribute rather than a function? i.e. __geo_interface__ vs. __geo_interface__()?

I'm looking at doing a similar thing in a different context and want to understand the potential tradeoffs better.

@perrygeo
Copy link

The dunders are useful to indicate that this interface is generic (ie this interface is well-known and can be implemented by any class) and that it is private (used internally to your library or application code). Using geo_interface without dunder, there would be no way to know if the method was implementing this interface or if it was a similarly named method with different behavior. As with most things Python, this is just a naming convention and the visibility rules are implied not enforced.

Not sure about attribute vs function - arguably a function would be more flexible as it could accept options as kwargs. But in this case, there were no options to expose to the user.

@ChrisBarker-NOAA
Copy link

One "trick" is that dunders are a namespace defined by Python itself -- i.e. used to define the "official" Python interfaces.

But there's only so many namespaces -- so "grabbing" __geo for the geospatial world is reasonable enough.

And there is precedent -- __array has been used by numpy for ages, and it's not an official python dunder.

There is no official "geospatial_in_Python" group that I know of to define this -- but looking at who's contributed to this discussion, it is kinda the unofficial group :-)

This gist was started a long time ago -- is it published anywhere? is there a place to publish it? Something a little more official looking than a gist :-)

@ChrisBarker-NOAA
Copy link

BTW: with regard to tuples vs lists:

Python has a very flexible type model -- so I'd think the way to go would be to say "sequence". I know I 'like to use Nx2 numpy arrays as a list of coordinates.

Or be a bit more confining and say "anything that you can pass into json.dump and get geoJSON out.

@eric-spitler
Copy link

@sgillies GeoPandas also uses the __geo_interface__ when loading Features into a GeoDataFrame
https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.from_features.html

@NoharaMasato
Copy link

Hello @sgillies, It looks like links for [4] and [6] in references are outdated.

@sthagen
Copy link

sthagen commented Apr 10, 2023

Hello @sgillies, It looks like links for [4] and [6] in references are outdated.

I would suggest to replace with as helper for random visitors (but Sean will know better replacements):

[4]: Descartes ... but then the source link at PyPI - a fork seems to be benjimin/descartes

[6]: Shapely tutorial per wayback machine snapshot of 2019-Jun-19 ... or try to map to whatever PySAL.org now offers ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment