Skip to content

Instantly share code, notes, and snippets.

@sgillies
Last active October 30, 2024 10:58
Show Gist options
  • Save sgillies/2217756 to your computer and use it in GitHub Desktop.
Save sgillies/2217756 to your computer and use it in GitHub Desktop.
A Python Protocol for Geospatial Data

Author: Sean Gillies Version: 1.0

Abstract

This document describes a GeoJSON-like protocol for geo-spatial (GIS) vector data.

Introduction

Python has a number of built-in protocols (descriptors, iterators, etc). A very simple and familiar one involves string representations of objects. The built-in str() function calls the __str__() method of its single argument. By implementing __str__(), instances of any class can be printed by any other Python program.

>>> class A(object):
...     def __str__(self):
...         return "Eh!"
...
>>> a = A()
>>> str(a)
'Eh!'
>>> "%s" % a
'Eh!'

What if we could do something like this for geo-spatial objects? It might, for example, let any object be analyzed using any other hypothetical software package like this:

>>> from some_analytic_module import as_geometry
>>> as_geometry(obj).buffer(1.0).area   # obj is a "point" of some kind
3.1365484905459389

The hypothetical as_geometry() function of the hypothetical some_analytic_module module would access relevant data of its single argument using an agreed upon method or attribute.

__geo_interface__

Following the lead of numpy's Array Interface [1], let's agree on a __geo_interface__ property. To avoid creating even more protocols, let's make the value of this attribute a Python mapping. To further minimize invention, let's borrow from the GeoJSON format [2] for the structure of this mapping.

The keys are:

type (required)
A string indicating the geospatial type. Possible values are "Feature" or a geometry type: "Point", "LineString", "Polygon", etc.
bbox (optional)
A tuple of floats that describes the geo-spatial bounds of the object: (left, bottom, right, top) or (west, south, east, north).
properties (optional)
A mapping of feature properties (labels, populations ... you name it. Dependent on the data). Valid for "Feature" types only.
geometry (optional)
The geometric object of a "Feature" type, also as a mapping.
coordinates (required)
Valid only for geometry types. This is an (x, y) or (longitude, latitude) tuple in the case of a "Point", a list of such tuples in the "LineString" case, or a list of lists in the "Polygon" case. See the GeoJSON spec for details.

Examples

First, a toy class with a point representation:

>>> class Pointy(object):
...     __geo_interface__ = {'type': 'Point', 'coordinates': (0.0, 0.0)}
...
>>> as_geometry(Pointy()).buffer(1.0).area
3.1365484905459389

Next, a toy class with a feature representation:

>>> class Placemark(object):
...     __geo_interface__ = {
...         'type': 'Feature',
...         'properties': {'name': 'Phoo'},
...         'geometry': Pointy.__geo_interface__ }
>>> from my_analytic_module import as_feature
>>> as_feature(Placemark())['properties']['name']
'Phoo'

Implementations

Python programs and packages that you have heard of – and made be a frequent user of – already implement this protocol:

Shapely

Shapely [7] provides a shape() function that makes Shapely geometries from objects that provide __geo_interface__ and a mapping() function that writes geometries out as dictionaries:

>>> from shapely.geometry import Point
>>> from shapely.geometry import mapping, shape
>>> Point(0.0, 0.0).__geo_interface__
{'type': 'Point', 'coordinates': (0.0, 0.0)}
>>> shape(Point(0.0, 0.0))
<shapely.geometry.point.Point object at 0x...>
>>> mapping(Point(0.0, 0.0))
{'type': 'Point', 'coordinates': (0.0, 0.0)}

The Shapely version of the example in the introduction is:

>>> from shapely.geometry import shape
>>> shape(obj).buffer(1.0).area
3.1365484905459389

where obj could be a geometry object from ArcPy or PySAL, or even a mapping directly:

>>> shape({'type': 'Point', 'coordinates': (0.0, 0.0)}).buffer(1.0).area
3.1365484905459389

References

[1]http://docs.scipy.org/doc/numpy/reference/arrays.interface.html
[2]https://tools.ietf.org/html/rfc7946
[3]https://desktop.arcgis.com/en/arcmap/latest/analyze/arcpy-functions/asshape.htm
[4]https://bitbucket.org/sgillies/descartes/src/f97e54f3b8d4/descartes/patch.py#cl-14
[5]http://pypi.python.org/pypi/geojson/
[6]https://pysal.readthedocs.io/en/latest/users/tutorials/shapely.html
[7]https://github.com/Toblerity/Shapely
@aolieman
Copy link

aolieman commented May 8, 2015

@aronbierbaum, I don't think so. Coordinate pairs don't benefit from being stored in a mutable python type, and a tuple is an efficient choice for what we want to represent here. JSON is a serialization format, and as such is inherently immutable.

I guess what you've overlooked here is that __geo_interface__ specifies an interface, not a serialization format. The geojson package provides a way to serialize __geo_interface__ values to GeoJSON (see encoding/decoding).

@micahcochran
Copy link

@jzmiller1 I think that __geo_interface__ should have an optional crs key. However, specifying the format could be a little problematic. I have some ideas on how to do it, but I feel that a post might be a little too limited. I am creating a document to explain my ideas of adding crs to the __geo_interface__.

@cleder
Copy link

cleder commented May 4, 2016

https://github.com/fortyninemaps/karta also implements the __geo_interface__.

@bixb0012
Copy link

bixb0012 commented Jan 9, 2017

I think it would be good to include some additional examples that clarify how tuples and lists should be handled in the output. Although there may not be a difference in terms of processing the output, there is a difference in terms of appearance, and there seems to be some debate as to which is the "better" way to go.

For example, should MULTILINESTRING((35 35, 45 45), (5 15, 15 25)) output look like

{'type': 'MultiLineString', 'coordinates': [[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]]}

or

{'type': 'MultiLineString', 'coordinates': (((35.0, 35.0), (45.0, 45.0)), ((5.0, 15.0), (15.0, 25.0)))}

@MRigal
Copy link

MRigal commented Jan 10, 2017

Any known minimal adapter of the geo_interface for psycopg2 to avoid using the Python2-constricted ppygis or the heavier ogr or shapely?

@micahcochran
Copy link

@bixb0012

From the spec text:

coordinates (required)
Valid only for geometry types. This is an (x, y) or (longitude, latitude) tuple in the case of a "Point", a list of such tuples in the "LineString" case, or a list of lists in the "Polygon" case. See the GeoJSON spec for details.

This reads as if you should use tuples for a coordinates only and lists for any more complicated geometry. So the first example seems like it is correct:

{'type': 'MultiLineString', 'coordinates': [[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]]}

Implementations are a whole different matter.
python-geojson seems to use lists all the way down:

>>> import geojson
In [5]: geojson.MultiLineString(coordinates=[[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]])

Out[5]: {"coordinates": [[[35.0, 35.0], [45.0, 45.0]], [[5.0, 15.0], [15.0, 25.0]]], "type": "MultiLineString"}

pygeoif uses tuples all the way down.

>>> import pygeoif
In [12]: g=pygeoif.MultiLineString([[(35.0, 35.0), (45.0, 45.0)], [(5.0, 15.0), (15.0, 25.0)]])
In [15]: g.__geo_interface__
Out[15]: {'coordinates': (((35.0, 35.0), (45.0, 45.0)), ((5.0, 15.0), (15.0, 25.0))), 'type': 'MultiLineString'}

@jmoujaes
Copy link

jmoujaes commented Sep 8, 2018

@sgillies thanks for this work!

Are coordinates purposely represented as tuples or should they be lists? There has been discussion above for both cases.
The RFC (as @aronbierbaum mentioned) says arrays, but as @aolieman mentioned above, it could be tuples.

Thanks!

@kannes
Copy link

kannes commented Nov 25, 2018

QGIS supports it as well!

from shapely.geometry import shape, mapping
layer = iface.activeLayer()
feature = layer.selectedFeatures()[0]
shape(feature.__geo_interface__['geometry'])
<shapely.geometry.point.Point object at 0x7f21dc194208>
shape(feature.geometry().__geo_interface__)
<shapely.geometry.point.Point object at 0x7f21dc39e780>

@shankari
Copy link

@sgillies although your examples include Feature and Feature is a geojson supported type, it doesn't look like shapely currently supports it.

loc = {
        "type": "Feature",
        "properties": {
            "name": "Mountain View Library"
        },
        "geometry": {
            "type": "Polygon",
            "coordinates": [
                [
                    [ -122.08355963230133, 37.39091642895306 ],
                    [ -122.08428382873535, 37.38975713188671 ],
                    [ -122.08383858203888, 37.389573859018185 ],
                    [ -122.08340406417847, 37.390196132517175 ],
                    [ -122.08311975002289, 37.39012793841312 ],
                    [ -122.08280861377716, 37.390656441096894 ],
                    [ -122.08355426788331, 37.39096757399895 ],
                    [ -122.08355963230133, 37.39091642895306 ]
                  ]
                ]
            }
        }

geo.asShape(loc)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-024f8471fc95> in <module>()
     21         }
     22 
---> 23 geo.asShape(loc)

/Users/shankari/OSS/anaconda/envs/emissioneval/lib/python3.6/site-packages/shapely/geometry/geo.py in asShape(context)
     76         return GeometryCollection(geoms)
     77     else:
---> 78         raise ValueError("Unknown geometry type: %s" % geom_type)
     79 
     80 def mapping(ob):

ValueError: Unknown geometry type: feature

Of course, I can work around this by copying the coordinates from the geojson object to shapely but that (sort of) defeats the purpose of asShape

@kannes
Copy link

kannes commented Sep 5, 2019

@shankari, please file that at https://github.com/Toblerity/Shapely/issues

@perrygeo, could you please document GeoPandas' approach here? Then others who want to provide a geo interface for a whole set of features will hopefully use the same approach. I am not sure if the spec was intended to support collections but it seems reasonable.

@mapadofu
Copy link

I've got a couple of questions on the design here:

  • Why use dunders? couldn't one just have said that the interface is geo_interface instead of __geo_interface__
  • Why an attribute rather than a function? i.e. __geo_interface__ vs. __geo_interface__()?

I'm looking at doing a similar thing in a different context and want to understand the potential tradeoffs better.

@perrygeo
Copy link

The dunders are useful to indicate that this interface is generic (ie this interface is well-known and can be implemented by any class) and that it is private (used internally to your library or application code). Using geo_interface without dunder, there would be no way to know if the method was implementing this interface or if it was a similarly named method with different behavior. As with most things Python, this is just a naming convention and the visibility rules are implied not enforced.

Not sure about attribute vs function - arguably a function would be more flexible as it could accept options as kwargs. But in this case, there were no options to expose to the user.

@ChrisBarker-NOAA
Copy link

One "trick" is that dunders are a namespace defined by Python itself -- i.e. used to define the "official" Python interfaces.

But there's only so many namespaces -- so "grabbing" __geo for the geospatial world is reasonable enough.

And there is precedent -- __array has been used by numpy for ages, and it's not an official python dunder.

There is no official "geospatial_in_Python" group that I know of to define this -- but looking at who's contributed to this discussion, it is kinda the unofficial group :-)

This gist was started a long time ago -- is it published anywhere? is there a place to publish it? Something a little more official looking than a gist :-)

@ChrisBarker-NOAA
Copy link

BTW: with regard to tuples vs lists:

Python has a very flexible type model -- so I'd think the way to go would be to say "sequence". I know I 'like to use Nx2 numpy arrays as a list of coordinates.

Or be a bit more confining and say "anything that you can pass into json.dump and get geoJSON out.

@eric-spitler
Copy link

@sgillies GeoPandas also uses the __geo_interface__ when loading Features into a GeoDataFrame
https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.from_features.html

@NoharaMasato
Copy link

Hello @sgillies, It looks like links for [4] and [6] in references are outdated.

@sthagen
Copy link

sthagen commented Apr 10, 2023

Hello @sgillies, It looks like links for [4] and [6] in references are outdated.

I would suggest to replace with as helper for random visitors (but Sean will know better replacements):

[4]: Descartes ... but then the source link at PyPI - a fork seems to be benjimin/descartes

[6]: Shapely tutorial per wayback machine snapshot of 2019-Jun-19 ... or try to map to whatever PySAL.org now offers ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment