Skip to content

Instantly share code, notes, and snippets.

@sgillies
Last active October 30, 2024 10:58
Show Gist options
  • Save sgillies/2217756 to your computer and use it in GitHub Desktop.
Save sgillies/2217756 to your computer and use it in GitHub Desktop.
A Python Protocol for Geospatial Data

Author: Sean Gillies Version: 1.0

Abstract

This document describes a GeoJSON-like protocol for geo-spatial (GIS) vector data.

Introduction

Python has a number of built-in protocols (descriptors, iterators, etc). A very simple and familiar one involves string representations of objects. The built-in str() function calls the __str__() method of its single argument. By implementing __str__(), instances of any class can be printed by any other Python program.

>>> class A(object):
...     def __str__(self):
...         return "Eh!"
...
>>> a = A()
>>> str(a)
'Eh!'
>>> "%s" % a
'Eh!'

What if we could do something like this for geo-spatial objects? It might, for example, let any object be analyzed using any other hypothetical software package like this:

>>> from some_analytic_module import as_geometry
>>> as_geometry(obj).buffer(1.0).area   # obj is a "point" of some kind
3.1365484905459389

The hypothetical as_geometry() function of the hypothetical some_analytic_module module would access relevant data of its single argument using an agreed upon method or attribute.

__geo_interface__

Following the lead of numpy's Array Interface [1], let's agree on a __geo_interface__ property. To avoid creating even more protocols, let's make the value of this attribute a Python mapping. To further minimize invention, let's borrow from the GeoJSON format [2] for the structure of this mapping.

The keys are:

type (required)
A string indicating the geospatial type. Possible values are "Feature" or a geometry type: "Point", "LineString", "Polygon", etc.
bbox (optional)
A tuple of floats that describes the geo-spatial bounds of the object: (left, bottom, right, top) or (west, south, east, north).
properties (optional)
A mapping of feature properties (labels, populations ... you name it. Dependent on the data). Valid for "Feature" types only.
geometry (optional)
The geometric object of a "Feature" type, also as a mapping.
coordinates (required)
Valid only for geometry types. This is an (x, y) or (longitude, latitude) tuple in the case of a "Point", a list of such tuples in the "LineString" case, or a list of lists in the "Polygon" case. See the GeoJSON spec for details.

Examples

First, a toy class with a point representation:

>>> class Pointy(object):
...     __geo_interface__ = {'type': 'Point', 'coordinates': (0.0, 0.0)}
...
>>> as_geometry(Pointy()).buffer(1.0).area
3.1365484905459389

Next, a toy class with a feature representation:

>>> class Placemark(object):
...     __geo_interface__ = {
...         'type': 'Feature',
...         'properties': {'name': 'Phoo'},
...         'geometry': Pointy.__geo_interface__ }
>>> from my_analytic_module import as_feature
>>> as_feature(Placemark())['properties']['name']
'Phoo'

Implementations

Python programs and packages that you have heard of – and made be a frequent user of – already implement this protocol:

Shapely

Shapely [7] provides a shape() function that makes Shapely geometries from objects that provide __geo_interface__ and a mapping() function that writes geometries out as dictionaries:

>>> from shapely.geometry import Point
>>> from shapely.geometry import mapping, shape
>>> Point(0.0, 0.0).__geo_interface__
{'type': 'Point', 'coordinates': (0.0, 0.0)}
>>> shape(Point(0.0, 0.0))
<shapely.geometry.point.Point object at 0x...>
>>> mapping(Point(0.0, 0.0))
{'type': 'Point', 'coordinates': (0.0, 0.0)}

The Shapely version of the example in the introduction is:

>>> from shapely.geometry import shape
>>> shape(obj).buffer(1.0).area
3.1365484905459389

where obj could be a geometry object from ArcPy or PySAL, or even a mapping directly:

>>> shape({'type': 'Point', 'coordinates': (0.0, 0.0)}).buffer(1.0).area
3.1365484905459389

References

[1]http://docs.scipy.org/doc/numpy/reference/arrays.interface.html
[2]https://tools.ietf.org/html/rfc7946
[3]https://desktop.arcgis.com/en/arcmap/latest/analyze/arcpy-functions/asshape.htm
[4]https://bitbucket.org/sgillies/descartes/src/f97e54f3b8d4/descartes/patch.py#cl-14
[5]http://pypi.python.org/pypi/geojson/
[6]https://pysal.readthedocs.io/en/latest/users/tutorials/shapely.html
[7]https://github.com/Toblerity/Shapely
@mapadofu
Copy link

I've got a couple of questions on the design here:

  • Why use dunders? couldn't one just have said that the interface is geo_interface instead of __geo_interface__
  • Why an attribute rather than a function? i.e. __geo_interface__ vs. __geo_interface__()?

I'm looking at doing a similar thing in a different context and want to understand the potential tradeoffs better.

@perrygeo
Copy link

The dunders are useful to indicate that this interface is generic (ie this interface is well-known and can be implemented by any class) and that it is private (used internally to your library or application code). Using geo_interface without dunder, there would be no way to know if the method was implementing this interface or if it was a similarly named method with different behavior. As with most things Python, this is just a naming convention and the visibility rules are implied not enforced.

Not sure about attribute vs function - arguably a function would be more flexible as it could accept options as kwargs. But in this case, there were no options to expose to the user.

@ChrisBarker-NOAA
Copy link

One "trick" is that dunders are a namespace defined by Python itself -- i.e. used to define the "official" Python interfaces.

But there's only so many namespaces -- so "grabbing" __geo for the geospatial world is reasonable enough.

And there is precedent -- __array has been used by numpy for ages, and it's not an official python dunder.

There is no official "geospatial_in_Python" group that I know of to define this -- but looking at who's contributed to this discussion, it is kinda the unofficial group :-)

This gist was started a long time ago -- is it published anywhere? is there a place to publish it? Something a little more official looking than a gist :-)

@ChrisBarker-NOAA
Copy link

BTW: with regard to tuples vs lists:

Python has a very flexible type model -- so I'd think the way to go would be to say "sequence". I know I 'like to use Nx2 numpy arrays as a list of coordinates.

Or be a bit more confining and say "anything that you can pass into json.dump and get geoJSON out.

@eric-spitler
Copy link

@sgillies GeoPandas also uses the __geo_interface__ when loading Features into a GeoDataFrame
https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.from_features.html

@NoharaMasato
Copy link

Hello @sgillies, It looks like links for [4] and [6] in references are outdated.

@sthagen
Copy link

sthagen commented Apr 10, 2023

Hello @sgillies, It looks like links for [4] and [6] in references are outdated.

I would suggest to replace with as helper for random visitors (but Sean will know better replacements):

[4]: Descartes ... but then the source link at PyPI - a fork seems to be benjimin/descartes

[6]: Shapely tutorial per wayback machine snapshot of 2019-Jun-19 ... or try to map to whatever PySAL.org now offers ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment