@MHenderson
Last active December 21, 2015 08:39
"Processing GPS data in Python" notebook for IPython. http://nbviewer.ipython.org/6279740
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Processing GPS data in Python\n",
"\n",
"Last updated: Thu Nov 21 22:57:12 GMT 2013.\n",
"\n",
"Source code for this notebook is at: [https://gist.github.com/MHenderson/6279740](https://gist.github.com/MHenderson/6279740)\n",
"\n",
"The notebook itself can be viewed at: [http://nbviewer.ipython.org/6279740](http://nbviewer.ipython.org/6279740)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing distances between points\n",
"\n",
"Using ``pyproj`` (http://code.google.com/p/pyproj/) we implement four functions for convenience.\n",
"\n",
"* ``distance(lat1, lng1, lat2, lng2)`` computes the distance in metres between the points (`lat1`, `lng1`) and (`lat2`, `lng2`) with respect to the WGS84 ellipsoid (http://en.wikipedia.org/wiki/World_Geodetic_System)\n",
"* ``distance_between(p1, p2)`` computes the same distance when the arguments `p1`, `p2` are (latitude, longitude) pairs.\n",
"* ``nearest_mile(distance_in_metres)`` converts a distance in metres to a whole number of miles. Despite the name, it truncates rather than rounds.\n",
"* ``total_distance(points)`` sums the distances between consecutive points in a list of points."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
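"import pyproj\n",
"\n",
"def distance(lat1, lng1, lat2, lng2, ellps = 'WGS84'):\n",
"    # Geod.inv takes longitude before latitude and returns\n",
"    # (forward azimuth, back azimuth, distance in metres).\n",
"    g = pyproj.Geod(ellps = ellps)\n",
"    return g.inv(lng1, lat1, lng2, lat2)[2]\n",
"\n",
"def distance_between(p1, p2):\n",
"    # p1 and p2 are (latitude, longitude) pairs.\n",
"    return distance(p1[0], p1[1], p2[0], p2[1])\n",
"\n",
"def nearest_mile(distance_in_metres):\n",
"    # int() truncates, so this returns whole miles rounded down.\n",
"    return int(0.621371*distance_in_metres/1000)\n",
"\n",
"def total_distance(points):\n",
"    # Pair each point with its successor and sum the leg distances.\n",
"    return sum(map(distance_between, points[:-1], points[1:]))"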
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So now, for example, if we know that Nottingham, England has latitude and longitude (52.9548, -1.1581) and Louisville, Kentucky has latitude and longitude (38.253284, -85.758786) then we can compute the geodesic distance between those two points on the WGS84 ellipsoid by using the ``distance_between`` function."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"p1 = (52.9548, -1.1581) # Nottingham, England\n",
"p2 = (38.253284, -85.758786) # Louisville, KY\n",
"print(\"Distance (to the nearest mile): \" + str(nearest_mile(distance_between(p1, p2))))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Distance (to the nearest mile): 3976\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Working with CSV files\n",
"\n",
"The data we are given is in CSV format (http://en.wikipedia.org/wiki/Comma-separated_values). Each row gives the GPS position (in the columns headed ``latitude`` and ``longitude``) of a specific van (``van_id``) at a specific time (``timestamp``). We also have access to other information such as the ``address``, ``speed`` and ``heading``. To open a CSV file for inspection with Python we use the standard library module ``csv``, whose ``DictReader`` object provides a dictionary interface to the CSV data. To instantiate a ``DictReader`` we need to provide an open file object for the CSV file and a list of table headings."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data_dir_path = '/home/matthew/workspace/resources/G/Geographical Information Science/'\n",
"van_activity_csv_filename = 'gps-activity.csv'\n",
"van_activity_csv_path = data_dir_path + van_activity_csv_filename\n",
"labels = ['id','van_id','timestamp','latitude','longitude','type','address','speed','heading','created']"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With this information we can create our ``DictReader`` object:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import csv\n",
"\n",
"csv_file = open(van_activity_csv_path, 'rb')\n",
"van_activity_reader = csv.DictReader(csv_file, labels, delimiter=',', quotechar='\\\"')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The keyword arguments ``delimiter`` and ``quotechar`` can be customised, for example to allow for tab-separated values.\n",
"\n",
"We immediately advance the ``van_activity_reader`` past the first row, because that row holds the headings and we don't want to do any calculation with it. After that we build a list of points by iterating over the remaining rows of the data."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
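"next(van_activity_reader)  # skip the header row\n",
"points = []\n",
"for van_activity in van_activity_reader:\n",
"    # csv yields strings, so convert the coordinates to floats\n",
"    points.append((float(van_activity['latitude']), float(van_activity['longitude'])))"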
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(points)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"143974"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ultimate task is to inspect the data for anomalies. The vans should follow the same routes day after day and therefore return more or less the same data every day. We want to look for features in the data that will allow us to recognise automatically whether a van's activity is anomalous. To start with, we look at total distance travelled. The cell below sums the distance over every row in the file, in file order, so it does not yet separate individual vans or days."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print(\"Distance (to the nearest mile): \" + str(nearest_mile(total_distance(points))))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Distance (to the nearest mile): 937658\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"* http://blog.tremily.us/posts/pyproj/\n"
]
}
],
"metadata": {}
}
]
}
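A note on the CSV section of the notebook above: it mentions that the ``DictReader`` keyword arguments can be customised, for example for tab separated values. A minimal sketch of that, written for Python 3 and assuming a small made-up sample (the rows and the ``tsv_text`` name are hypothetical, not taken from the van data), might look like this. Here no list of headings is passed, so ``DictReader`` takes the field names from the first row and there is no header row to skip by hand.

```python
import csv
import io

# Hypothetical tab-separated sample with a header row (not the real van data).
tsv_text = (
    "van_id\tlatitude\tlongitude\n"
    "7\t52.9548\t-1.1581\n"
    "7\t52.9500\t-1.1500\n"
)

# With no explicit fieldnames, DictReader reads them from the first row.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')

# csv yields strings, so convert the coordinates to floats.
points = [(float(row['latitude']), float(row['longitude'])) for row in reader]
```

For a real file, ``io.StringIO(tsv_text)`` would be replaced by a file opened in text mode.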
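The notebook above sums distances over every row in the file, whereas the stated aim is the total distance for a given van on a given day. A rough sketch of that grouping follows, under several assumptions that are not confirmed by the notebook: rows for each van arrive in time order, ``timestamp`` begins with a date of the form ``YYYY-MM-DD``, and a spherical haversine formula stands in for the WGS84 calculation done with ``pyproj`` (so results will differ slightly from ``distance``). The names ``haversine`` and ``distance_by_van_and_day`` and the sample rows are hypothetical.

```python
import csv
import io
import math
from collections import defaultdict

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (spherical assumption)

def haversine(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres on a sphere (not the WGS84 ellipsoid)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def distance_by_van_and_day(rows):
    """Sum leg distances per (van_id, day).

    Assumes rows are in time order and that the first ten characters
    of 'timestamp' are the date (an assumption about the data format).
    """
    tracks = defaultdict(list)
    for row in rows:
        key = (row['van_id'], row['timestamp'][:10])
        tracks[key].append((float(row['latitude']), float(row['longitude'])))
    totals = {}
    for key, pts in tracks.items():
        # Pair each point with its successor within the same van and day.
        totals[key] = sum(haversine(p[0], p[1], q[0], q[1])
                          for p, q in zip(pts, pts[1:]))
    return totals

# Hypothetical sample rows, not the real van data.
sample = io.StringIO(
    "van_id,timestamp,latitude,longitude\n"
    "1,2013-08-01 09:00:00,0.0,0.0\n"
    "1,2013-08-01 09:05:00,0.0,1.0\n"
    "2,2013-08-01 09:00:00,10.0,10.0\n"
    "1,2013-08-02 09:00:00,0.0,0.0\n"
)
totals = distance_by_van_and_day(csv.DictReader(sample))
```

Grouping before summing keeps the jump between the last point of one van and the first point of the next out of the total, which a single pass over all rows in file order would include.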