@caged
Last active August 29, 2015 14:19
Quick feedback for municipalities working with open data
  • Datasets first, APIs second - Doing any kind of aggregate analysis usually requires working with complete datasets. REST APIs aren't ideal for this use case. APIs are not data; they are a means of exposing it.
  • Machine-friendly retrieval of raw datasets - Avoid the assumption that there's a human, using a web browser, manually clicking a link. For example, scripts that fetch new daily crime data via curl would be a likely scenario. Make it easy for machines by removing authentication, unnecessary redirects, JavaScript-based retrieval or POST-style retrieval.
  • Document long column names - Shapefile attribute names are limited to 10 characters. This makes many attributes difficult to decipher without associated metadata; the truncated attribute names in a Garbage Collection dataset, for example, are hard to interpret on their own. Include a file with the long column name mappings and include both the long and short name in the metadata.
  • Metadata as a first-class entity - Metadata is just as important as the dataset itself. It's the documentation, the contact information and the way you find data if it's indexable by search engines. Include a copy of the metadata with each dataset download (in addition to publishing it online), so it's frozen to the point in time when the data was retrieved.
  • Link dataset from metadata - Search engines will likely index online metadata if it's allowed. Make it easy to go from metadata to dataset and from dataset to metadata.
  • Use a public issue tracker or mailing list - Allow developers to report issues with data integrity, documentation, etc. Be responsive to pain points reported by developers.
  • Avoid moving public datasets - If a move is unavoidable, redirect the previous dataset URL to the new location so as not to break existing scripts and applications. Cool URLs don't change.
  • Open communities don't need a CMS to thrive - A full-featured CMS isn't a prerequisite for a vibrant local community. A public mailing list with engaged civic employees is a powerful foundation.
  • As few pages as possible - Be mindful about how often you paginate. It's perfectly fine to have 50-100 datasets on a single page. We're conditioned to scroll webpages and it's much faster to get an idea of what's available vs. clicking through many different pages and waiting for each one to load.
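To make the retrieval and column-name points concrete, here is a minimal Python sketch, not a recommended implementation. It assumes the dataset is published as a CSV file at a stable URL and that the publisher ships a small mapping file with `short` and `long` columns. Those column headings, the function names, and the sample attribute `PICKUP_DY` are all hypothetical and chosen only for illustration.

```python
import csv
import io
import urllib.request


def fetch_text(url):
    """Download a dataset with one plain GET request.

    This only works unattended when the publisher requires no login,
    no JavaScript, and no POST form to reach the file.
    """
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def load_column_names(mapping_csv):
    """Parse a mapping file with (hypothetical) 'short' and 'long' columns."""
    reader = csv.DictReader(io.StringIO(mapping_csv))
    return {row["short"]: row["long"] for row in reader}


def expand_columns(dataset_csv, names):
    """Return rows keyed by long column names.

    Columns missing from the mapping keep their short name.
    """
    reader = csv.DictReader(io.StringIO(dataset_csv))
    return [
        {names.get(key, key): value for key, value in row.items()}
        for row in reader
    ]
```

A daily job could call `fetch_text` on the dataset URL and on the mapping URL, then pass both results to `load_column_names` and `expand_columns`. If either download needs a browser session, the script breaks, which is the failure the list above warns about.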
@ungoldman

I agree with everything above. The only things I might add:

Open formats - It's great to provide datasets for download in multiple non-proprietary formats when possible. For example, if you have a dataset that is in Shapefile format, it's nice to provide the download as CSV, GeoJSON, and KML as well.
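As a rough illustration of offering a second format, the sketch below turns a GeoJSON FeatureCollection of Point features into CSV using only the Python standard library. It is a simplification of the Shapefile case mentioned above: reading Shapefiles needs a third-party library, and lines or polygons would need a different representation in CSV. The function name and the `longitude`/`latitude` column names are assumptions made for this example.

```python
import csv
import io
import json


def geojson_points_to_csv(geojson_text):
    """Convert a FeatureCollection of Point features to CSV text."""
    features = json.loads(geojson_text)["features"]

    # Collect property names in first-seen order so the header is stable.
    fields = []
    for feature in features:
        for key in feature.get("properties") or {}:
            if key not in fields:
                fields.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["longitude", "latitude"] + fields, lineterminator="\n"
    )
    writer.writeheader()
    for feature in features:
        # GeoJSON stores Point coordinates as [longitude, latitude, ...].
        lon, lat = feature["geometry"]["coordinates"][:2]
        row = {"longitude": lon, "latitude": lat}
        row.update(feature.get("properties") or {})
        writer.writerow(row)
    return out.getvalue()
```

Features that lack a given property get an empty cell in that column, so every row has the same shape.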

No login - Unless a particular dataset requires registration for safety or legal reasons (e.g. contains sensitive or personally identifiable information), there's no reason to hide the download behind a login. If it's fully public open data it should be downloadable by anonymous users.

I'd like to especially emphasize @caged's point on metadata -- if the metadata is missing or difficult to parse (at minimum human-readable title, description of dataset subject, ISO 8601 standard created/modified dates), it's very hard to identify the right dataset to work with. https://schema.org/Dataset has an exhaustive list of useful properties to include. I think @maxogden @Karissa and @mafintosh are looking at using a subset of schema.org's Dataset properties for dat metadata.
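For illustration, the minimum described above could be expressed as a small JSON-LD record using schema.org's Dataset vocabulary. This is a sketch under assumptions: the function name, the sample values in the usage note, and the choice of `text/csv` as the download format are hypothetical, and a real portal would likely include more properties.

```python
import json
from datetime import datetime, timezone


def dataset_metadata(name, description, created, modified, download_url):
    """Build a minimal schema.org Dataset record as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # isoformat() on a timezone-aware datetime yields ISO 8601 text.
        "dateCreated": created.isoformat(),
        "dateModified": modified.isoformat(),
        # Linking the download from the metadata lets readers and
        # crawlers go from the description straight to the data.
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": download_url,
            "encodingFormat": "text/csv",  # assumed format for this example
        },
    }


def to_json(record):
    """Serialize the record so it can ship alongside the dataset file."""
    return json.dumps(record, indent=2)
```

Writing the output of `to_json` next to each download gives the dataset a frozen copy of its metadata, and the same record can be embedded in the dataset's web page for search engines.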

@max-mapper

+1 to everything!
