@max-mapper
Forked from boazsender/open-civic-data.json
Created February 6, 2011 04:59
{
  title: "title of data", // e.g. 'the xyz murder', 'blizzard of 2011', 'chelmsburry bike path', 'MBTA green line', 'bus stop schedule event'
  type: "type of data", // e.g. event, physical feature
  set: "what data set the data is from, if any", // for data from a national data set, e.g. 2000 census
  category: ["category", "category"], // develop this taxonomy
  description: "description of data", // text description of data
  start: "RFC3339DateTime", // optional start date
  end: "RFC3339DateTime", // optional end date
  geometry: {} // optional GeoJSON location feature (http://geojson.org/geojson-spec.html)
}
@boazsender

  • -1 to the id being a hash of all that; let's use the built-in Couch IDs, no?
  • +1 to use of RFC3339DateTime over epoch time
  • +1 to the geometry property, and use of the geojson spec.

@max-mapper
Author

Well, the ID is deterministic this way, not random. One question I've struggled with is how to reconcile two copies of the same dataset.

Say we both download a dataset consisting of 10 fire hydrants. I add one, you add one, then we both upload our new sets of 11. There are 12 unique fire hydrants in our little universe at this point but we need a way of de-duplicating the overlapping 10. By setting the id as a hash of the location, timestamp and category we are effectively saying that if a fire hydrant moves location it is now a new unique fire hydrant.
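A minimal sketch of that de-duplication, in Node.js. The names `deterministicId`, `mergeById` and `hydrant` are hypothetical, and using the `start` field as the "timestamp" is an assumption; nothing here fixes which fields go into the hash or which hash function to use.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive an ID from location, timestamp and category.
// Assumes `start` is the timestamp and `geometry` is a GeoJSON Point.
function deterministicId(doc) {
  const key = JSON.stringify([doc.geometry.coordinates, doc.start, doc.category]);
  return crypto.createHash("sha1").update(key).digest("hex");
}

// Merge two copies of a data set, keeping one document per ID.
function mergeById(a, b) {
  const merged = new Map();
  for (const doc of [...a, ...b]) {
    merged.set(deterministicId(doc), doc);
  }
  return [...merged.values()];
}

// Build a sample hydrant document following the schema above.
function hydrant(lon, lat) {
  return {
    title: "fire hydrant",
    type: "physical feature",
    category: ["hydrant"],
    start: "2011-02-06T04:59:00Z",
    geometry: { type: "Point", coordinates: [lon, lat] },
  };
}

// Ten shared hydrants, plus one new hydrant in each copy.
const shared = Array.from({ length: 10 }, (_, i) => hydrant(-71.0 - i * 0.001, 42.36));
const mine = [...shared, hydrant(-71.5, 42.4)];
const yours = [...shared, hydrant(-71.6, 42.5)];

const merged = mergeById(mine, yours);
console.log(merged.length); // 12
```

The overlapping 10 collapse because identical fields hash to identical IDs. The flip side is that any change to the coordinates produces a different ID, so a moved hydrant shows up as a new one.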

Say for instance that we are actually not adding a new fire hydrant but instead updating a hydrant's location to be more accurate. In this case we would want to reconcile our new hydrant object as a new revision of the old instance of the hydrant and not as a completely new hydrant.

Aha! Actually, at this point I suppose we don't need a deterministic ID; we could use Couch's built-in random UUID instead, as you suggested. The key is the reconciliation API: making sure that we can add revisions to existing objects in certain cases and create new objects in others.
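Roughly what that reconciliation step could look like, as a sketch only. The in-memory `store`, `randomId` and `reconcile` are all hypothetical stand-ins, not CouchDB's API, and the caller is assumed to already know which existing object (if any) an upload refers to. How to decide that is exactly the open question.

```javascript
const crypto = require("crypto");

// Hypothetical in-memory stand-in for the database: id -> list of revisions.
const store = new Map();

// Random ID, not derived from the document's contents.
function randomId() {
  return crypto.randomBytes(16).toString("hex");
}

// If existingId names a stored object, record doc as its next revision;
// otherwise store doc as a brand new object. Returns the object's ID.
function reconcile(doc, existingId) {
  if (existingId !== undefined && store.has(existingId)) {
    store.get(existingId).push(doc);
    return existingId;
  }
  const id = randomId();
  store.set(id, [doc]);
  return id;
}

const point = (lon, lat) => ({ type: "Point", coordinates: [lon, lat] });

// First upload creates the hydrant.
const id = reconcile({ title: "fire hydrant", geometry: point(-71.0, 42.36) });
// A more accurate location becomes a second revision of the same hydrant.
reconcile({ title: "fire hydrant", geometry: point(-71.0001, 42.3601) }, id);
// A genuinely new hydrant gets its own ID.
const otherId = reconcile({ title: "fire hydrant", geometry: point(-71.5, 42.4) });

console.log(store.size, store.get(id).length); // 2 2
```

With random IDs the identity of a hydrant no longer depends on its fields, so a corrected location can attach to the old object instead of spawning a new one.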

/typing out loud

@boazsender

This is a really good point. I suppose that using hashed IDs to represent the data is a good way to prevent conflicts. I would think, however, that our validators would have to evaluate the data in a more holistic way.

While not wildly complicated, this International Open Civic Data Service will have several complex points, the most complex of which, I believe, is the one you raise: dealing with and merging duplicates. Consider, for example, two very similar records from two different data sets. What is the desired behavior?
