Geospatial privacy initial cheatsheet
There's more to it than this, but this is a decent starting point.
- If there are personal identifiers in the dataset, the safest approach is to remove them entirely. If you need to keep them to enable analytics on an 'anonymous user over time':
  1. Don't use a technique that can easily be reversed, like unsalted MD5 hashing. When the set of possible IDs is small and predictable, an attacker can hash every possible value and match the results. See the NYC Taxicab debacle for an example.
  2. If you use hashes, use a long, secret salt value and a cryptographically strong hash like SHA-512.
  3. Or randomize the data order and assign serial (increasing integer) numbers to identifiers.
  4. But really, removing any kind of identifiable ID is better than trying to obscure it.
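The salted-hash and serial-number options for keeping identifiers can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not a prescribed method: `pseudonymize_hashed` and `pseudonymize_serial` are hypothetical helper names, the salted hash is implemented as HMAC-SHA-512 (a keyed construction swapped in for plain salt-plus-hash concatenation), and the salt is assumed to be a random secret that is never published with the data.

```python
import hashlib
import hmac
import random
import secrets


def pseudonymize_hashed(ids, salt):
    """Map each identifier to a keyed SHA-512 digest (HMAC-SHA-512).

    Assumption: `salt` is a long random secret kept out of the published
    data. Without it, an attacker can't simply hash every possible ID and
    match the results, which is how unsalted MD5 IDs get reversed.
    """
    return {
        i: hmac.new(salt, i.encode("utf-8"), hashlib.sha512).hexdigest()
        for i in ids
    }


def pseudonymize_serial(ids, rng=None):
    """Shuffle the distinct identifiers, then number them 1, 2, 3, ...

    Shuffling first keeps the serial numbers from leaking the original
    order (e.g. sign-up time or sorted ID value).
    """
    rng = rng or random.SystemRandom()  # OS-backed randomness, not seedable
    unique = list(dict.fromkeys(ids))  # deduplicate, keep first-seen order
    rng.shuffle(unique)
    return {original: serial for serial, original in enumerate(unique, start=1)}


if __name__ == "__main__":
    # Hypothetical example IDs; repeated IDs must map to the same pseudonym.
    trip_ids = ["5U59", "7A21", "5U59", "9B03"]
    salt = secrets.token_bytes(32)  # store securely; never ship with the data
    hashed = pseudonymize_hashed(trip_ids, salt)
    serial = pseudonymize_serial(trip_ids)
    print([hashed[t][:12] for t in trip_ids])
    print([serial[t] for t in trip_ids])
```

Both the salt and the serial-number mapping have to stay private (or be discarded after use), since anyone holding them can undo the pseudonymization.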
- Set lower bounds for aggregation: if a query filters down to fewer records than some minimum (in the extreme, a single record), you may want to return nothing. Otherwise, someone could craft a narrowly filtered query that reveals one user's data. See