Intro:
I1. URLs specify a location, not content.
I2. You can't "download" a DOI.
I3. Content hashes are unique, content-based identifiers for data.
I4. idea: reference data by content id in your analysis, and register known locations of the associated content in registries (see the sketch below)
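A minimal sketch of the idea using the contentid R package; the URL below is a placeholder, not a real dataset location:

```r
library(contentid)

# register() hashes whatever bytes currently live at the URL, records the
# (hash, URL) pair in a registry, and returns the hash as an identifier
# for the bytes themselves, e.g. "hash://sha256/<64 hex digits>".
id <- contentid::register("https://example.org/bird-observations.csv")

# resolve() goes the other way: given a content id, it finds a registered
# location that still serves matching bytes and returns a local file path.
path <- contentid::resolve(id)
```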
Use case:
Reliably Counting Canada Geese (the most prominent bird near Lake Merritt)
For (pseudo-)code, see: https://gist.github.com/jhpoelen/19aba7c7c57d6da217ca644dc7634c02#file-count_geese-r
U1. Discover eBird datasets (oops, they have all disappeared behind a registration wall)
U2. Use a time machine (Preston) to recover, re-publish, and register the eBird 2018/2019 publications (see versions at http://doi.org/10.5281/zenodo.3858250) using contentid::register(...)
U3. Develop your method (the function count_geese)
U4. Reproduce the count using content ids + contentid::resolve(...) (see the sketch after this list)
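A hedged sketch of steps U2 through U4. contentid::register() and contentid::resolve() are the package's actual entry points, but the URLs, file names, column name, and species string below are placeholders; the real implementation is in the gist linked above:

```r
library(contentid)

# U2: register known locations of the re-published eBird snapshots.
# (Placeholder URLs; the real copies are versioned under
# http://doi.org/10.5281/zenodo.3858250.)
id_2018 <- contentid::register("https://example.org/ebd_lake_merritt_2018.txt")
id_2019 <- contentid::register("https://example.org/ebd_lake_merritt_2019.txt")

# U3: the method takes a content id, never a URL or a local path.
# resolve() fetches a verified copy from any registered location and
# caches it locally (store = TRUE).
count_geese <- function(content_id) {
  path <- contentid::resolve(content_id, store = TRUE)
  obs <- read.delim(path)  # eBird exports are tab-delimited
  sum(obs$COMMON.NAME == "Canada Goose", na.rm = TRUE)
}

# U4: anyone holding the content ids can reproduce the counts,
# no matter where the bytes happen to live in the future.
count_geese(id_2018)
count_geese(id_2019)
```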
Conclusions:
C1. Future-proof your scripts by using content-based identifiers instead of URLs or local paths
C2. Register locations for content ids anytime and anywhere (with or without a time machine; also suitable for embargoed/private data)
C3. Use content ids!
@cboettig Please see email and this outline for talk.