Here is an example Amazon WorkDocs doc link:
https://amazon.awsapps.com/workdocs-preview/index.html#/document/521878139568b6470762d27a1c4b2a5f6493738685c74ab02c0661ba2810ed64
This is the ID: 521878139568b6470762d27a1c4b2a5f6493738685c74ab02c0661ba2810ed64
Boy do they use a long unique identifier.
You’d think they were trying to index a galaxy of nearly infinite docs.
the url is using hex digits (base-16) and not base-64, which is less efficient character count wise … still
that doc id is 64 chars of base-16, so that’s 16^64 or 1.16e77
that’s 1 followed by 77 zeros
yikes
a standard GUID is 128 bits or 2^128
that ID is 256 bits or 2^256
2^129 would be twice the size of a standard GUID
so 2^256 is insanely large
dang I’d like to see their white paper justification for that
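If you want to check that arithmetic yourself, here's a quick sketch in Python. It only assumes the ID is plain hex, so each character carries 4 bits; the variable names are just mine.

```python
# Size of the ID space for a 64-character hex string vs. a 128-bit GUID.
doc_id = "521878139568b6470762d27a1c4b2a5f6493738685c74ab02c0661ba2810ed64"

bits = len(doc_id) * 4            # each hex digit carries 4 bits
id_space = 16 ** len(doc_id)      # number of possible 64-char hex IDs
guid_space = 2 ** 128             # number of possible 128-bit GUIDs

print(bits)                       # 256
print(f"{id_space:.3e}")          # 1.158e+77
print(id_space == 2 ** 256)       # True
# The 256-bit space is the GUID space squared, not merely doubled.
print(id_space == guid_space ** 2)  # True
```

So going from 128 to 256 bits doesn't double the space, it squares it.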
They could be encoding numerous pieces of data in there, but more likely it's simply a DB lookup of the ID to all the metadata. Or, due to multiple corporate accounts and AWS regions, some of that could be in the ID to route to the right DB instance. Still, that's a really big ID.
It could be two GUIDs just stacked next to each other (128 bits each), like
521878139568b6470762d27a1c4b2a5f
and 6493738685c74ab02c0661ba2810ed64
My first guess is it's a GUID for an account and a GUID for the file in the account. Though each account has a unique subdomain, so IDK about that.
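Here's a rough way to poke at the two-GUIDs guess, using Python's standard `uuid` module. Splitting at character 32 is purely my assumption; nothing says WorkDocs actually builds the ID this way.

```python
import uuid

doc_id = "521878139568b6470762d27a1c4b2a5f6493738685c74ab02c0661ba2810ed64"

# Assumption: the ID is two 128-bit values concatenated, 32 hex chars each.
first_half, second_half = doc_id[:32], doc_id[32:]

a = uuid.UUID(hex=first_half)
b = uuid.UUID(hex=second_half)

print(a)  # 52187813-9568-b647-0762-d27a1c4b2a5f
print(b)  # 64937386-85c7-4ab0-2c06-61ba2810ed64

# RFC 4122 UUIDs set specific variant bits; check whether either half does.
print(a.variant == uuid.RFC_4122, b.variant == uuid.RFC_4122)  # False False
```

Neither half has the RFC 4122 variant bits set, so if these are GUIDs they aren't standard ones like a random v4. That's weak evidence either way, since plenty of systems use 128 bits of random or hashed data without following the UUID layout.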
For comparison, here are the sizes of some other common platform IDs:
- Spotify Track ID:
https://open.spotify.com/track/2oriesSDLJ8vdbYbhYqEwq
2oriesSDLJ8vdbYbhYqEwq
using base-62 = 62^22 ≈ 2^131 or 131 bits
- YouTube Video ID:
https://www.youtube.com/watch?v=LmAG8-V_WQY
LmAG8-V_WQY
using base-64 = 64^11 = 2^66 or 66 bits
- Dropbox Share ID:
https://www.dropbox.com/scl/fi/c6wah3uc1br3hkywoxwb1/insanely_large_amazon_workdocs_id.md?rlkey=chqgj57qjvprv5cbe0z8r0hab&dl=0
c6wah3uc1br3hkywoxwb1
assuming base-36 (numbers and lowercase alpha) = 36^21 ≈ 2^108.6 or about 109 bits
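The bit counts in that list all come from the same formula: bits = length × log2(base). A small sketch, with the alphabet sizes being my assumptions (same ones as in the list, Dropbox's especially):

```python
import math

def id_bits(length: int, base: int) -> float:
    """Bits of information in an ID of `length` chars over a `base`-symbol alphabet."""
    return length * math.log2(base)

# (length, assumed alphabet size) for each ID discussed above
ids = {
    "WorkDocs doc (base-16)": (64, 16),
    "Spotify track (base-62)": (22, 62),
    "YouTube video (base-64)": (11, 64),
    "Dropbox share (base-36, assumed)": (21, 36),
}

for name, (length, base) in ids.items():
    print(f"{name}: {id_bits(length, base):.1f} bits")
```

This is an upper bound on what each ID can carry. If a service reserves some characters or bits for structure, the real number is lower.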
Yeah so an ID of 256 bits is like WTF?!
And it's encoded in base-16 which makes it use more characters than another base.
521878139568b6470762d27a1c4b2a5f6493738685c74fb02c0661ba2810ed64
in base-16 (0-9 and a-f) is
FIYeBOVaLZHB2LSehxLKl9kk3OGhcdPsCwGYbooEO1k
in base-64 (0-9, a-z, A-Z, and two symbols)
That's 43 characters vs 64, or about 33% fewer characters
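To sanity check the character count, here's a sketch that re-encodes the 32 raw bytes with Python's standard URL-safe base-64. It uses the ID from the link at the top, and the standard alphabet and byte-wise scheme are my choice, so the output string won't match the one above, which came from a different conversion. The length works out the same though.

```python
import base64

doc_id = "521878139568b6470762d27a1c4b2a5f6493738685c74ab02c0661ba2810ed64"

raw = bytes.fromhex(doc_id)                      # 32 bytes = 256 bits
b64 = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

print(len(doc_id), len(b64))                     # 64 43
print(f"{1 - len(b64) / len(doc_id):.1%} fewer") # 32.8% fewer

# Round trip: restore the stripped padding and decode back to the same hex.
padded = b64 + "=" * (-len(b64) % 4)
assert base64.urlsafe_b64decode(padded).hex() == doc_id
```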
come on, encodings more efficient than base-16 have been around for like 20 yrs now
Lol, as I post this I see GitHub gists also use base-16 for their ID encoding
https://gist.github.com/noahcoad/a98760f49d5cd678d10ee9ecca09e10d
a98760f49d5cd678d10ee9ecca09e10d
base-16 = 16^32 = 2^128 = 128 bits .. hey there's at least a standard GUID 🤣
#rantover