@noahcoad
Last active October 3, 2024 17:45

Amazon Workdocs Doc IDs are Insanely Large

Here is an example Amazon Workdocs doc link:

https://amazon.awsapps.com/workdocs-preview/index.html#/document/521878139568b6470762d27a1c4b2a5f6493738685c74ab02c0661ba2810ed64

This is the ID: 521878139568b6470762d27a1c4b2a5f6493738685c74ab02c0661ba2810ed64
Boy do they use a long unique identifier.
You’d think they were trying to index a galaxy of nearly infinite docs.
the url is using hex digits (base-16) and not base-64, which is less efficient character count wise … still
that doc id is 64 chars of base-16, so that’s 16^64 or 1.16e77
that’s roughly a 1 followed by 77 zeros
yikes
a standard GUID is 128 bits, so 2^128 possible values
that ID is 256 bits, so 2^256 possible values
2^129 would already be twice the ID space of a standard GUID
so 2^256 (2^128 times the GUID space) is insanely large
dang I’d like to see their white paper justification for that
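
To double check the arithmetic above, here is a minimal Python sketch. The variable names are made up for illustration, and it only counts how many values a 64-digit hex string can hold; it says nothing about how Workdocs actually generates IDs.

```python
# The doc ID from the example link above.
doc_id = "521878139568b6470762d27a1c4b2a5f6493738685c74ab02c0661ba2810ed64"

bits = len(doc_id) * 4           # each hex digit carries 4 bits
id_space = 16 ** len(doc_id)     # number of distinct 64-digit hex strings
guid_space = 2 ** 128            # number of distinct 128-bit GUIDs

print(bits)                              # 256
print(id_space == 2 ** 256)              # True
print(f"{id_space:.3e}")                 # 1.158e+77
print(id_space // guid_space == guid_space)  # True: 2^128 GUID spaces fit inside
```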

They could be encoding numerous pieces of data in there, but more likely it's simply a DB lookup of the ID to all the metadata. Or, due to multiple corporate accounts and AWS regions, some of that could be in the ID to route to the right DB instance. Still, that's a really big ID.

It could be two GUIDs just stacked next to each other (128 bits each), like
521878139568b6470762d27a1c4b2a5f and 6493738685c74ab02c0661ba2810ed64

My first guess is it's a GUID for an account and a GUID for the file in the account. Though each account has a unique subdomain, so IDK about that.
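
One quick way to poke at the two-GUIDs guess is to split the ID in half and let Python's standard `uuid` module parse each half. This is only a sketch: `uuid.UUID` accepts any 32 hex digits, so a successful parse proves nothing by itself. The variant and version fields are the only hint it gives.

```python
import uuid

doc_id = "521878139568b6470762d27a1c4b2a5f6493738685c74ab02c0661ba2810ed64"

# Split the 64 hex digits into two 32-digit (128-bit) halves.
first, second = doc_id[:32], doc_id[32:]

for half in (first, second):
    u = uuid.UUID(hex=half)
    # version is None when the variant bits are not the RFC 4122 ones
    print(u, "| variant:", u.variant, "| version:", u.version)
```

For this one ID, neither half carries the RFC 4122 variant bits, so if these are GUIDs they don't look like standard random (v4) ones. That's a single data point, not a conclusion.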

For comparison, here are the sizes of some other common platform IDs:

  1. Spotify Track ID:
    https://open.spotify.com/track/2oriesSDLJ8vdbYbhYqEwq
    2oriesSDLJ8vdbYbhYqEwq using base-62 = 62^22 ≈ 2^131 or about 131 bits
  2. YouTube Video ID:
    https://www.youtube.com/watch?v=LmAG8-V_WQY
    LmAG8-V_WQY using base-64 = 64^11 = 2^66 or 66 bits
  3. Dropbox Share ID:
    https://www.dropbox.com/scl/fi/c6wah3uc1br3hkywoxwb1/insanely_large_amazon_workdocs_id.md?rlkey=chqgj57qjvprv5cbe0z8r0hab&dl=0
    c6wah3uc1br3hkywoxwb1 assuming base-36 (numbers and lowercase alpha) = 36^21 ≈ 2^108.6 or roughly 108 bits
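
The bit counts in this list all come from the same formula: number of characters times log2 of the alphabet size. Here's a rough sketch of that, where `capacity_bits` is a made-up helper and the bases are the same guesses as in the list, not anything the platforms document.

```python
import math

# (example ID, assumed alphabet size); the bases are guesses from the characters seen
ids = {
    "Workdocs": ("521878139568b6470762d27a1c4b2a5f6493738685c74ab02c0661ba2810ed64", 16),
    "Spotify": ("2oriesSDLJ8vdbYbhYqEwq", 62),
    "YouTube": ("LmAG8-V_WQY", 64),
    "Dropbox": ("c6wah3uc1br3hkywoxwb1", 36),
}

def capacity_bits(text, base):
    """Upper bound on the information a string of this length and alphabet can hold."""
    return len(text) * math.log2(base)

for name, (text, base) in ids.items():
    print(f"{name:9} {len(text):2} chars, base-{base}: {capacity_bits(text, base):.1f} bits")
```

This is capacity, not necessarily how many bits each service really uses.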

Yeah so an ID of 256 bits is like WTF?!

And it's encoded in base-16 which makes it use more characters than another base.
521878139568b6470762d27a1c4b2a5f6493738685c74fb02c0661ba2810ed64 in base-16 (0-9 and a-f) is
FIYeBOVaLZHB2LSehxLKl9kk3OGhcdPsCwGYbooEO1k in base-64 (0-9, a-z, A-Z, and two symbols)
That's 43 characters vs 64, or about 33% fewer characters
come on, encodings more compact than base-16 have been around for like 20 yrs now
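
For anyone who wants to reproduce the base-64 string above, here is a small sketch. It treats the hex ID as one big integer and rewrites it in base 64. The digit order (A-Z, a-z, 0-9, then `+` and `/`) is an assumption on my part, as is the helper name `to_base`; that order is the standard base64 one and it reproduces the string shown above. Note this is integer re-basing, not byte-wise `base64.b64encode`, which pads and aligns differently.

```python
import string

# Assumed digit order: the standard base64 alphabet.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def to_base(n, alphabet):
    """Write a non-negative integer using the given alphabet as digits."""
    if n == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while n:
        n, remainder = divmod(n, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))

# The hex string exactly as written in the comparison above.
hex_id = "521878139568b6470762d27a1c4b2a5f6493738685c74fb02c0661ba2810ed64"
b64 = to_base(int(hex_id, 16), ALPHABET)

print(b64)                                   # FIYeBOVaLZHB2LSehxLKl9kk3OGhcdPsCwGYbooEO1k
print(len(hex_id), "->", len(b64))           # 64 -> 43
print(f"{1 - len(b64) / len(hex_id):.1%}")   # 32.8%
```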

Lol, as I post this I notice GitHub gists also use base-16 for their ID encoding
https://gist.github.com/noahcoad/a98760f49d5cd678d10ee9ecca09e10d
a98760f49d5cd678d10ee9ecca09e10d base-16 = 16^32 = 2^128 = 128 bits .. hey, at least that's a standard GUID size 🤣

#rantover
