@sheac
Last active May 3, 2018 16:26

Context

We want to use Document checksums on clients to avoid re-fetching binary data we already have.

Checksums are sometimes expected to be NULL

Documents can be uploaded in pieces. Until an upload finishes, the Document's pending field is set to true, and its checksum is NULL.

Some documents don't have checksums when they should

Up until October 14, 2016, we were letting Documents get into a state where pending = false (meaning they were fully uploaded) but the checksum was still NULL. Here's how we know:

mothership=> select count(*) from documents where checksum is null and pending = false;
 count
-------
 30839
(1 row)

mothership=> select max(updated_at) from documents where checksum is null and pending = false;
    max
------------
 1476435675
(1 row)
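
To tie the max(updated_at) value back to the October 14, 2016 date, here is a minimal sketch of the conversion. It assumes updated_at is stored as Unix epoch seconds in UTC, which the query output suggests but this note does not state outright. The constant name is only for illustration.

```python
from datetime import datetime, timezone

# Value returned by the max(updated_at) query above.
# Assumption: the column holds Unix epoch seconds (UTC).
LAST_BAD_UPDATED_AT = 1476435675

last_bad = datetime.fromtimestamp(LAST_BAD_UPDATED_AT, tz=timezone.utc)
print(last_bad.isoformat())  # 2016-10-14T09:01:15+00:00
```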

It's not clear why this was happening (starting around July 2015) or why it stopped. It could have been a bug fix, or it could have been a side effect of a change in how Documents are handled. It's likely that we won't encounter more cases like this, since about 91% of our Documents have been created since the last "bad" one.

Does anyone know what happened?

It would be helpful to have some information from someone who was there during this time. Perhaps they could point us in the right direction.

Suggested action

Using checksum for identifying Documents still seems like the right way to go. We should do three things to mitigate the problem of missing checksums:

  1. [BACKEND/FRONTEND] Do some historical research to determine what happened in mid-October 2016 to stop "bad" Document states.

  2. [BACKEND] Backfill all the missing checksums with a script. This represents new work not already accounted for.

  3. [FRONTEND] Treat the checksum->binaryFile cache on the mobile device as what it is: a cache. If the Document doesn't have a checksum, then that's a cache-miss, and we retrieve it from the backend. This action item can be considered part of work already planned, since it is a defensive-programming implementation detail.
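
Action items 2 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the real implementation: the Document shape, the function and class names, and the load/fetch callbacks are all hypothetical, and SHA-256 is a stand-in because this note does not say which checksum algorithm we use. The actual mobile client would not be written in Python; the point is only the control flow.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Document:
    # Hypothetical, simplified view of a documents row.
    id: int
    pending: bool
    checksum: Optional[str]


def compute_checksum(data: bytes) -> str:
    # Assumption: SHA-256 hex digest stands in for the real checksum algorithm.
    return hashlib.sha256(data).hexdigest()


def backfill_checksums(
    documents: Iterable[Document],
    load_bytes: Callable[[int], bytes],
) -> int:
    """Item 2: set checksums on fully uploaded Documents that lack one.

    Pending Documents are skipped, since a NULL checksum is expected there.
    Returns the number of Documents that were fixed.
    """
    fixed = 0
    for doc in documents:
        if not doc.pending and doc.checksum is None:
            doc.checksum = compute_checksum(load_bytes(doc.id))
            fixed += 1
    return fixed


class BinaryCache:
    """Item 3: client-side checksum -> binary data cache."""

    def __init__(self, fetch_from_backend: Callable[[int], bytes]) -> None:
        self._fetch = fetch_from_backend
        self._by_checksum: Dict[str, bytes] = {}

    def get(self, doc: Document) -> bytes:
        # A missing checksum is simply treated as a cache miss.
        if doc.checksum is not None and doc.checksum in self._by_checksum:
            return self._by_checksum[doc.checksum]
        data = self._fetch(doc.id)
        # Only cache when there is a checksum to key on.
        if doc.checksum is not None:
            self._by_checksum[doc.checksum] = data
        return data
```

With this shape, a Document without a checksum is re-fetched on every request but never breaks the client, and it starts hitting the cache as soon as the backfill gives it a checksum.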
