Skip to content

Instantly share code, notes, and snippets.

@cfm
Last active April 27, 2022 19:43
Show Gist options
  • Save cfm/4f9f84651312ec446b3a380b1772259a to your computer and use it in GitHub Desktop.
Save cfm/4f9f84651312ec446b3a380b1772259a to your computer and use it in GitHub Desktop.
[DRAFT] SecureDrop Client: grand unified state machine

[DRAFT] SecureDrop Client: grand unified state machine

Goal: To able to reason about the SecureDrop Client as a single application-level state machine, in which each global state is composed of (in order from left to right):

  1. an authentication state or state machine;
  2. a synchronization state or state machine; and
  3. a job-queue state or state machine.

Notes:

  • This diagram is both descriptive and prescriptive. It describes what I read the SecureDrop Client codebase and supporting specifications as wanting to do. It prescribes (or implies) some changes that might help realize those design intentions. (And, inevitably, it makes some assumptions that may no longer hold or serve us.)
  • Refer to the Mermaid state-diagram syntax if you have questions about how to read this diagram.
  • If you find GitHub's inline rendering of this diagram difficult to read, you can copy-paste the source of the .mmd file directly into the Mermaid live editor.

Prescriptions (or implications) of note

The following points are not true of the SecureDrop Client as implemented. Instead, they follow logically from the explicit state machine depicted here.

  1. No thread-level state! If the application is running, then the sync/queue threads are running.
  2. No thread-blocking processing loops: keep the QObject event loop available! The queue worker processes the next job at the later of (a) the current job finished or (b) a new job is enqueued.
  3. As a consequence of (1) and (2), a PauseQueueJob actually sleeps the queue for PAUSE_INTERVAL (or RETRY_INTERVAL).
  4. UserHasLoginAttemptInProgress and UserHasLogoutRequestInProgress are both blocking and modal, because their transitions within the authentication state machine supervene on all other state machines, even application-level quit.
    • In particular, the transition UserHasLogoutRequestInProgressUserIsLoggedOut clears the session state (authentication token) whether or not the logout request succeeds. Or, if we want to be really strict about cleaning up server-side state before it expires anyway: If a session's logout request fails, the Client proceeds to UserIsLoggedOut, but the token is saved in memory. If and when a subsequent login attempt succeeds, a new logout request is made for the saved token.
      • Why is this conclusion necessary? We can't interrupt a job in process. Therefore, if we allow UserHasLogoutRequestInProgressUserIsLoggedOut while a job is still in progress, we can't guarantee that a job will be processed only by the session to which it belongs. (See also: freedomofpress/securedrop-client#420.)
      • This may have opsec implications we should threat-model, because (e.g.) a journalist will not be able to log out of the application immediately in an emergency.

There may be others! These are just the ones I've parsed out in my own thinking so far.

Assumptions

Known issues

Things not addressed in this draft from March 28 team feedback:

  • On successful sync, the queues resume. This makes sync an implicit heartbeat.

  • Need to check against all network QA cases.

    • When are retries silent because routine (because Tor)? When do they bubble up to an application-level error state? (BaseApiJob retries thrice silently.)
    • "Read jobs" (sync, download) versus "write jobs" (upload, star, delete): differences in retry/failure logic?
  • Need to distinguish (a) reauthenticated failed session from (b) newly authenticated subsequent session.

  • Need to explicitly document offline mode. @eloquence:

    I think it may be worth explicitly distinguishing the OfflineMode state (user-initiated or as a result of session expiry) and the SignedInWithNetworkError state (impacts interactivity of the app in certain ways, impacts queue jobs being tried)

  • Need to clarify PauseQueueJob as blocking operation versus queue-level operation.

    • Why do I (@cfm) dislike manipulating queue via jobs?

Background reading

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@sssoleileraaa
Copy link

sssoleileraaa commented Mar 28, 2022

Also see (historical) links on network error handling and retries (back when jen was tech lead and nina was doing the ux work for this):

@cfm
Copy link
Author

cfm commented Apr 27, 2022

Thanks, all, for your feedback on this presumptuously prescriptive "grand unified" monolith. Moving on per freedomofpress/securedrop-client#1452 (comment), I've recorded everyone's feedback from our March 28 discussion in https://gist.github.com/cfm/4f9f84651312ec446b3a380b1772259a#known-issues for posterity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment