Skip to content

Instantly share code, notes, and snippets.

@cfm
Last active April 27, 2022 19:43
Show Gist options
  • Save cfm/4f9f84651312ec446b3a380b1772259a to your computer and use it in GitHub Desktop.
Save cfm/4f9f84651312ec446b3a380b1772259a to your computer and use it in GitHub Desktop.
[DRAFT] SecureDrop Client: grand unified state machine

[DRAFT] SecureDrop Client: grand unified state machine

Goal: To able to reason about the SecureDrop Client as a single application-level state machine, in which each global state is composed of (in order from left to right):

  1. an authentication state or state machine;
  2. a synchronization state or state machine; and
  3. a job-queue state or state machine.

Notes:

  • This diagram is both descriptive and prescriptive. It describes what I read the SecureDrop Client codebase and supporting specifications as wanting to do. It prescribes (or implies) some changes that might help realize those design intentions. (And, inevitably, it makes some assumptions that may no longer hold or serve us.)
  • Refer to the Mermaid state-diagram syntax if you have questions about how to read this diagram.
  • If you find GitHub's inline rendering of this diagram difficult to read, you can copy-paste the source of the .mmd file directly into the Mermaid live editor.

Prescriptions (or implications) of note

The following points are not true of the SecureDrop Client as implemented. Instead, they follow logically from the explicit state machine depicted here.

  1. No thread-level state! If the application is running, then the sync/queue threads are running.
  2. No thread-blocking processing loops: keep the QObject event loop available! The queue worker processes the next job at the later of (a) the current job finished or (b) a new job is enqueued.
  3. As a consequence of (1) and (2), a PauseQueueJob actually sleeps the queue for PAUSE_INTERVAL (or RETRY_INTERVAL).
  4. UserHasLoginAttemptInProgress and UserHasLogoutRequestInProgress are both blocking and modal, because their transitions within the authentication state machine supervene on all other state machines, even application-level quit.
    • In particular, the transition UserHasLogoutRequestInProgressUserIsLoggedOut clears the session state (authentication token) whether or not the logout request succeeds. Or, if we want to be really strict about cleaning up server-side state before it expires anyway: If a session's logout request fails, the Client proceeds to UserIsLoggedOut, but the token is saved in memory. If and when a subsequent login attempt succeeds, a new logout request is made for the saved token.
      • Why is this conclusion necessary? We can't interrupt a job in process. Therefore, if we allow UserHasLogoutRequestInProgressUserIsLoggedOut while a job is still in progress, we can't guarantee that a job will be processed only by the session to which it belongs. (See also: freedomofpress/securedrop-client#420.)
      • This may have opsec implications we should threat-model, because (e.g.) a journalist will not be able to log out of the application immediately in an emergency.

There may be others! These are just the ones I've parsed out in my own thinking so far.

Assumptions

Known issues

Things not addressed in this draft from March 28 team feedback:

  • On successful sync, the queues resume. This makes sync an implicit heartbeat.

  • Need to check against all network QA cases.

    • When are retries silent because routine (because Tor)? When do they bubble up to an application-level error state? (BaseApiJob retries thrice silently.)
    • "Read jobs" (sync, download) versus "write jobs" (upload, star, delete): differences in retry/failure logic?
  • Need to distinguish (a) reauthenticated failed session from (b) newly authenticated subsequent session.

  • Need to explicitly document offline mode. @eloquence:

    I think it may be worth explicitly distinguishing the OfflineMode state (user-initiated or as a result of session expiry) and the SignedInWithNetworkError state (impacts interactivity of the app in certain ways, impacts queue jobs being tried)

  • Need to clarify PauseQueueJob as blocking operation versus queue-level operation.

    • Why do I (@cfm) dislike manipulating queue via jobs?

Background reading

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@eloquence
Copy link

Should the impact of job priority on queue processing be reflected in the diagram, or do you regard that as out of scope for this purpose?

@cfm
Copy link
Author

cfm commented Mar 23, 2022

@rocodes in https://gist.github.com/cfm/4f9f84651312ec446b3a380b1772259a?permalink_comment_id=4107828#gistcomment-4107828:

One thing I'm wondering about is the logout behaviour - if the logout request fails, the current state diagram shows the client returning to the UserisLoggedOut state, with an empty queue. However, if a logout request fails, my understanding is that it would be re-added to the queue, even if we already display the 'logged out' gui.

Yep! This is a particular place (motivated by freedomofpress/securedrop-client#1420) where this diagram turns prescriptive. I'll flag this and a few other examples more explicitly as provocations for discussion. :-)

@cfm
Copy link
Author

cfm commented Mar 23, 2022

@eloquence in https://gist.github.com/cfm/4f9f84651312ec446b3a380b1772259a?permalink_comment_id=4107929#gistcomment-4107929:

Should the impact of job priority on queue processing be reflected in the diagram, or do you regard that as out of scope for this purpose?

Out of scope insofar as it's (I'm arguing) an implementation detail of the ApiJobQueue and RunnableQueue classes. The premise of that argument (I'm realizing as I make it :-) is that the QObject event loop and our job queues can (and I'm arguing should) be understood as the application's control and data planes. If the former supervenes on the latter, then the prioritization, queuing, and processing of data-plane operations (jobs) is safely decoupled from the control plane (application-level state machine).

As in this proposal generally, I don't mean to overthink or overcomplicate things! I do mean to propose and test strong hypotheses about how things (could) work, with the goal of simplifying along strict separations of concerns. :-)

@sssoleileraaa
Copy link

I don't see the path where a successful sync resumes the queues if they're paused. Should that be reflected in this diagram?

@sssoleileraaa
Copy link

sssoleileraaa commented Mar 24, 2022

No thread-level state! If the application is running, then the sync/queue threads are running.

Right now, you can start the application in offline mode, so it looks like that's reflected in the state machine by showing the UserIsLoggedOut state where UserHasNoSession? Just double-checking if I'm reading that right.

Edit: Ah, yup! I see, I see... the UserHasNoSession is the auth state, and I see that after authentication succeeds it becomes UserHasSession. I just needed to take a closer look and realize that you were very organized in keeping the auth state to the left, sync state in the middle, and queue state on the right.

Might make sense to add something that reflects offline mode separately, e.g.:

UserIsLoggedOut --> UserIsLoggedInAndOffline: on user login offline-mode
UserIsLoggedInAndOffline --> UserHasLoginAttemptInProgress: on user attempted login

Where UserIsLoggedInAndOffline is a new state where the queue is running but the user is not authenticated.

@sssoleileraaa
Copy link

sssoleileraaa commented Mar 28, 2022

Also see (historical) links on network error handling and retries (back when jen was tech lead and nina was doing the ux work for this):

@cfm
Copy link
Author

cfm commented Apr 27, 2022

Thanks, all, for your feedback on this presumptuously prescriptive "grand unified" monolith. Moving on per freedomofpress/securedrop-client#1452 (comment), I've recorded everyone's feedback from our March 28 discussion in https://gist.github.com/cfm/4f9f84651312ec446b3a380b1772259a#known-issues for posterity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment