@captainsafia
Created September 22, 2018 13:24

Realtime collaboration

While we don't have to ship with this out of the gate, we need to think about realtime collaboration for the future. Might as well write it here in a hackpad too while we're at it. ;)

Single User Multi User

Even for the case of the local desktop application, we need to be able to work with ourselves (multiple cursors and contexts), if we have the same notebook opened twice. This is even more important for a web based version of nteract.

Getting it working for a single user with multiple instances is 90% of the solution. [The other 10% includes: proper handling of undo with multiple editors at once; showing other cursors (and making it easy to jump to them), which isn't really needed for single user; out of band chat (something on the side, etc. that isn't in the document); if you have any form of history, then showing who did what.]

Smaller use cases

In order to get our own heads wrapped around realtime collaboration, we started up a project to do realtime collab with a single-cell setup called play.

We'd start by learning how to do this with CodeMirror. Stein pointed us to some SageMathCloud code we can take advantage of that relies on Google Diff Match Patch:

https://github.com/sagemathinc/smc/blob/f35739a05a22c04970c98b9f009896d6a52793cf/src/smc-webapp/misc_page.coffee#L616-L665

Link that will be out of sync one day ;) -- just search for "Codemirror Extensions" in the file:

https://github.com/sagemathinc/smc/blob/master/src/smc-webapp/misc_page.coffee#L442

He also said we may use this code for patch application under any license and in any derived way. [And I'll say it again right here! Here's the current Jupyter-related code too, which I rewrote in April 2016: https://github.com/sagemathinc/smc/blob/master/src/smc-webapp/editor_jupyter.coffee; as above, we're willing to relicense anything in particular -- even if just "feel good" since you'll rewrite all the code in typescript. It would be especially good if there were an npm module extracted out of something I have, which we can both depend on.]

Survey

Some functions about which William Stein said:

"[We] may use this code for patch application under any license and in any derived way"

https://github.com/sagemathinc/smc/tree/sync2/src/smc-util

We have to think about this in terms of the frontend and the backend.

In the frontend we're already steering toward a model where we have a central message bus.

The Colaboratory team mentioned to me that in the frontend we need to:

  • have a central message bus (basically what we're doing with Flux)
  • be able to accept state changes from external sources (e.g. the API that gives us the realtime component)
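
A minimal sketch of those two requirements, just to make them concrete. `createBus`, `cellReducer`, and the `origin` argument are hypothetical names made up for illustration (this is not the Flux or nteract API); the point is only that local UI events and externally pushed changes go through the same `dispatch` path.

```javascript
// Hypothetical minimal message bus: one dispatch path for local and remote actions.
function createBus(reducer, initialState) {
  let state = initialState;
  const listeners = [];
  return {
    getState: () => state,
    subscribe: (fn) => {
      listeners.push(fn);
    },
    // `origin` is "local" for UI events, or "remote" for changes pushed in
    // by an external source such as a realtime API.
    dispatch(action, origin = "local") {
      state = reducer(state, action);
      listeners.forEach((fn) => fn(state, action, origin));
    },
  };
}

// Toy reducer: the state is just the source text of a single cell.
function cellReducer(state, action) {
  switch (action.type) {
    case "SET_SOURCE":
      return { ...state, source: action.source };
    default:
      return state;
  }
}

const bus = createBus(cellReducer, { source: "" });
const seen = [];
bus.subscribe((state, action, origin) => seen.push(origin));

bus.dispatch({ type: "SET_SOURCE", source: "print(1)" }); // typed locally
bus.dispatch({ type: "SET_SOURCE", source: "print(2)" }, "remote"); // pushed by a collaborator
```

Because remote changes are just more actions on the bus, the views don't need to know where a change came from unless they want to (e.g. to show someone else's cursor).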

Algorithm: Distributed Interactive Action Log (DIAL)

Author: William Stein

The goal is to come up with the simplest possible usable algorithm built on top of an abstraction ("a synchronized table" that I built on RethinkDB); I introduced timestamps because they simplified the algorithm a lot. I like simple things I can fit in my head.

TL;DR/Abstract - The state of the document is the combination of actions from all users, in order.

  • Synchronize clocks by assuming relative timing for each user
    • Similar to the NTP algorithm
    • Needs to be accurate to within seconds
    • "Don't care" about cheating (as in gaming), you're intended to edit the document
    • (Possibly) better approach to clock sync: http://www.mine-control.com/zack/timesync/timesync.html, based on real-world experience with distributed games; basic idea is to think a little more about the distribution and throw away outliers to get better sync
  • Record a timestamp and session ID for every action
  • Record action, insert locally, expect rectification externally
  • Insert new actions
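
A rough sketch of the clock sync step, assuming each client pings a reference clock a few times and records when it sent the request, what the reference clock said, and when the reply arrived. The function name, the sample shape, and the exact outlier rule (drop round trips more than one standard deviation above the median) are assumptions in the spirit of the "throw away outliers" idea above, not a spec.

```javascript
// Estimate how far the local clock is from a reference clock, in milliseconds.
// Each sample is { sent, server, received }: `sent` and `received` are local
// clock readings, `server` is the reference clock's reading in the reply.
// Assumes at least one sample.
function estimateClockOffset(samples) {
  const measured = samples.map(({ sent, server, received }) => {
    const roundTrip = received - sent;
    // Assume the reply spent half of the round trip on the way back.
    return { roundTrip, offset: server - (sent + roundTrip / 2) };
  });

  // Slow round trips give the least trustworthy offsets, so drop them.
  measured.sort((a, b) => a.roundTrip - b.roundTrip);
  const median = measured[Math.floor(measured.length / 2)].roundTrip;
  const mean = measured.reduce((sum, m) => sum + m.roundTrip, 0) / measured.length;
  const variance =
    measured.reduce((sum, m) => sum + (m.roundTrip - mean) ** 2, 0) / measured.length;
  const cutoff = median + Math.sqrt(variance);
  const kept = measured.filter((m) => m.roundTrip <= cutoff);

  // Average the offsets of the remaining samples.
  return kept.reduce((sum, m) => sum + m.offset, 0) / kept.length;
}

const samples = [
  { sent: 0, server: 1010, received: 20 },
  { sent: 100, server: 1115, received: 130 },
  { sent: 200, server: 1212, received: 224 },
  { sent: 300, server: 1310, received: 900 }, // one very slow reply
  { sent: 1000, server: 2013, received: 1026 },
];
const offset = estimateClockOffset(samples);
```

A client would then stamp each action with something like `Date.now() + offset`. Since we only need accuracy on the order of seconds, a handful of samples is probably plenty.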

Roughly speaking, what we're trying to keep in sync on the frontend is an Immutable Map, with keys [timestamp, session ID]. (Session ID needs to determine account for presenting a history to users with info about who did what.)
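
To make the keying concrete, here is a sketch that uses a plain sorted array as a stand-in for the Immutable Map. `compareKeys` and `insertAction` are hypothetical helper names, and using the session ID as a tie-breaker for equal timestamps is an assumption. Actions are ordered by timestamp first, and a late remote action simply slots into its place.

```javascript
// Compare two log keys of the form [timestamp, sessionId].
function compareKeys([tA, sA], [tB, sB]) {
  if (tA !== tB) return tA - tB;
  return sA < sB ? -1 : sA > sB ? 1 : 0; // session ID breaks timestamp ties
}

// Return a new log with `entry` at its sorted position; the input is not mutated.
function insertAction(log, entry) {
  let lo = 0;
  let hi = log.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(log[mid].key, entry.key) <= 0) lo = mid + 1;
    else hi = mid;
  }
  // `lo` is now the first position whose key sorts after entry.key.
  if (lo > 0 && compareKeys(log[lo - 1].key, entry.key) === 0) {
    return log; // already known, e.g. our own action echoed back by the sync layer
  }
  return [...log.slice(0, lo), entry, ...log.slice(lo)];
}

let log = [];
log = insertAction(log, { key: [1000, "alice"], action: "insert 'a'" });
log = insertAction(log, { key: [1020, "alice"], action: "insert 'b'" });
// A remote action arrives late but sorts between the two local ones.
log = insertAction(log, { key: [1010, "bob"], action: "insert 'x'" });
// Re-delivery of an action we already have is a no-op.
log = insertAction(log, { key: [1010, "bob"], action: "insert 'x'" });
```

Every client that ends up with the same set of keys ends up with the same order, which is what lets them agree on the document.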

Log ---> Document

Also known as:

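// Fold the ordered action log into the document state.
// An Rx-style scan passes the accumulator (the document) first, then the next action.
// `initialDocument` is an assumed seed: the empty document the log starts from.
actions.scan((document, action) => {
  return action.reducer.call(action, document)
}, initialDocument)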

Chat as a necessity

[Ge: this is called "out of band communication". :D]

As pointed out in the Differential Sync papers, if you write a realtime collaborative document interface and don't implement chat, you end up with emergent chat inline in a document. Here in hackpad that tends to become the annotations text. For a notebook that might become code comments or markdown with a particular style.

Notebook actions

  • Execution counter
  • Outputs
  • Source within a cell
  • Cell movement
  • Cell deletion
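
One way the actions above might look as plain reducers folded over a log. The action names, the notebook shape (`cellOrder` plus a `cells` map), and the rule that actions on a deleted cell are ignored are all assumptions made for this sketch.

```javascript
// Apply one action to an immutable notebook state and return the new state.
function notebookReducer(nb, action) {
  const cell = nb.cells[action.id];
  // Actions that target a deleted (or unknown) cell are ignored.
  if (!cell) return nb;
  const update = (patch) => ({
    ...nb,
    cells: { ...nb.cells, [action.id]: { ...cell, ...patch } },
  });
  switch (action.type) {
    case "SET_SOURCE":
      return update({ source: action.source });
    case "SET_EXECUTION_COUNT":
      return update({ executionCount: action.count });
    case "APPEND_OUTPUT":
      return update({ outputs: [...cell.outputs, action.output] });
    case "MOVE_CELL": {
      const order = nb.cellOrder.filter((id) => id !== action.id);
      order.splice(action.toIndex, 0, action.id);
      return { ...nb, cellOrder: order };
    }
    case "DELETE_CELL": {
      const { [action.id]: _removed, ...cells } = nb.cells;
      return { ...nb, cells, cellOrder: nb.cellOrder.filter((id) => id !== action.id) };
    }
    default:
      return nb;
  }
}

const initial = {
  cellOrder: ["a", "b"],
  cells: {
    a: { source: "", outputs: [], executionCount: null },
    b: { source: "", outputs: [], executionCount: null },
  },
};

const actionLog = [
  { type: "SET_SOURCE", id: "a", source: "x = 1" },
  { type: "SET_EXECUTION_COUNT", id: "a", count: 1 },
  { type: "APPEND_OUTPUT", id: "a", output: { output_type: "stream", text: "ok" } },
  { type: "MOVE_CELL", id: "a", toIndex: 1 },
  { type: "DELETE_CELL", id: "b" },
  { type: "SET_SOURCE", id: "b", source: "late edit" }, // arrives after the delete
];

const notebook = actionLog.reduce(notebookReducer, initial);
```

The last action shows the kind of conflict we'd have to pick a policy for: someone edits a cell that someone else has already deleted.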

Handling those base64 encoded images in the notebook

At least for the local notebook app, we could make our in-memory format use a blob URL, which would prove out the use of URLs first. It would not work for multiple users (that would need an object store); it would only be a way of proving out an alternative approach. The filesystem format would stay the same.

SageMathCloud uses

"image/png": "smc-blob::9e850da7-325c-4f9f-b5ce-fd976578421e"

Consider for public notebooks (GitHub, nbviewer, etc.)

"image/png": "https://im.ephem.it/9e850da7-325c-4f9f-b5ce-fd976578421e"

This would make indexing friendlier, documents lighter weight, etc. Could the URL be local (file://)?
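
A sketch of the swap described above: pull base64 image payloads out of an output's data bundle, keep them in a side store, and leave a reference in their place. The `blob-ref::` prefix, the function names, and the `Map` standing in for an object store are hypothetical (the prefix is only modeled on the `smc-blob::` example), and a real ID would more likely be a UUID or content hash than a counter.

```javascript
const BASE64_IMAGE_TYPES = ["image/png", "image/jpeg"];
const REF_PREFIX = "blob-ref::"; // hypothetical scheme, modeled on "smc-blob::"

// Return a copy of `output` whose inline images are replaced by references.
function externalizeImages(output, store, makeId) {
  const data = { ...output.data };
  for (const mime of BASE64_IMAGE_TYPES) {
    const value = data[mime];
    if (typeof value === "string" && !value.startsWith(REF_PREFIX)) {
      const id = makeId();
      store.set(id, value);
      data[mime] = REF_PREFIX + id;
    }
  }
  return { ...output, data };
}

// Inverse: put the stored payloads back, e.g. before writing the .ipynb to disk.
function inlineImages(output, store) {
  const data = { ...output.data };
  for (const mime of BASE64_IMAGE_TYPES) {
    const value = data[mime];
    if (typeof value === "string" && value.startsWith(REF_PREFIX)) {
      data[mime] = store.get(value.slice(REF_PREFIX.length));
    }
  }
  return { ...output, data };
}

const store = new Map(); // stand-in for an object store
let counter = 0;
const makeId = () => `img-${++counter}`;

const output = {
  output_type: "display_data",
  data: { "image/png": "iVBORw0KGgo=", "text/plain": "<Figure>" },
  metadata: {},
};
const light = externalizeImages(output, store, makeId);
const restored = inlineImages(light, store);
```

Only the in-memory (or synced) representation changes here; round-tripping through `inlineImages` keeps the filesystem format the same.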

Questions and Random Stuff

What are possible configurations of users and kernels?

  • Keep front-ends synced -> every user has their own kernel
    • unwind as a way to deal with netsplits?
    • two kernels seem to be a bad idea watching Matthias' talk
  • One kernel for everyone
    • how to resolve "priority"?

What if the kernel state that gets synced is the connection information to a shared kernel?

Distributed version of commutable? Yeah I guess that's what we're aiming at here! "A commutable" = series of notebooks?? If two clients agree on "a commutable" they must agree on the state of the world I think. Possibly relevant issue: nteract/nteract#53

For what it is worth, in SMC (in Jupyter and Notebooks) there is just one kernel that everybody shares, and nobody has ever asked for anything else or suggested this wasn't the canonical right thing to do. But it does clearly lead to confusion that the UI could better address. E.g., when two people try to run things at once and don't know the other person launched something, we need a sort of "zoomed out" view (like in Sublime) with info about what is being executed, or at least some indicator that code has been launched. The only possible reason for other models is that the kernel is running in the browser (say), like Google did, and it's a hack to deal with that.

In SMC there are no actions (or technically, there's just one), since I do everything in the abstract to a single string with a patch-friendly format, then render whatever that string represents in a concrete way (exactly like the virtual DOM approach of React). This is done right now in https://github.com/sagemathinc/smc/blob/sync2/src/smc-webapp/editor_jupyter.coffee#L867 (the function "set_nb"). I'm not saying that you should do this, or that I will continue to do so with SMC long term, only that it is easy.
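
For illustration only, here is one guess at what a "patch-friendly" single-string format could look like: one JSON-encoded cell per line, so that an edit to one cell touches one line and a text patch stays small. The function names and the line-per-cell layout are assumptions for this sketch, not a description of what `set_nb` actually does.

```javascript
// Serialize a list of cells as one JSON object per line.
// JSON.stringify escapes newlines inside strings, so each cell stays on one line.
function notebookToString(cells) {
  return cells.map((cell) => JSON.stringify(cell)).join("\n");
}

// Parse the string back into cells (the "render from the string" direction).
function stringToNotebook(text) {
  return text === "" ? [] : text.split("\n").map((line) => JSON.parse(line));
}

const cells = [
  { id: "a", source: "x = 1\nx" },
  { id: "b", source: "print(x)" },
];

const before = notebookToString(cells);
const after = notebookToString([cells[0], { ...cells[1], source: "print(x + 1)" }]);

// Count how many lines differ between the two versions.
const beforeLines = before.split("\n");
const changedLines = after.split("\n").filter((line, i) => line !== beforeLines[i]).length;
```

With this layout, editing cell "b" changes a single line of the string, which is the kind of locality a diff/patch library wants.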

By far the two biggest problems in SMC are:

  1. Users creating notebooks so that the .ipynb file is 25MB or more. This happens quite naturally...

  2. Users changing the .ipynb file on disk (e.g., via "git checkout"), and expecting the visible notebook to instantly sync to that, but it not doing so.

How does SMC handle synchronization among the widgets and other comm messages? Are those not part of the document?

State of multi-user and realtime in Jupyter

https://www.youtube.com/watch?v=DyGoHAP8B_s


  • We should collate Matthias' thoughts on how multi-users interact with kernels (remote vs. local, all sharing the same kernel, only one user can interact with a kernel, etc.)

    • security issues/evil JS/malicious friend:
      • inside nteract there is not much others can steal in terms of secrets via JS
      • shared kernel running on a third party -> can only steal secrets (via !cat /etc/passwd etc.) from that third party, not from my co-editors of the notebook
  • He does mention SMC at 14:00; he also lists 4 claimed drawbacks to the "project" design:

    • Need ability to create/manage unix users
      • Not a problem for SageMathCloud
      • a perceived issue for JupyterHub (we rely on people creating their own users)
    • !whoami != $ whoami
    • Doesn't completely fix frontend security issues
    • Loss of traceability about who did what. [I think he is completely wrong about this too -- if you record every change people make as the .ipynb file evolves, and who makes the changes, then you know exactly who did what.]