To: Chengzheng Sun, David Sun, Agustina, Weiwei Cai
Thanks for the paper; it was very interesting to read.
Unfortunately the paper pushes strongly in one direction:
CRDT vs. OT
but that is not helpful in practice.
We need to use the best of OT and combine it with the best of CRDT.
Overall the paper feels a little biased towards OT (and the fact that the authors have a commercial product called Codox Wave explains that ;)).
This response is structured in the following sections:
- OT and rich-text
- Tombstones
- Bulk Operations
- Practical usage
- Real-world problems
- Conclusion
The first assumption the paper makes is that OT is better suited for plaintext editing than CRDT.
And I would 100% agree with that. As long as you deal with a plain-text string that has no semantic meaning, this is true: OT is indeed simple to implement and does not have too many edge cases. CRDT is just overkill for that use case.
However, the paper then extends that conclusion to rich text, and there it does not hold, because rich text does have semantic meaning.
And with OT - as you already noted - rich text is very hard to implement correctly: once you have a syntax that must stay valid, you either need to add post-fixers (as CKEditor 5 does) to ensure that operations keep their semantic meaning, or you need strong server logic to fix operations centrally.
The obvious solution (that most rich-text editors came up with) is to not operate on raw HTML, but instead on JSON models with rich objects.
However in the end a semantic state change to add a new row to a table that can be expressed as:
$table->changeRows(3)
within an OT model still has to be expressed as a textual change at the exact offset where the table attributes are defined:
delete_character(5);
insert_character(5, '3');
While with CRDT this is simply an operation like:
change_attribute($table_id, 'rows', 3);
Of course that feels 'cleaner' and the CRDT keeps the semantic meaning.
A large portion of the paper deals with two CRDT implementations that had problems with tombstones. The reality is that CKEditor 5 had to add a graveyard to its OT implementation to simplify OT edge cases, so OT implementations in practice sometimes use tombstones as well.
Also, the given example of 1.6 billion lines deleted and 533 lines kept is a purely theoretical one, as no one would store all 1.6 billion lines in the database.
In practice most collaborative editor sessions are short and clients eventually disconnect. It is usually also okay to load a snapshot (without tombstones) when a new client connects.
We have to deal with two cases here for which we need the tombstones:
- Online editing (editor freezes when it has no connection)
- Offline editing
The more interesting case is offline editing. Let's assume a client stored an offline copy of the Wikipedia article and disconnects for a year. It makes some simple changes that are not synchronized upstream.
During this year all kinds of changes happen, and in the end there are 1.6 billion lines in the history.
For OT to sync that correctly, all operations need to be stored on the server, and on connection all of them need to be replayed, which probably takes quite a while and quite some storage on the server as well.
For CRDT, the snapshot sent to connecting clients would contain all the tombstones, so that old operations can still be replayed.
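To make the OT side concrete, here is a rough sketch (hypothetical names, and only the insert-vs-insert transform case; real OT handles all operation pairings):

    // Hypothetical sketch: a central OT server keeps the full operation
    // history and transforms an offline client's pending operations
    // against everything it missed.
    type Op = { type: 'insert' | 'delete'; pos: number; text: string };

    const history: Op[] = []; // grows with every operation ever made

    // Minimal insert-vs-insert transform; shifts `a` past `b`'s insertion.
    function transform(a: Op, b: Op): Op {
      if (b.type === 'insert' && a.pos >= b.pos) {
        return { ...a, pos: a.pos + b.text.length };
      }
      return a;
    }

    function catchUp(clientRev: number, pending: Op[]): Op[] {
      const missed = history.slice(clientRev); // possibly a year of operations
      return pending.map(op => missed.reduce((acc, s) => transform(acc, s), op));
    }

The point is simply that `history` has to cover the entire offline window.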
In practice it is totally okay for the server to not keep all of the operations for the 1.6 billion lines and for the client to not keep all of the data in tombstones.
For a product it is okay to say that changes need to be synchronized every 4 weeks for offline editing to work. And in fact even Google Docs has already lost several of my changes on mobile when I did not re-connect in a timely manner to sync them [and asked me to re-sync first].
An interval-based approach [based on dates or clocks] can be used to discard stale state. CRDT operations that arrive after the interval will be invalid on all clients, and the offline client needs to sync a new document (and maybe provide a diff of its changes) before being able to collaborate again.
The tombstone problem is also well known in the academic literature of another field: lockless data structures.
Epoch-based reclamation, hazard pointers, etc. are all good ways to resolve that problem.
On the other hand the 1.6 billion lines thingy is solvable in a different way:
While so far all of this was possible with CRDT without a central server, the OT solution that stored all the operations for the one year relied on a central server.
If OT needed to support offline p2p as well, all clients would have to store all operations that ever happened in the system, because any one of them might need to bring the 1-year-offline client up to speed.
So we have been comparing apples to oranges in terms of space-time-complexity.
Once we introduce a central server, the problem of HUGE tombstones is easily solvable by allowing the server to transform operations to work on valid ids, which is a little bit like OT, as it transforms one operation into another (heh, OT within CRDT - nice!).
The clients in this case would GC the tombstones after a while; the server, however, would archive them to persistent storage, indexed by their unique ids, with tombstone pointers within the document pointing to the persistent storage.
Once the 1-year-offline client provides its operations, the server transforms them from the no-longer-existing tombstones to the ids they would have now (and yes, again this is similar to OT, except that it still works on unique ids), then sends those operations to the clients and a new snapshot to the 1-year-offline client.
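A minimal sketch of that id-remapping idea (all names invented): the server keeps an archive mapping GC'd tombstone ids to the ids a stale reference should resolve to now, and rewrites incoming operations before broadcasting them:

    // Hypothetical sketch: rewriting operations that reference GC'd ids.
    type ItemId = string;

    // GC'd id -> the live id that replaced it.
    const archive = new Map<ItemId, ItemId>();

    type RefOp = { target: ItemId; [key: string]: unknown };

    function remap(op: RefOp): RefOp {
      let target = op.target;
      // Follow archive pointers until we reach an id that is still live.
      while (archive.has(target)) {
        target = archive.get(target)!;
      }
      return { ...op, target };
    }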
I am not saying this is efficient or even desirable - probably a simple diff + conflict gives the user more semantic meaning - but it is possible, and the space-time-complexity argument against CRDT can be completely eliminated once a central server is introduced - which OT takes for granted in the comparison anyway.
It is also simple for a server to reject those changes as it has a complete overview of which ids are valid and which are not.
To implement the GC (including on the central server) in a safe way, the concept of a grace period can be used (the tombstones are invalid but will not be GC'd until the next period). This ensures that drifting clocks will not have an impact.
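A sketch of the grace-period rule (the epoch numbering and field names are made up):

    // Hypothetical sketch: tombstones invalidated in epoch N survive
    // until epoch N+1, so clients with drifting clocks can still
    // resolve references to them during the grace period.
    interface Tombstone { id: string; invalidatedAtEpoch: number | null }

    function gc(tombstones: Tombstone[], currentEpoch: number): Tombstone[] {
      return tombstones.filter(
        t => t.invalidatedAtEpoch === null || t.invalidatedAtEpoch >= currentEpoch
      );
    }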
The other efficiency argument is about bulk operations. yJS especially (built by a CRDT researcher) has several optimizations to operate on ranges instead of single characters. [https://github.com/y-js/yjs/blob/master/README.v13.md#yjs-crdt-algorithm]
That does not solve the interleaving problem, but it mitigates the more common cases that appear in practice:
Paste a HUGE text:
- Google Docs: 6 seconds to paste, 6 seconds to undo
- Quip: 3 seconds to paste, 4 seconds to undo; then 6 seconds, 6 seconds to undo; then 10 seconds
- yJS + Prosemirror: immediate
And it makes sense: Instead of assigning an increasing id to a blob of text one char at a time, you just assign the range:
insert(unique_id, 'This is a long text. Lorem ipsum ...', [new_unique_id_start,new_unique_id_end])
Every character still has its own unique id in this system, but the range only needs to be split once there is an edit within it.
And again this is related in problem space to e.g. memory management and copy-on-write (COW) allocation without an MMU, where memory regions also need to be split.
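A sketch of the underlying idea (loosely modelled after yJS v13 items, not its actual internals):

    // Hypothetical sketch: one item represents a whole pasted range and
    // character ids are derived as (clientId, clock + offset), so nothing
    // per-character is allocated until an edit lands inside the range.
    class RangeItem {
      constructor(
        public clientId: number,
        public clock: number, // id of the first character in the range
        public text: string,
      ) {}

      // Split lazily, only when an edit happens inside the range.
      split(offset: number): [RangeItem, RangeItem] {
        return [
          new RangeItem(this.clientId, this.clock, this.text.slice(0, offset)),
          new RangeItem(this.clientId, this.clock + offset, this.text.slice(offset)),
        ];
      }
    }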
I assume the reason why the commercial offerings use OT instead of CRDT is that they can just purchase a pre-built OT library and plug it in. In our tests yJS + Prosemirror / yJS + Quill worked wonderfully, and both the theory and the practical implementation look sane.
It is true that a full mapping of the editor state to e.g. a YXMLFragment needs to be done and that this takes time. On the other hand, browsers do state-based comparisons and re-renders all the time with e.g. React, and for all practical purposes it is fast enough to keep the YXMLFragment as the source of truth.
In fact, comparing again with Google Docs (the gold standard): Google Docs shows a very noticeable input delay when two browser windows are open, while yJS + Prosemirror is pretty much instant when typing.
So again performance cannot be the reason why OT is chosen over CRDT.
For me as an implementer, an additional practical argument for CRDT (with a central server) is that I can allow certain operations while prohibiting others [Note: that is not quite true, as seen below].
For example, if I want to allow someone to just comment on my document, but not make any edits or other state changes, with OT I need a very strong model and a complete running editor on the server so that I can check whether a given OT transformation on the document is valid.
With CRDT on the other hand I have semantic meaning (or I can build a CRDT type that has semantic meaning):
operation('mark', 'comment', [unique_id_start,unique_id_end], {'comment_id': 1234, 'user_id': 42})
Though while writing this I can see that, especially for this case, the same can be expressed with a kind of JSON-OT:
operation('mark', 'comment', [start_position,end_position], {'comment_id': 1234, 'user_id': 42})
So maybe the main difference between OT and CRDT is whether you express everything as ranges or via unique ids.
In the one model you get the range implicitly from the position the unique_id currently is in; in the other you get it explicitly by transforming the range against the operations that have already happened.
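Either way, once operations are semantic, the server-side permission check becomes trivial; a sketch (the op shape is invented):

    // Hypothetical sketch: a comment-only user may create 'mark'
    // operations of type 'comment' and nothing else.
    type SemanticOp = { kind: string; markType?: string };

    function allowedForCommenter(op: SemanticOp): boolean {
      return op.kind === 'mark' && op.markType === 'comment';
    }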
Without a central server both have around the same space-time-complexity; with a central server, GC can work well on the clients for CRDT, while it is implicit in OT.
In the end, as long as you have rich operations, CRDT vs. OT does not matter that much: if you know there is a table object at position 5 with an attribute 'rows' you want to change, that is the same as changing the table object with id X.
In the end this is a "lookup table" problem for rich operations on editor objects or ranges.
I believe I have addressed most points of the paper by now.
My conclusion is that both OT and CRDT work well, and most of the complexity comes from trying to use plaintext-OT or plaintext-CRDT directly with rich-text editors.
Instead types like JSON-OT or XML-CRDT allow semantic operations, which simplifies everything.
But then OT and CRDT need about the same number of operations and are indeed very similar.
Assume an editor with a JSON model.
To be able to apply rich OT operations, this JSON model needs to be serialized to e.g. {text}{pointer-to-object}{text} or toJson($json_state).
So the JSON is converted into text, OT is applied, then it is converted back into JSON, and either the difference is applied to the model or the model is exchanged for the new JSON.
For CRDT, the same JSON is converted into e.g. a YXMLFragment, the CRDT operations are applied, and then it is converted back, either by applying the changes or by exchanging the JSON model again.
Abstracted like this, the overall practical difference between OT and CRDT is negligible, and the effort in modern editors is roughly the same.
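For illustration, a minimal sketch of the CRDT side of this round trip using the public yJS API (the JSON model shape here is invented):

    // Hypothetical sketch: convert a toy JSON model into a Y.XmlFragment,
    // apply a CRDT operation, and read the result back.
    import * as Y from 'yjs';

    const jsonModel = { type: 'paragraph', text: 'Hello world' };

    const ydoc = new Y.Doc();
    const fragment = ydoc.getXmlFragment('document');

    // Convert the JSON model into the YXMLFragment once...
    const paragraph = new Y.XmlElement(jsonModel.type);
    paragraph.insert(0, [new Y.XmlText(jsonModel.text)]);
    fragment.insert(0, [paragraph]);

    // ...then apply semantic CRDT operations to the mapped model.
    paragraph.setAttribute('align', 'center');
    console.log(fragment.toString()); // roughly '<paragraph align="center">Hello world</paragraph>'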
In the end it would be best if this model conversion step could be avoided by directly applying the changes to the objects within the model. For that you need to identify where they are now, so that you can change the state of the right object.
Either you have pointers from unique_ids to the objects, or pointers from within the text representation to the objects. In any case, whether via offset or unique id, you end up finding the object and can change its state directly.
We also found that semantic operations (à la React/Redux) work better than pure text operations.
In the end the state changes in the editor also need to be idempotent and commutative, but that is a problem both OT and CRDT share.
Consider a 2x2 table that MUST hold at most 8 cells.
Client A: $table->setRows(3)
Client B: $table->setColumns(3)
With both OT and CRDT we end up with an invalid model: a 3x3 table with 9 cells.
OT typically solves that by applying all changes from the server before sending out its operation as a transaction (e.g. Firepad).
But CRDT can do the same and just apply all CRDT operations before sending out its own operations.
Now the last operation was invalid, so it needs to be undone and the user informed of the error in the model (in both CRDT and OT).
In any case, using the central server with a guaranteed order [as simple as monotonically increasing transaction IDs], the problem is easily put on the user that performed the invalid operation.
Obviously, if the user has also added some text in the meantime, that would be undone as well; likely the best way to solve this is either a staging area that preserves the old document state OR duplicating the whole object before the conflict (which is how automerge does it).
In any case this is something a rich-text editor needs to handle, and it is a more interesting problem than whether OT or CRDT is used to ensure that concurrent operations converge.
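A sketch of that converge-then-validate flow (the constraint check and names are invented):

    // Hypothetical sketch: apply all remote operations first, then
    // validate the local pending change against the schema invariant
    // and undo it (keep `current`) if the merged state is invalid.
    interface TableState { rows: number; cols: number }

    const MAX_CELLS = 8;

    function tryCommit(current: TableState, pending: Partial<TableState>): TableState {
      const merged = { ...current, ...pending };
      if (merged.rows * merged.cols > MAX_CELLS) {
        // Inform the user; the change is dropped rather than broadcast.
        console.warn(`rejected: ${merged.rows}x${merged.cols} table exceeds ${MAX_CELLS} cells`);
        return current;
      }
      return merged;
    }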
I still feel that unique_ids mapping to an object repository [hash table] are simpler to implement than finding the object index-based, but that is an implementation detail and just personal preference.
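For completeness, a sketch of what such a repository looks like (names invented):

    // Hypothetical sketch: object lookup via unique id is a single map
    // access, instead of resolving an offset in the serialized document.
    const repository = new Map<string, { type: string; attrs: Record<string, unknown> }>();

    repository.set('table-X', { type: 'table', attrs: { rows: 2, cols: 2 } });

    function changeAttribute(id: string, key: string, value: unknown): void {
      const obj = repository.get(id);
      if (!obj) return; // id unknown or already GC'd: reject or remap
      obj.attrs[key] = value;
    }

    changeAttribute('table-X', 'rows', 3);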
Hi Fabian,
We appreciate you sharing your views about our paper. Here are some comments from us.
The vast majority of reported CRDT solutions for co-editing are confined to insert and delete operations on a sequence of atomic objects (e.g. plain-text characters). In addition, any solution that supports co-editing with insert/delete operations in linear data domains is fundamental for supporting co-editing in richer data models, such as rich text; therefore we focused this paper on comparing OT and CRDT in the context of plain-text co-editing.
The paper's conclusions are indeed critical of CRDT co-editing solutions, in terms of correctness, complexity, and other purported advantages over OT. The facts and evidence we provided were based on our comprehensive study and first-hand experience building co-editing systems and products. We thus stand by the conclusions drawn in the paper.
It looks like you have no issue with this part (99%) of the paper (as you stated, you 100% agree that “OT is better suited for plaintext editing than CRDT”), but are mainly concerned with the real differences between OT and CRDT for rich-text co-editing, which was beyond the scope of our paper and only briefly touched on. We also share the view that the issues involved in supporting real-world (e.g. rich-text) co-editors are far more important, challenging, and interesting than convergence issues for plain-text co-editors. Further, convergence is a solved issue as far as OT-based solutions are concerned. Yet it is unfortunately a matter surrounded by misconceptions often found in CRDT literature. We hope you agree that it is worth the effort to clear this up for developers and researchers in the co-editing space.
The bulk of your comments relate to how one might address specific rich-text co-editing issues with CRDT or OT. We haven't been able to make sense of what you sketched out with Yjs (which is a WOOT-like tombstone-based CRDT). We can, however, tell you that we solved those issues using OT in very different ways. Furthermore, OT itself doesn't imply the kind of solution you described. For example, a transformation-based server was never required or involved in any of the OT-based rich-text co-editing systems that we have built, including CoWord, CoPowerPoint, CoMaya, and Codox Apps.
For your information, we have re-organized the materials in the original paper into a series of three articles, for readers of different background and interests:
Real Differences between OT and CRDT under a General Transformation Framework for Consistency Maintenance in Co-Editors. https://arxiv.org/abs/1905.01518
Real Differences between OT and CRDT in Correctness and Complexity for Consistency Maintenance in Co-Editors. https://arxiv.org/abs/1905.01302
Real Differences between OT and CRDT in Building Co-Editing Systems and Real World Applications. https://arxiv.org/abs/1905.01517. Materials in this article should interest developers looking for techniques to build real world (e.g. rich-text) applications.