ProseMirror + CRDT's?

I recently checked out the automerge library, which uses an implementation of conflict-free replicated data types (CRDT’s) to allow collaborative modifications to a JSON-like state, in a peer-to-peer friendly way. It comes with a WebRTC reference implementation for collaboratively editing documents p2p.

I’ve read Marijn’s blog post contextualizing why ProseMirror uses Operational Transformation fo facilitate collab, but am generally curious: could ProseMirror be made to work with CRDT’s? Automerge itself not be a helpful reference point (it’s not clear how something like ProseMirror’s schema rules would be enforced), but a peer-to-peer compatible layer for ProseMirror seems like a cool and compelling use case.

I’d love to get a sense of what challenges might get in the way of making ProseMirror CRDT compatible; I’m imagining it’s a difficult problem (consensus! convergence!), but could open up some interesting uses for ProseMirror…

9 Likes

No one has worked on this yet, so there’s not much I can tell you. But research in that direction would be interesting.

OK I spent some time playing around with a really naive concept of what this might look like, and threw the results into a repo here:

Automerge is able to model any JSON-like object, tracking any changes made to that object as a series of operations. The code in prosemirror-automerge:

  • converts the Node.toJSON() representation of a ProseMirror doc into an Automerge doc
  • provides a ProseMirror plugin which takes incoming Steps from a transaction, and attempts to “apply” them to the Automerge doc
  • takes incoming changes from a remote Automerge doc, and attempts to translate them to ProseMirror steps suitable for application to the current editor’s doc

There’s a demo in the repo where two EditorView's are wired together with a basic Automerge syncing implementation, and it partially works for basic text-insertion-y types of transforms. It doesn’t work at all for deletions/marks/basically “everything else”.

Some extremely loose thoughts based on this experience:

  • it would be handy to have a way to project the indexing scheme used by ProseMirror onto a plain object outputted by Node.toJSON(). I used the ResolvedPos.path private variable to implement this, but I’m 100% certain this would break and isn’t the correct way to do this:

… in my mind this would look something like the signature of the Node.descendants method, except instead of yielding ProseMirror Nodes, it would just be plain objects (plus the current position). probably this isn’t too hard to implement, my brain is just sputtering on a clean way to do so :slight_smile:

  • in a similar vein, I wanted some way of taking an incoming Step and trying to figure out a minimal representation of the delta change in data as it pertained to the transaction’s tr.doc.toJSON(). Almost like something you could describe as a JSON patch.
  • in general, representing ProseMirror docs as Automerge docs works well, provided you can efficiently translate between the two systems; one finicky aspect of this is that for Automerge to detect incremental changes, I needed to represent ProseMirror text nodes’ .text property as an array of characters (rather than a string) so that changes to individual characters would be seen as having originated from distinct “actors” (in Automerge parlance)

I’m not totally sure how productive it is to continue with this experiment, as I have a hunch (unproven!) that there are fundamental issues which would make Automerge’s data modeling incompatible as-is with ProseMirror.

I am, however, curious to understand the fundamental merging strategies that are described in this academic paper. Automerge decorates its data structures with an array of ops, which look suspiciously like a ProseMirror Step (insertion/deletion as position x/y/z, with contents abc). My somewhat hazy understanding of CRDT’s in general is that they’re not dissimilar from ProseMirror’s indexing system, with the addition that each position has a unique id which corresponds to the person responsible for having created the position. So a document like:

0   1 2 3 4    5
 <p> O n e </p>

Might be represented as:

0{user:1}   1{user:2} 2{user:1} 3{user:1} 4{user:1}    5{user:1}
        <p> O         n         e                 </p>

I’m sure there’s an enormous amount of complexity that I’m eliding here, but I would be curious to understanding if something like this would be possible to implement directly with ProseMirror…

5 Likes

Did you go any further with your CRDT work on ProseMirror @saranrapjs?

Sort of. I’ve just pushed some further noodling to the master branch.

Specifically, I’ve added this test file which attempts to document some of the types of transforms that the coordinate mapping needs to support (and which aren’t yet currently supported). I remain optimistic that this is possible, but there’s definitely quite a bit of work to go!

Oh sweet!

Given how rich and solid the Prosemirror API around transactions, it feels like if there was a solid working CRDT solution then Prosemirror would be just the no brainer choice for collaborative editing.

Checking out your new code @saranrapjs - thanks for breaking this ground.

I recently implemented ProseMirror plugin for Yjs (an operation-based CRDT). Demo / src

I map ProseMirror block nodes to YXmlElement (a shared data type that has a node name, attributes, and a list of children) and ProseMiror text nodes to YXmlText (a shared text type that supports formatting attributes - a.k.a. marks).

When the ProseMirror state changes, I compute a diff and apply the changes to the Yjs document. When the Yjs document changes, I simply create a new state based on the Yjs document and replace the old one. I do reuse old ProseMirror nodes, and it seems that this is fairly efficient given the immutable nature of ProseMirror state.

Mapping a CRDT to a ProseMirror document is not always possible, because ProseMirror has a schema that the document needs to comply to. E.g. given a blockquote that must have at least one child. If client1 deletes the first child, and client2 concurrently deletes the second child, you may end up with a blockquote without children (well, that depends on the CRDT). In this case I simply delete the node that does not comply with the schema anymore. This is why I went for the above approach of recreating the state based on the Yjs model, because manipulating the state with transformations would involve a lot more overhead. The disadvantage of my approach is that it destroys all decorations. The next step would be to find a method that only applies the diff between the old and the new state.

5 Likes

@dmonad this is really cool!

I hadn’t considered that you might be able to get away with just replacing the entire document, provided you take care to maintain the local selection before/after the transformation like you do here (though you’re right that in many cases, decorations will be lost as part of the application of the ProseMirror transaction). A note that this selection-setting code will result in tr.selectionSet being equal to true for all Y-js originating transactions; this may not be a problem, depending on the use-case, but there’s an undocumented bitfield you might be able to set that’s used in prosemirror-collab to denote that the selection didn’t change (unless it did, via upstream document changes).

In my automerge experiment, I haven’t yet gotten to the point where the schema is enforced across conflicts that lead to a schema-invalid ProseMirror document state. I’m still hoping to cook up some form of “manipulating the state with transformations”, as you described, because I think beyond decorations there are some other benefits (anything that relies on tr.mapping should Just Work), though we’ll have to see if there’s not some other huge roadblock I haven’t yet considered!

Thanks for the tip @saranrapjs. Before every Yjs transaction, the current PM selection is rebased on the structure of the Yjs document so it won’t be affected by remote changes. Yjs based positions fulfill the same use-case as PM position mappings, but they work directly on the Yjs document. Internally they map to the unique id (lamport timestamp) of the character that the selection points to. After the change is applied, the Yjs based position is transformed back to an absolute PM based position that takes into account any remote change that happened. I can’t omit this step, because otherwise PM might destroy the local selection when I replace nodes. Once I figure out how to apply the diff between the old and the new state, I can make use of PM mappings again and this step may not be necessary anymore.

I saw your ticket in the automerge github repo. Enforcing that a CRDT complies to a PM schema might be impossible without an inordinate amount of bookkeeping. But something you could try is to create a view on the data that complies to the schema. For example, if a blockquote does not have any children, ignore it.

2 Likes

Yeah that’s a good idea; my worry is that if it’s still possible to map positions in ProseMirror to array indices in the CRDT more or less 1:1, keeping “tombstone” nodes that don’t conform to the schema would amount to a similarly large amount of bookkeeping.

out of curiosity - what’s the benefit/goal of combining these two structures?

CRDTs, such as Yjs or automerge, are data structures that can be manipulated by many clients concurrently and merged automatically without merge conflicts.

By mapping the ProseMirror state to a CRDT we allow multiple clients to edit the document concurrently - allowing collaborative Google Docs-like editing.

ProseMirror already has a collaboration module: collab. But the collab approach is centralized (it requires a central server that decides in which order document transformations are applied). CRDTs, on the other hand, work in distributed systems and generally work peer-to-peer (e.g. over WebRTC).

Furthermore, collab does not support offline editing, while CRDTs do. Marijn’s blog post gives some valid points against automatic merging of offline edits. He explains that changes that were produced offline should be merged manually to prevent that automatic merges results in documents with “a strange mishmash of text when two people edited the same sentence in different ways”. I believe that automatic merging of offline edits is in practice preferable to manually resolving conflicts. But the user should get visual clues about changes that were produced while offline. Conflicts, that were automatically resolved, could be presented to the user for manual review.

Our goal is to implement an alternative collaboration module that works peer-to-peer and supports offline editing. Let me know if I missed something @saranrapjs.

5 Likes

Potentially relevant to this discussion: Raph Levien describing “why CRDT didn’t work out as well for xi-editor as I had hoped”: https://github.com/xi-editor/xi-editor/issues/1187#issuecomment-491473599

@marijn The title of this post is very misleading for people who don’t know about the XI project. The goal of the XI editor was to put editor components (like language server and syntax highlighting) in separate processes and let them concurrently modify the CRDT. They hoped that this would improve input latency, but explained that a synchronous model is better suited for their use case.

The mentioned post is not about CRDTs for collaboration.

6 Likes