ProseMirror Collab Performance: prosemirror-collab-commit

Hi All,

I have written about my performance findings with ProseMirror collab during larger (20+ active client) collab sessions here: ProseMirror Collab Performance.

We have also released a new commit-based plugin expanding on @benaubin's previous work. More info in the link above!

GitHub stepwisehq/prosemirror-collab-commit
Npm @stepwisehq/prosemirror-collab-commit

Cheers!


How can you guarantee convergence with this approach? Do clients roll back and re-apply their own commits when they are accepted? (ProseMirror’s pseudo-OT does not guarantee that documents will converge if you apply steps in a different order.)

Do clients roll back and re-apply their own commits when they are accepted?

Yes exactly; when appropriate.

Here is the relevant excerpt from the Google Wave paper:

…Wave OT modifies the basic theory of OT by requiring the client to wait for acknowledgement from the server before sending more operations. When a server acknowledges a client’s operation, it means the server has transformed the client’s operation, applied it to the server’s copy of the wavelet and broadcast the transformed operation to all other connected clients. Whilst the client is waiting for the acknowledgement, it caches operations produced locally and sends them in bulk later.

With the addition of acknowledgements, a client can infer the server’s OT path. We call this the inferred server path. By having this, the client can send operations to the server that are always on the server’s OT path.

This has the important benefit that the server only needs to have a single state space, which is the history of operations it has applied. When it receives a client’s operation, it only needs to transform the operation against the operation history, apply the transformed operation, and then broadcast it.
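In code, the acknowledgement-gated sending the excerpt describes boils down to something like this TypeScript sketch. All names here (Commit, onLocalSteps, onAck, sendCommit) are made up for illustration and are not prosemirror-collab-commit's actual API:

```ts
import { Step } from "prosemirror-transform"

// Hypothetical commit shape for this sketch.
interface Commit { ref: string; baseVersion: number; steps: Step[] }

let confirmedVersion = 0
let inFlight: Commit | null = null // at most one commit awaiting acknowledgement
let cached: Step[] = []            // locally produced steps waiting their turn

// Called whenever the editor produces local steps.
function onLocalSteps(steps: Step[], sendCommit: (c: Commit) => void) {
  cached.push(...steps)
  maybeSend(sendCommit)
}

// Called when the server acknowledges (confirms) our in-flight commit.
function onAck(newVersion: number, sendCommit: (c: Commit) => void) {
  confirmedVersion = newVersion
  inFlight = null
  maybeSend(sendCommit) // everything cached while waiting goes out in one bulk commit
}

function maybeSend(sendCommit: (c: Commit) => void) {
  // Only send when nothing is in flight, so every commit is based on a version the
  // server has confirmed -- the "inferred server path" from the excerpt.
  if (inFlight || cached.length === 0) return
  inFlight = { ref: Math.random().toString(36).slice(2), baseVersion: confirmedVersion, steps: cached }
  cached = []
  sendCommit(inFlight)
}
```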

So in prosemirror-collab-commit, if a client sends a local commit based on confirmed document version v1, one of two scenarios occurs:

I. The server is still on v1
It applies the steps and broadcasts the confirmed commit.

The client receives the confirmed commit that matches its unique commit ref and expected next version. Since it applied the same steps to the same version the server did, it closes out its in-flight commit and prepares a new one.

II. The server is no longer on v1
It maps the client's commit through the newer commits and assigns it v3 (or whatever the next version is).

The client will not apply commits out-of-order. It will receive the confirmed commits and rebase its unconfirmed steps on them. Eventually it will receive confirmation of its in-flight commit, close it out, and prepare a new commit.
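To make the two scenarios concrete, here is a rough TypeScript sketch of the client-side handling. The names and state shape are hypothetical for this sketch; the real plugin does more bookkeeping:

```ts
import { Step } from "prosemirror-transform"

// Hypothetical shapes for this sketch only.
interface ConfirmedCommit { ref: string; steps: Step[]; version: number } // version after these steps
interface ClientState {
  confirmedVersion: number
  inFlight: { ref: string; baseVersion: number; steps: Step[] } | null
}

function receiveConfirmed(
  state: ClientState,
  commit: ConfirmedCommit,
  // Rolls back unconfirmed local steps, applies `steps`, then re-applies the local
  // steps rebased over them (the usual prosemirror-collab receive/rebase path).
  rebaseOver: (steps: Step[]) => void,
) {
  // Confirmed commits are only applied in version order.
  if (commit.version !== state.confirmedVersion + commit.steps.length) return

  const inFlight = state.inFlight
  if (inFlight && commit.ref === inFlight.ref && inFlight.baseVersion === state.confirmedVersion) {
    // Scenario I: the server was still on our base version, so the confirmed steps are
    // exactly the ones we already applied locally. Close out the commit, replay nothing.
    state.inFlight = null
  } else {
    // Scenario II, or another client's commit: rebase unconfirmed local steps over the
    // confirmed ones (rolling back and re-applying where needed).
    rebaseOver(commit.steps)
    if (inFlight && commit.ref === inFlight.ref) state.inFlight = null // our rebased commit came back
  }
  state.confirmedVersion = commit.version
}
```

The key point is that a confirmation matching the in-flight ref on the expected version needs no replay, while everything else goes through the usual rebase path.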

And this is the exact issue people hit early on with prosemirror-collab performance.

Interesting. I’ve never heard of this from any of the various people I’m working with who have been using prosemirror-collab for years. Is this really something that came up for you, outside of synthetic benchmarks? Because it seems to require A) constant typing without pause, and B) high latencies.

This is not to say this isn’t an issue—I can see how it could theoretically happen—but I’m wondering if it is a common enough issue to complicate the protocol for.


I worded that in a confusing way. What I meant to convey is that the issue in the analogy is exactly the issue with the algorithm: as the number of concurrent edits scales up to large numbers, this is likely the cause of the first noticeable performance issues.

Is this really something that came up for you

Yes, there were a lot of use cases involving 20-40 active editors on a single document. These included document-centric activities during remote meetings for brainstorming, planning, and various weekly rituals involving the entire engineering team or company. A typical session would involve a five-ish minute time-boxed period when everyone is contributing to the document at the same time. Afterwards, editing would stop and there would be read-outs and discussions.

Our team was globally distributed with members in Europe, USA, South America, and even South Korea. The latency variation between team members was quite high and WFH exacerbated this due to dodgy network conditions at homes, AirBnBs, coffee shops, tethering, and coworking spaces.

The network client was heavily instrumented through Sentry and FullStory, so we were able to track unconfirmed steps, step confirmation latency, etc. This data was reconciled with the expected behavior based on the algorithm “model” (which I feel is pretty sound). One of our team members in Europe would reliably have their edits rejected for minutes at a time during activities haha.

These issues were one of the initial motivating factors in exploring Yjs (IIRC).

but I’m wondering if it is a common enough issue to complicate the protocol for

I think it’s just very use case specific. Most ProseMirror projects uhh… probably never end up with super heavy workloads like the above. For those that do, or that get to the scale of Atlassian or Zoho, it may be worth it :man_shrugging:

Since I wrote that plugin and have implemented backends based on it in Node.js and ASP.NET, it doesn’t seem very complicated to me, and it’s “easy” enough that I would just default to it now.

Did you explore a solution where clients enforce some minimum time distance between pushing their local changes (causing them to batch them in bigger groups)? That seems like it might solve this issue with less engineering effort.


Yes, we were already debouncing before the initial step submit, and the batches would grow the longer a client went without having steps confirmed (I like to call this being out in the cold).
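For context, the growing-batch debounce looked roughly like this sketch (the numbers and growth curve are invented for illustration, not what we shipped):

```ts
// Debounce local submits, stretching the delay the longer we go unconfirmed, so
// batches naturally grow while a client is "out in the cold".
const BASE_DELAY_MS = 250
const MAX_DELAY_MS = 5_000

let lastConfirmedAt = Date.now()
let timer: ReturnType<typeof setTimeout> | null = null

function scheduleSubmit(flush: () => void) {
  const starved = Date.now() - lastConfirmedAt // time since our steps were last confirmed
  const delay = Math.min(BASE_DELAY_MS + starved / 4, MAX_DELAY_MS)
  if (timer !== null) clearTimeout(timer)
  timer = setTimeout(flush, delay)
}

function onStepsConfirmed() {
  lastConfirmedAt = Date.now()
}
```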

The powers that be were also interested in a collaborative editing experience at least on par with the major players for those numbers of active clients and beyond. The “chunkiness” of updates seems to play a big role in shaping collab editing UX. To my knowledge, delays won’t really address fairness either unless client latencies are factored in, or some sort of turn-taking is attempted.

The commit-based algorithm also chunks, but it scales naturally based on how long the server takes to respond to a client, whether due to client latency or to the incoming commits per second surpassing what can be processed on a single document. The number of discrete updates that can be applied per second is also much higher, and the extra work (network traffic, CPU, etc.) from retries is all but eliminated.

Indeed, that makes sense, that this would allow greater throughput. I guess the code for applying changes to the client doesn’t look all that different; it just makes a few fewer assumptions about how its own changes are going to come back. Did you have any trouble integrating this style with the undo history, or did that just work, when you set the rebased metadata?


The history seems to work fine, and the deep undo tests pass.

When I was testing @benaubin's plugin I ran into some weird issues, so I decided to make a very straightforward implementation using prosemirror-collab as a template. Since then I’ve come to believe the issues were due to it not setting mirrors during the server-side rebasing, but prosemirror-collab-commit has inherited the history and selection handling from prosemirror-collab :slight_smile:
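In case it helps anyone reading along, here is roughly what mirror-aware server-side rebasing looks like, modeled on prosemirror-collab's rebase handling. This is a sketch, not code from either plugin; the Commit shape and function name are hypothetical:

```ts
import { Mapping, Step, StepMap, Transform } from "prosemirror-transform"
import { Node } from "prosemirror-model"

// Hypothetical commit shape for this sketch.
interface Commit { ref: string; baseVersion: number; steps: Step[] }

// Rebase a commit onto the server's current document, given the steps applied
// since the commit's base version. Returns the new doc and the accepted steps.
function rebaseCommit(serverDoc: Node, sinceBase: Step[], commit: Commit) {
  // Mapping layout: [s_n⁻¹ … s_1⁻¹, steps applied since baseVersion], extended
  // below with each rebased step's map.
  const inverses: StepMap[] = []
  for (let i = commit.steps.length - 1; i >= 0; i--)
    inverses.push(commit.steps[i].getMap().invert())
  const mapping = new Mapping(inverses.concat(sinceBase.map(s => s.getMap())))

  const tr = new Transform(serverDoc)
  const accepted: Step[] = []
  let mapFrom = commit.steps.length
  for (const step of commit.steps) {
    // Map each step through its predecessors' inverses, the concurrent steps,
    // and the already-rebased predecessors.
    const mapped = step.map(mapping.slice(mapFrom))
    mapFrom--
    if (mapped && !tr.maybeStep(mapped).failed) {
      // Mirror the rebased step's map against the corresponding inverse so that
      // positions inside content this step re-creates keep mapping correctly.
      // Skipping this is the sort of thing that produces subtle weirdness.
      mapping.appendMap(mapped.getMap(), mapFrom)
      accepted.push(mapped)
    }
  }
  return { doc: tr.doc, accepted } // broadcast `accepted`, tagged with commit.ref and the new version
}
```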


I am wondering, after exploring Yjs, what you found and why you developed your plugin instead of using Yjs directly.

I am going to add a collab function to my editor, and your insights will help me a lot before I make my decision.


I wrote about this more here on HN. Here are the bullets from that comment:

  • State-based CRDT isn’t great when you want a central authority in the mix anyway and are fundamentally trying to work with operations.
  • The exchange rate between ProseMirror’s currency, steps, and some other replication strategies’ building blocks is too high.
  • ProseMirror should add the concept of range-relocation to its mappings; this is a bit of an aside but it would help retain user intent when reconciling concurrent edits involved in block relocations.

For my use cases it’s just simpler to work directly with processing ProseMirror steps.
