Offline, Peer-to-Peer, Collaborative Editing using Yjs

dmonad · January 27, 2020, 8:11pm

Hello ProseMirror enthusiasts

I’m the author of Yjs, a CRDT library that handles automatic conflict resolution on shared data. I want to share with you a p2p, offline-capable, shared editing demo using ProseMirror that I’ve been working on for a couple of months. Visit our website https://yjs.dev in two browsers and observe how the documents sync.

This demo uses prosemirror-example-setup and the Yjs ProseMirror-plugins exported by y-prosemirror.

Here is why this demo is really cool:

Offline-capable

After your first visit to the website, it is available offline. I use service workers to make the resources available offline. The Yjs document that holds the state of the ProseMirror document is persisted using y-indexeddb to the local browser database. All content that is created while offline is synced to the other peers when you reconnect to the internet.
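As a rough illustration of the persist-and-replay idea described above, here is a toy sketch. It does not use Yjs or y-indexeddb: `LocalUpdateStore` is a hypothetical in-memory stand-in for the browser database, updates are plain strings rather than binary diffs, and `OfflineDoc` and its methods are names invented for this example.

```typescript
type DocUpdate = string; // opaque stand-in for a binary document update

// Hypothetical stand-in for a browser database such as IndexedDB.
class LocalUpdateStore {
  private readonly rows = new Map<string, DocUpdate[]>();

  append(docName: string, update: DocUpdate): void {
    const list = this.rows.get(docName) ?? [];
    list.push(update);
    this.rows.set(docName, list);
  }

  read(docName: string): DocUpdate[] {
    return [...(this.rows.get(docName) ?? [])];
  }
}

class OfflineDoc {
  readonly seen = new Set<DocUpdate>();
  private readonly docName: string;
  private readonly store: LocalUpdateStore;

  constructor(docName: string, store: LocalUpdateStore) {
    this.docName = docName;
    this.store = store;
    // Replay everything persisted by earlier sessions.
    store.read(docName).forEach((update) => this.seen.add(update));
  }

  // Record a local or remote update; duplicates are ignored.
  apply(update: DocUpdate): void {
    if (this.seen.has(update)) return;
    this.seen.add(update);
    this.store.append(this.docName, update);
  }

  // On reconnect, work out which updates a peer is still missing.
  missingFor(peerSeen: ReadonlySet<DocUpdate>): DocUpdate[] {
    return Array.from(this.seen).filter((u) => !peerSeen.has(u));
  }
}

const browserDb = new LocalUpdateStore();
const firstSession = new OfflineDoc("demo", browserDb);
firstSession.apply("insert 'a'");
firstSession.apply("insert 'b'");

// Closing and reopening the tab: a new instance replays the stored updates.
const secondSession = new OfflineDoc("demo", browserDb);
// A peer that only knows the first update still needs the second one.
const toSend = secondSession.missingFor(new Set(["insert 'a'"]));
```

The real libraries exchange compact binary updates and decide what is missing in their own way; the sketch only shows why edits made offline survive a closed tab and can still be delivered later.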

Peer-to-peer (WebRTC)

This demo uses y-webrtc to share document updates directly with other peers without a central instance to handle conflict resolution. I’m pretty brave to use y-webrtc on a public instance. I do this to test the reliability of the webrtc network. At some point I might switch it out in favor of y-websocket, which is much better suited to handle a large number of visitors.

The content will appear to sync instantly. Browser tabs communicate directly with each other (without WebRTC) using broadcastchannels, completely skipping network communication. This is also the technology that makes it possible to sync content between browser tabs while offline.

Versioning support

You can version the state of the document. When you click on a version, the changes are highlighted by the user who created them. Because this is a public instance without real user management, there is only a “local” and a “remote” user (your browser vs. everybody else). When you come back after a while, click on “Changes since last version” to see what happened while you were gone.

Yjs is already used in production by some awesome tools like room.sh and PluxBox. y-prosemirror is still a pretty new addition to Yjs, but I plan to maintain it as part of the Yjs ecosystem.

Challenges:

In order to support thousands of users visiting the website and handle their sync-conflicts, Yjs needs to be able to represent the data very efficiently. There is a short outline here about the data representation techniques I use to make this performant. Compared to other CRDT implementations, Yjs is up to 1000x faster and encodes data 300x smaller than un-optimized CRDTs: https://github.com/dmonad/crdt-benchmarks
Versions are just views on the data. Normally, structs are transformed to tombstones when they are deleted. The demo transforms un-needed structs to tombstones locally until a version is created. Therefore, the data model does not grow unboundedly for all clients. It will only grow if you create versions.
The webrtc connector creates a fully connected mesh network of webrtc connections. It’s not a problem if some of the connections fail, as long as there is a path from every client to every other client (the graph is connected). Above a threshold of about 30 clients, the y-webrtc provider intentionally creates a partially connected mesh network. There is a good chance that all data will still sync between all the clients. But there are no guarantees, so I highly recommend other communication protocols when a large number of clients is expected.
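The connectivity condition in the last point can be checked with a plain graph search. The sketch below is a toy model and not part of y-webrtc: peers are string ids, open connections are undirected pairs, and `meshIsConnected` is a name made up for this example.

```typescript
// Toy model of a WebRTC mesh: peers are nodes, open connections are undirected edges.
type PeerId = string;
type Link = [PeerId, PeerId];

// Returns true if every peer can reach every other peer over some chain of links.
function meshIsConnected(peers: PeerId[], links: Link[]): boolean {
  const neighbours = new Map<PeerId, PeerId[]>();
  peers.forEach((p) => neighbours.set(p, []));
  for (const [a, b] of links) {
    const na = neighbours.get(a);
    const nb = neighbours.get(b);
    // Ignore links that mention unknown peers.
    if (na && nb) {
      na.push(b);
      nb.push(a);
    }
  }

  const start = peers[0];
  if (start === undefined) return true; // an empty mesh is trivially connected

  // Breadth-first search from an arbitrary peer.
  const reached = new Set<PeerId>([start]);
  const queue: PeerId[] = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    for (const next of neighbours.get(current) ?? []) {
      if (!reached.has(next)) {
        reached.add(next);
        queue.push(next);
      }
    }
  }
  return reached.size === neighbours.size;
}

// A three-peer mesh with one failed connection still links everyone (via "b").
const degradedMesh = meshIsConnected(["a", "b", "c"], [["a", "b"], ["b", "c"]]);
// A peer with no surviving connections is cut off from the others.
const splitMesh = meshIsConnected(["a", "b", "c"], [["a", "b"]]);
```

In a partially connected mesh, updates have to be relayed along such paths, which is why a disconnected graph means some clients stop receiving changes.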

Additional resources:

https://github.com/yjs/yjs-demos has code examples for Yjs + ProseMirror / Prosemirror+versioning and for a third-party editor that is built on ProseMirror (Atlaskit).
https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type
ProseMirror + CRDT's? Initial thread on CRDTs in ProseMirror that roughly describes how the y-prosemirror binding works.
https://demos.yjs.dev/prosemirror/prosemirror.html is a simple ProseMirror demo using y-websocket, without versions and offline editing. Use it to inspect network traffic.

marijn · January 27, 2020, 8:58pm

Awesome work. I’m going to read some of the background material later. Very exciting to see a CRDT approach that might actually be efficient!

dmonad · January 27, 2020, 9:01pm

Thanks @marijn

saranrapjs · January 28, 2020, 2:27pm

This is so cool @dmonad ! Congrats on building this

I haven’t dug into the y-prosemirror code in depth yet, but, is there any way with Yjs fragments to model changes that happen to a fragment as steps rather than replacing the whole document? It seems like in the current implementation of the sync plugin, every synced change ends up replacing the whole document (taking care to maintain the local selection state, and managing the undo state with a Yjs specific plugin). Basically what I’m curious about is whether you could replicate ProseMirror data using Yjs, but still write ProseMirror plugins without needing to “know” that Yjs was handling remote state updates (e.g. where you could, for example, rely on transaction steps + step maps to figure out what ranges have changed during a given remote sync).

marijn · January 28, 2020, 2:41pm

Oh, no, that does seem like a sure way to break almost every ProseMirror plugin ever written.

dmonad · January 28, 2020, 3:55pm

Yes, there is. I gave an outline here: ProseMirror + CRDT's? - #8 by dmonad . The idea is to compute the steps based on the diff of the new and the old state of the document.
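To make the diff idea concrete, here is a deliberately low-resolution sketch on plain strings: it trims the common prefix and suffix and reports everything in between as one replacement. It is an illustration only and not the y-prosemirror implementation; `ReplaceOp`, `diffToReplace`, and `applyReplace` are hypothetical names. ProseMirror documents are trees rather than strings, so a real version would have to work on document fragments.

```typescript
// One replace operation: delete `remove` characters at `index`, then insert `insert` there.
interface ReplaceOp {
  index: number;
  remove: number;
  insert: string;
}

// Computes a single replacement that turns `before` into `after`.
function diffToReplace(before: string, after: string): ReplaceOp {
  const shorter = Math.min(before.length, after.length);

  // Length of the common prefix.
  let prefix = 0;
  while (prefix < shorter && before[prefix] === after[prefix]) prefix++;

  // Length of the common suffix, not overlapping the prefix.
  let suffix = 0;
  while (
    suffix < shorter - prefix &&
    before[before.length - 1 - suffix] === after[after.length - 1 - suffix]
  ) {
    suffix++;
  }

  return {
    index: prefix,
    remove: before.length - prefix - suffix,
    insert: after.slice(prefix, after.length - suffix),
  };
}

// Applies the operation, mainly to check that the diff round-trips.
function applyReplace(text: string, op: ReplaceOp): string {
  return text.slice(0, op.index) + op.insert + text.slice(op.index + op.remove);
}

const braveOp = diffToReplace("hello world", "hello brave world");
// braveOp is { index: 6, remove: 0, insert: "brave " }
const roundTrip = applyReplace("hello world", braveOp);
```

Two edits far apart in the document collapse into one large replacement here, which is the trade-off between diff cost and diff resolution.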

In my experience, computing “minimal diffs” is a bit expensive and unnecessary for my use cases. I’m still ambivalent about this feature, and here is why:

y-prosemirror does not simply emit a new state object when the document changes. It preserves object identity, and so far, ProseMirror has been handling this very well. This also works well together with node-views - they won’t be rerendered. ProseMirror automatically figures out what needs to change. There is no performance loss.
It is necessary to replace ProseMirror’s position mapping with a Yjs-based position mapping. Yjs relative positions are markers on the Yjs data model, and they guarantee that every client will eventually end up with the same position mapping. In peer-to-peer scenarios, clients receive document updates in arbitrary order, and there are cases where clients would end up with different results if you used ProseMirror’s native position mapping. As you mentioned, I already use Yjs mappings to compute selections. In a different project I use Yjs mappings to represent comments. Relative positions are not as nice to use as ProseMirror mappings. Eventually, I would like to provide an API that works similarly well.
As far as I understand, transaction steps are primarily used for calculating position mappings. As I explained, Yjs’s relative positions are better suited in p2p scenarios.
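The difference between an index and a marker on the data model can be shown with a toy sequence in which every character carries a unique id. This is a simplified model and not the Yjs data structure: `SeqItem`, `relPosAt`, `resolveRelPos`, and `insertAfter` are invented for the example, and the two inserts deliberately target different places so that no tie-breaking between concurrent inserts is needed.

```typescript
// Toy model only: each character carries a unique id.
interface SeqItem {
  id: string;
  char: string;
}

// A "relative position" here is simply the id of the item the marker is attached to.
interface RelPos {
  itemId: string;
}

function relPosAt(seq: SeqItem[], index: number): RelPos {
  const item = seq[index];
  if (!item) throw new RangeError("index out of range");
  return { itemId: item.id };
}

// Resolves a marker to the current index, or -1 if the item no longer exists.
function resolveRelPos(seq: SeqItem[], pos: RelPos): number {
  return seq.findIndex((item) => item.id === pos.itemId);
}

// Inserts are addressed by the id of their left neighbour (null means "at the start").
// An unknown leftId also falls back to the start in this toy version.
function insertAfter(seq: SeqItem[], leftId: string | null, item: SeqItem): SeqItem[] {
  const at = leftId === null ? 0 : seq.findIndex((i) => i.id === leftId) + 1;
  return [...seq.slice(0, at), item, ...seq.slice(at)];
}

const baseSeq: SeqItem[] = [
  { id: "a1", char: "a" },
  { id: "a2", char: "b" },
  { id: "a3", char: "c" },
];
const marker = relPosAt(baseSeq, 2); // attached to "c"

const insX: SeqItem = { id: "b1", char: "X" };
const insY: SeqItem = { id: "c1", char: "Y" };

// Two replicas receive the same two remote inserts in different orders.
const replica1 = insertAfter(insertAfter(baseSeq, null, insX), "a1", insY);
const replica2 = insertAfter(insertAfter(baseSeq, "a1", insY), null, insX);

const text1 = replica1.map((i) => i.char).join(""); // "XaYbc"
const text2 = replica2.map((i) => i.char).join(""); // "XaYbc"
const resolved1 = resolveRelPos(replica1, marker); // 4
const resolved2 = resolveRelPos(replica2, marker); // 4
```

The stale index 2 now points at "Y" on both replicas, while the marker still resolves to "c" regardless of the order in which the inserts arrived.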

It is still on my radar to produce best-effort transaction steps. But again: From a p2p point of view, transactions (especially the order in which transactions are applied) are not as expressive as the Yjs document changes combined with relative positions. Still, I’m looking for ways to combine these two worlds in a way that makes sense.

I’m not denying that it will break some ProseMirror plugins (i.e. plugins that render using decorations and ProseMirror mappings). But I have tested y-prosemirror successfully in Atlaskit and TipTap. For the reasons mentioned above, I also needed to replace ProseMirror’s history plugin with a Yjs-based history plugin (y-undo-plugin).

saranrapjs · January 28, 2020, 7:10pm

But again: From a p2p point of view, transactions (especially the order in which transactions are applied) are not as expressive as the Yjs document changes combined with relative positions. Still, I’m looking for ways to combine these two worlds in a way that makes sense

This makes sense; my question was not because it’s wrong to route around ProseMirror transactions, just that it will complicate plugins that rely on transactions containing valid steps/step maps. I actually think it’s quite clever that this mostly just works, modulo the history plugin + more directly managing the selection!

The main thing that we use, by way of example, that wouldn’t work without some kind of transaction mapping is indeed decorations; the ability to map positions transaction-wise is what allows us to do efficient transaction-wise computations only when needed (e.g. only recalculating a data structure for the parts of the document that have changed). If the tradeoff here is how expensive it is to compute the diffs that happen as part of a Yjs update vs. how full resolution the diffs are, for my part, I’d be happy with marginally lower resolution diffs

marijn · January 28, 2020, 7:45pm

Comparing structure-sharing trees should actually be doable really efficiently (since you can skip all the shared nodes right away). (There was an implementation of this in a very early ProseMirror system that relied on it for its redrawing algorithm, and it wasn’t very complicated.)
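A minimal sketch of that idea, assuming immutable nodes that are reused by reference when unchanged. `DocNode` and `changedPaths` are hypothetical names and this is not ProseMirror’s node type; the point is only that an identity check lets the comparison skip whole shared subtrees.

```typescript
interface DocNode {
  readonly kind: string;
  readonly text?: string;
  readonly children: readonly DocNode[];
}

// Collects the paths (child indices from the root) of the smallest subtrees that differ.
function changedPaths(
  oldNode: DocNode,
  newNode: DocNode,
  path: number[] = [],
  out: number[][] = []
): number[][] {
  // Shared subtree: same object, so nothing below it can have changed.
  if (oldNode === newNode) return out;

  const sameShape =
    oldNode.kind === newNode.kind &&
    oldNode.text === newNode.text &&
    oldNode.children.length === newNode.children.length;
  if (!sameShape) {
    out.push(path);
    return out;
  }

  oldNode.children.forEach((child, i) => {
    const other = newNode.children[i];
    if (other !== undefined) changedPaths(child, other, [...path, i], out);
  });
  return out;
}

const para1: DocNode = { kind: "paragraph", children: [{ kind: "text", text: "unchanged", children: [] }] };
const para2: DocNode = { kind: "paragraph", children: [{ kind: "text", text: "old", children: [] }] };
const oldDoc: DocNode = { kind: "doc", children: [para1, para2] };

// An update rebuilds only the edited paragraph and reuses para1 by reference.
const para2Edited: DocNode = { kind: "paragraph", children: [{ kind: "text", text: "new", children: [] }] };
const newDoc: DocNode = { kind: "doc", children: [para1, para2Edited] };

const changed = changedPaths(oldDoc, newDoc); // [[1, 0]]
```

Nothing under `para1` is visited at all, so the cost depends on the size of the changed region rather than on the size of the document.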

dmonad · January 28, 2020, 8:06pm

Thanks @saranrapjs for sharing that use-case. That is a good reason to preserve transaction steps.

I have been looking at some of the projects that compute diffs between states and they didn’t seem suitable. You are right that this should be easily doable by leveraging object identity. I will look into this tomorrow.

disarticulate · January 29, 2020, 4:15am

I’ve got a greenfield project, and it’s pretty seamless how this and tiptap work together. I even wired up dexiejs with indexeddb and observable and can sync multiple browser windows. It might be worth looking at it with the other communication protocols.

keep up the good work

holtwick · January 29, 2020, 7:21am

I looked into the Yjs implementation and first of all it is awesome! @dmonad got the CRDT technology working stable and also optimized it to avoid a big data footprint. Congratulations.

But what I think is the crucial feature is that it works serverless. No central instance you need to trust and end-to-end encryption is also doable.

Because of that I believe the technology is worth taking a closer look. Even though the existing sync mechanism works great as well, from my understanding it is more difficult to have the single steps getting applied in the right order and the whole history needs to be remembered in case an older version of the document has been used as a starting point for edits. Please correct me, if I’m wrong.

That said I agree with @marijn that it is worth using the Prosemirror State as the basis for synchronization, since this is the heart of the philosophy behind the project and why it is the best solution for rich text editing available.

This is just my personal opinion I wanted to share. Anyway, all I see here is exceptionally great work on all sides, and I would love to see development continue.

jhnsnc · January 29, 2020, 2:15pm

Awesome stuff, @dmonad! It’s great to see these 2 open source frameworks coming together.

I can’t wait to dig into the details

dmonad · January 29, 2020, 6:22pm

Thanks for your kind words @disarticulate @holtwick and @jhnsnc

I do use the ProseMirror state. But for me, the question was whether it makes sense to use ProseMirror transforms to represent document changes. Currently, I simply replace the document state, which is easier for me to do. I didn’t see any immediate benefit in transforms, because they are mainly used to calculate change maps and to provide undo-redo functionality. y-prosemirror has equivalents to change maps and undo functionality that work better in p2p scenarios. But @saranrapjs brought up a good point in favor of ProseMirror transforms.

Today I started to adapt the code to use ProseMirror transforms instead. So don’t worry, you will get your transforms. I also think it makes sense to support existing plugins.

bhl · January 29, 2020, 9:29pm

I’m also using TipTap; did you create an extension with the plugin field with ySyncPlugin(type), yCursorPlugin(), yUndoPlugin()?

disarticulate · January 29, 2020, 10:48pm

So far, I’ve only integrated the ySyncPlugin. With tiptap, I have to first:

create the editor = new Editor from tiptap
vue.$nextTick(() => editor.registerPlugin(ySyncPlugin(type)))

There’s some other setup required. For example, when reloading, as indicated by @dmonad, I call editor.clearContent() to remove the existing content, then follow the sync examples in the yjs documentation.

My general goal is to have different editor views, like print mode, and other integrated “living” document types.

holtwick · January 30, 2020, 5:46am

Thanks for clarifying and working on the plugin @dmonad.

@bhl I wrote a simple extension, which currently is just tracking updates. Maybe it is a starting point for your TipTap extension:

import { Extension } from 'tiptap'
import { redo, undo, ySyncPlugin, yUndoPlugin } from 'y-prosemirror'
import { keymap } from 'prosemirror-keymap'
import * as Y from 'yjs'

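// Shared Yjs document; every editor bound to it stays in sync.
const ydoc = new Y.Doc()

// Log each incremental update (a binary diff) together with its origin.
ydoc.on('update', (updateMessage: Uint8Array, origin: any, doc: Y.Doc) => {
    console.log('update', updateMessage, origin)
})

// To sync over the network, import WebsocketProvider from 'y-websocket' and uncomment:
// const provider = new WebsocketProvider('wss://demos.yjs.dev', 'prosemirror', ydoc)

// The XML fragment that holds the ProseMirror document.
const type = ydoc.getXmlFragment('prosemirror')

export default class RealtimeExtension extends Extension {

    get plugins() {
        return [
            ySyncPlugin(type),
            // Needs a provider and the yCursorPlugin import from 'y-prosemirror':
            // yCursorPlugin(provider.awareness),
            yUndoPlugin(),
            // Route undo/redo to the Yjs-aware commands.
            keymap({
                'Mod-z': undo,
                'Mod-y': redo,
                'Mod-Shift-z': redo
            })
        ]
    }

}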

nevf · February 3, 2020, 7:35am

@dmonad I am working on a new app which uses Prosemirror and has full offline support. The missing piece was providing offline support for Prosemirror. So this is a fantastic and most welcome addition.

The use of Web RTC and peer-to-peer is impressive, however if I understand correctly this relies on at least one PC up and running and accessible over the Internet at all times, in order for other devices to come and go and all instances keep in sync.

I can envisage this becoming a problem in the real world. Instead I’d like to (optionally) see the ability to have a central server which is always up to date as these clients come and go. Ideally all data on this server would be encrypted by a key only the end-users know, therefore maintaining data privacy.

Thoughts?

PS. I have looked at Yjs ages ago, clearly it’s time to revisit. Keep up the great work.

bhl · February 3, 2020, 7:44pm

@nevf Isn’t this solved by websockets? There’s a yjs websocket client and server library. And as mentioned above, there’s a prosemirror demo of those libraries:

Yjs Prosemirror Example is a simple ProseMirror demo using y-websocket, without versions and offline editing. Use it to inspect network traffic.

nevf · February 3, 2020, 9:19pm

@bhl Thanks for that. I did see a mention of a websocket client, but missed that there was a server. I also was under the impression there was only the webrtc implementation - my mistake.

I’ve just tried the websocket demo, and it appears that offline edits aren’t saved in the browser (indexeddb). So if you close a tab with the websocket demo open while you are offline, then a) when you re-open the tab you don’t see any content, and b) when you go back online, any edits you made offline before closing the tab are lost.

Of course this may well be resolved using the y-indexeddb provider. Any idea?

bhl · February 3, 2020, 9:42pm

Yeah, I think based off http://y-js.org/, you also need a database adapter for persistence while offline. Indexeddb is one way to do that.