How we went about prosemirror-collab at the New York Times

saranrapjs · August 1, 2019, 10:47pm

My colleague Sophia and I just wrote up some of our team’s experience building a collab ProseMirror editor that we now use in article production at The New York Times:

…we’d be happy to talk here about any of the more technical bits from this for a ProseMirror enthusiast audience

marijn · August 2, 2019, 5:39am

Nice! Thanks for the shout-out.

mikeb · August 2, 2019, 4:41pm

I’ve built most of the realtime collab layer for our PM implementation. I’d certainly be interested in learning more about whether you did any work to sync documents before saves, or if you used rollups/checkpoints at all. We do, because our documents are very-long-lived, and being absolutely certain that stepwise edits and rolled-up documents are compatible has proven non-trivial.

saranrapjs · August 2, 2019, 10:40pm

being absolutely certain that stepwise edits and rolled-up documents are compatible has proven non-trivial

We ran into this exact set of problems in a previous, pre-collab implementation of step storage for showing a version history; it was a nightmare! When we began building collaborative editing, we chose to treat steps as the source of truth for the state of the document, and the history of “rolled-up documents” as a materialized view of steps. While we no longer run into synchronization issues between steps and point-in-time documents, it’s been helpful to design the point-in-time documents with the assumption that we could blow them away and re-index them at some point if need be.

As far as how we solved the sync issue itself: any database with transaction support should guarantee that updates made to the “rolled-up documents” are current and consistent with any step insertions that may have happened while an update to a document is in progress. We also chose to use the most recent rolled-up document (+ any not-yet “harvested” steps) as the starting point when loading a collaborative editor, which further solidified this philosophy: the few errors we had early on with out-of-sync documents were quickly ironed out because they otherwise blocked loading the collaborative editor altogether.
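To make the "steps are the source of truth, rolled-up documents are a materialized view" idea concrete, here is a minimal sketch. It is a toy model, not our production code: the document is a plain string, and `Step`, `Checkpoint`, `harvest`, `loadDocument` and `reindex` are hypothetical names standing in for real ProseMirror steps and a real database.

```typescript
// Toy model: a "document" is a string and a "step" inserts or deletes text.
// Real ProseMirror steps are richer; these types are hypothetical stand-ins.
type Step =
  | { kind: "insert"; pos: number; text: string }
  | { kind: "delete"; from: number; to: number };

interface Checkpoint {
  version: number; // number of steps already folded into `doc`
  doc: string;
}

function applyStep(doc: string, step: Step): string {
  if (step.kind === "insert") {
    return doc.slice(0, step.pos) + step.text + doc.slice(step.pos);
  }
  return doc.slice(0, step.from) + doc.slice(step.to);
}

// Steps are the source of truth; a checkpoint is only a cache of
// "all steps up to `version`, applied in order".
function harvest(checkpoint: Checkpoint, log: readonly Step[]): Checkpoint {
  const pending = log.slice(checkpoint.version);
  const doc = pending.reduce(applyStep, checkpoint.doc);
  return { version: log.length, doc };
}

// Loading an editor: latest checkpoint plus any not-yet-harvested steps.
function loadDocument(checkpoint: Checkpoint, log: readonly Step[]): string {
  return harvest(checkpoint, log).doc;
}

// Re-indexing: throw the checkpoint away and replay the whole log.
function reindex(log: readonly Step[]): Checkpoint {
  return harvest({ version: 0, doc: "" }, log);
}
```

Because a checkpoint is derived purely from the log, a stale or deleted checkpoint costs replay time but never correctness, which is what makes "blow them away and re-index" a safe escape hatch.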

mikeb · August 6, 2019, 3:44pm

Two more questions:

How much editing goes into your PM docs? Do you add rich content directly? Or are you doing mostly simple text editing?
Do you do all “validity” checks at the client level? Or do you send steps to an application server running PM and do saves at that level?

saranrapjs · August 6, 2019, 11:44pm

How much editing goes into your PM docs? Do you add rich content directly? Or are you doing mostly simple text editing?

It’s very much a rich text affair — lots of leaf nodes w/ somewhat complex schema shapes.

Do you do all “validity” checks at the client level? Or do you send steps to an application server running PM and do saves at that level?

I’m not sure what you mean by validity, but Firestore is a client-facing database, so the steps are presumed to be valid by the time they are inserted into the database. They are effectively double-checked because there are server-side processes that consume steps and apply them to a shared/persisted document as well, but the insertion is determined by the client-side code.

mikeb · August 7, 2019, 2:02pm

I was asking about checkpoint/save validity. You mentioned that you’ve resolved your issues with steps vs persisted documents, and I was wondering if that was done at the server level or the client level. For various reasons we chose to do our saves from the client layer. It sounds like you’ve done things the other way, with your persistence done by a server layer.

I’m not sure it actually changes the picture that much, but as we’re still seeing occasional issues I figured I’d ask.

michaeldfallen · August 22, 2019, 4:33pm

Any chance you could talk about how you represent the other users’ cursors in the editor?

We (the FT) are currently building that “collaborative cursor” functionality and are finding that using a widget Decoration for the cursor and an inline Decoration for any text selection has caused a few confusing things to happen in the browser, like the browser cursor jumping around as decorations are moved around the document by ProseMirror.

saranrapjs · August 28, 2019, 1:42am

We (the FT) are currently building that “collaborative cursor” functionality and are finding that using a widget Decoration for the cursor and an inline Decoration for any text selection

We do precisely this; under the hood, these are backed by data that’s serialized/deserialized as real ProseMirror Selections. This allows us to take advantage of the fact that a Selection conforms to the Mappable interface, which makes it a little easier to keep it up to date (or to optimistically update it using local/not-yet-confirmed steps).
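As a rough illustration of what "mapping" a remote cursor means, here is a self-contained sketch that does not import ProseMirror. In ProseMirror itself you would call `selection.map(doc, mapping)`; the `Change`, `RemoteCursor`, `mapPos` and `mapCursor` names below are hypothetical, and the position logic is a simplified model of a single replaced range rather than the library's implementation.

```typescript
// One replaced range: `oldSize` positions at `start` become `newSize` positions.
interface Change {
  start: number;
  oldSize: number;
  newSize: number;
}

// Map a position through a single change. `assoc` decides which side a
// position sticks to when it falls inside changed content (-1 left, 1 right).
function mapPos(pos: number, c: Change, assoc: -1 | 1): number {
  const end = c.start + c.oldSize;
  if (pos < c.start) return pos; // before the change: unaffected
  if (pos > end) return pos + c.newSize - c.oldSize; // after: shifted
  // Touching or inside the changed range.
  const side =
    c.oldSize === 0 ? assoc : pos === c.start ? -1 : pos === end ? 1 : assoc;
  return c.start + (side < 0 ? 0 : c.newSize);
}

// A remote user's selection, stored as plain data.
interface RemoteCursor {
  user: string;
  anchor: number;
  head: number;
}

// Keep a remote cursor up to date by mapping both ends through every change,
// in order. Each change is expressed in the coordinates left by the previous one.
function mapCursor(cursor: RemoteCursor, changes: readonly Change[]): RemoteCursor {
  const mapThrough = (pos: number): number =>
    changes.reduce((p, c) => mapPos(p, c, 1), pos);
  return {
    ...cursor,
    anchor: mapThrough(cursor.anchor),
    head: mapThrough(cursor.head),
  };
}
```

The same function works for confirmed remote steps and for local, not-yet-confirmed ones, which is why an optimistic update is cheap: map now, and map again if the steps end up being rebased.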

We did find that there are some CSS gotchas when decorations for an empty cursor get placed in certain kinds of elements…I’m not sure that I have a list of these off-hand, but they resulted in “looks weird” rather than “totally incorrect” remote cursor decorations.

matej-svejda · August 29, 2019, 2:36pm

Really cool article! Could you maybe go into a bit more detail on how you store the steps in Firestore? I used Firestore as a PM collab backend for a project, but the performance with which the steps were pushed to clients was slower than I’d have liked. Also there are limitations imposed by Firestore, like the maximum write rate to a document of one write per second.

mskr · August 31, 2019, 2:06pm

If you are comfortable with a rather untechnical question and have any way of evaluating this yet, could you share how happy your users are compared to the non-realtime system from before?

saranrapjs · September 2, 2019, 12:30pm

the performance with which the steps were pushed to clients was slower than I’d have liked

My understanding is that writes to Firestore are slower than, say, the Firebase Realtime Database (their previous gen), because every write is persisted to multiple regions before ack’ing the write to the writer (i.e. there’s an emphasis on durability over latency). I’m not sure how this impacts how quickly that ack is distributed to connected clients, but my impression has been that it’s this initial, multi-region piece that might impact perceived throughput. Because of the varied network conditions that some of our users face, we decided that this throughput was fine for us; it’s worth keeping in mind that comparing the speed at which steps arrive between two tabs on the same computer is not a real-world scenario.

Also there are limitations imposed by Firestore, like the maximum write rate to a document of one write per second

We’ve never run into the maximum write throughput being an issue; the client-side code needs to be written to be retry/failure-tolerant anyway, so even if this were happening commonly (& again, I’ve never personally seen it in our logs), it’s not likely that users would really notice a difference.
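For anyone wondering what "retry-tolerant" can look like, here is one possible shape, sketched against an in-memory log rather than Firestore. `StepLog`, `tryAppend`, `since` and `pushWithRetry` are hypothetical names, not Firestore or prosemirror-collab APIs; the only assumption is that a write is rejected when the writer has not yet seen every earlier step.

```typescript
interface StepRecord {
  version: number;
  clientID: string;
  payload: string; // stand-in for a serialized step
}

// Hypothetical in-memory stand-in for the shared step store.
class StepLog {
  private records: StepRecord[] = [];

  get version(): number {
    return this.records.length;
  }

  // Accepts the batch only if the writer has seen every earlier step.
  tryAppend(expectedVersion: number, clientID: string, payloads: string[]): boolean {
    if (expectedVersion !== this.records.length) return false;
    for (const payload of payloads) {
      this.records.push({ version: this.records.length, clientID, payload });
    }
    return true;
  }

  since(version: number): StepRecord[] {
    return this.records.slice(version);
  }
}

// Try to push local steps; on rejection, catch up and try again.
// Returns the log version after the successful write.
function pushWithRetry(
  log: StepLog,
  clientID: string,
  localVersion: number,
  payloads: string[],
  maxAttempts = 5
): number {
  let version = localVersion;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (log.tryAppend(version, clientID, payloads)) return log.version;
    // Someone else wrote first. A real client would also rebase its
    // unconfirmed steps over `missed` before retrying.
    const missed = log.since(version);
    version += missed.length;
  }
  throw new Error("gave up after repeated conflicts");
}
```

With a loop like this in place, a rejected or throttled write just looks like one more conflict to the client, so the user sees a slightly later confirmation rather than an error.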

saranrapjs · September 2, 2019, 12:36pm

As far as I can tell: happy! Caveat that I work closely w/ folks whose job it is to check in and make sure we’re actually making people happy, so my impression is sort of impressionistic.

Collaborative editing totally eliminates a class of coordination problems that used to exist, where multiple people had to coordinate out-of-band who controlled writes to the document. There’s a separate challenge that comes from opening up access to the document in realtime (we forget that it can be kind of scary to know that someone can see you type in realtime, if you’ve never composed this way before). But because our editor tries to conform to the norms of other, popular collaborative editors, it means that it’s more familiar to anyone who takes that functionality as a given (rather than the kind of complex technical undertaking that it is!)

ppiety · September 22, 2020, 11:16am

Hi, I am considering using ProseMirror for an educational research project where we want to be able to have a group of four students work on a collaborative writing task and then collect data on who made what contributions and when. ProseMirror looks to have the capabilities we need, but also to have a non-trivial implementation process and learning curve and so I would appreciate any advice on whether it is easy enough for a small research project or if we should look for other options. I can be reached at ppiety@umd.edu if you care to share your insights. Gratefully, Phil Piety

astevenson · September 22, 2020, 2:27pm

If you have a moment to post it there, this would make a great top level thread. I think you’ll get a good variety of answers and they’ll be useful to other folks in your position in the future.