Current state of the art on syncing data to the backend

Hello everyone!

I’ve been in the ProseMirror ecosystem for about 3 months now. I’m just starting to get the hang of the library, but one problem remains for me.

I am still torn on how to sync my data to the backend. I have nothing fancy on the FE, mostly a regular PM editor, but the trick is that documents can be very big: 10 MB JSONs. (P.S. ProseMirror handles those pretty well, actually impressive.)

So sending the entire JSON to the backend is not good; I’d rather just send diffs. Sending Steps occurred to me, but I have no way of applying them on my backend, which is written in .NET Core and keeps documents in MongoDB. Not much wiggle room there: all I have at my disposal is a REST architecture with MongoDB sitting behind the backend.

What are my options here?

I am currently calculating diffs via the changedDescendants method from prosemirror-tables and collecting the nodes that were updated. I debounce the save function, and after 1 second of no input, I calculate the diff between the old and target doc:

    dispatchTransaction: (tr) => {
      const oldState = editorView.state;
      const newState = editorView.state.apply(tr);
      editorView.updateState(newState);
      if (tr.docChanged) {
        if (!docReference.current) {
          // this is our reference state
          docReference.current = oldState.doc;
        }
        // after a while, this debounced function will fire and compare
        // newState.doc against docReference.current
        debouncedFunction(newState.doc);
        onChange?.(editorView.state.doc.toJSON());
      }
    },

So I am sending these diffs to the backend, with node IDs, and then updating part of the tree. But this gets complicated really fast, and it’s not a good long-term solution. I feel like I’m working against the ProseMirror model somehow.

Any help is appreciated.

Thanks

Many setups do run JavaScript (or some port of prosemirror-transform to another language—but I don’t think there’s one for .NET) and communicate steps between the client and server using those.
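
For reference, steps round-trip through JSON, so any server that can run prosemirror-transform can replay them. A minimal sketch (tr, schema, and serverDoc are placeholders for your transaction, your schema, and the server’s copy of the document):

    import { Step } from 'prosemirror-transform';

    // Client side: serialize the steps produced by a transaction
    const stepsJson = tr.steps.map((step) => step.toJSON());

    // Server side (running JS): re-instantiate and apply them
    let doc = serverDoc;
    for (const json of stepsJson) {
      const result = Step.fromJSON(schema, json).apply(doc);
      if (result.failed || !result.doc) throw new Error(result.failed ?? 'step failed');
      doc = result.doc;
    }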

But I haven’t really heard of people working with 10 MB documents. Is there something like images-as-data-URLs in your documents that makes them so big? In most setups, sending the documents back and forth is unproblematic, because they generally are not that big.

I guess you could try something like JSON diffing to only send a small patch over, if you really need to.
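
For instance, a library like fast-json-patch (just one option) can compute an RFC 6902 patch between two JSON snapshots of the document:

    import { compare } from 'fast-json-patch';

    const oldJson = { type: 'doc', content: [{ type: 'paragraph', content: [{ type: 'text', text: 'Hello' }] }] };
    const newJson = { type: 'doc', content: [{ type: 'paragraph', content: [{ type: 'text', text: 'Hello world' }] }] };

    // -> [{ op: 'replace', path: '/content/0/content/0/text', value: 'Hello world' }]
    console.log(compare(oldJson, newJson));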

Hi marijn!

Nope, they are just large patent description documents. I’ve specifically mandated that we keep images as URLs for the same reason you mentioned.

JSON diff is something I am trying today, so thanks for the tip!

Update: For anyone looking for a solution to this, we have successfully implemented a POC consisting of:

Frontend:

  1. We initialize a “reference state”. While the user is doing stuff (typing, editing, etc.), we debounce a callback.
  2. Once the actions stop, the callback is called with the current state of the editor.
  3. We then compare the reference state with the last captured state, calculate a JSON diff via this library, and generate a JSON Patch array.
  4. We clear the reference state so the next batch of edits can be calculated properly.

The code implementation roughly looks like this:

...
    dispatchTransaction: (tr) => {
      const oldState = editorView.state;
      const newState = editorView.state.apply(tr);
      editorView.updateState(newState);
      if (!docReference.current) {
        docReference.current = oldState.doc;
      }
      if (tr.docChanged) {
        debouncedFunction(newState.doc);
      }
    },
...
    const debouncedFunction = debounce(async (newDoc: Node) => {
      const oldDoc = docReference.current;
      if (!oldDoc) {
        return;
      }
      const newJson = newDoc.toJSON();
      const oldJson = oldDoc.toJSON();
      docReference.current = null;

      const patches = await calculatePatches(oldJson, newJson);
      try {
        if (patches) {
          await patch({ patches });
        }
      } catch (error) {
        // Handle errors
      }
    }, 1000);

Note: For calculating patches, we are using a Web Worker, so no matter the size of the document, the diffing is not heavy on the main thread. We tested the aforementioned library on a 600-page document, and the diffs are very fast; the library is presumably optimized for diffing big JSONs.

Backend: Since we are sending JSON Patches, which most backends support, we just apply the patches to the stored JSON and save it. For our particular setup, we translate the patches into MongoDB update operations.
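
To give an idea of that translation, here is a simplified sketch (written in TypeScript for illustration, even though our backend is C#; the toMongoUpdate helper is hypothetical, and array inserts/removals need more care than shown):

    // Hypothetical helper: turn RFC 6902 operations into a MongoDB update document.
    // JSON Pointer escaping (~0/~1) and array add/remove semantics are omitted.
    type PatchOp = { op: 'add' | 'remove' | 'replace'; path: string; value?: unknown };

    function toMongoUpdate(patches: PatchOp[]) {
      const $set: Record<string, unknown> = {};
      const $unset: Record<string, unknown> = {};
      for (const p of patches) {
        // JSON Pointer "/content/3/content/0/text" -> dotted path "content.3.content.0.text"
        const field = p.path.slice(1).split('/').join('.');
        if (p.op === 'remove') $unset[field] = '';
        else $set[field] = p.value;
      }
      const update: Record<string, object> = {};
      if (Object.keys($set).length) update.$set = $set;
      if (Object.keys($unset).length) update.$unset = $unset;
      return update;
    }

    // await collection.updateOne({ _id: docId }, toMongoUpdate(patches));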

For an extra layer of safety, we are thinking of sending a hash of the state to the backend, so the backend can compare it after applying the patches. If the hashes don’t match, a reconciliation step should occur; we are still tinkering with this.
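
The hashing itself could be as simple as this sketch using the Web Crypto API (with the caveat that JSON.stringify is not canonical, so client and server must produce byte-identical serializations, key order included):

    // Sketch: SHA-256 of the serialized doc, hex-encoded.
    async function hashDoc(docJson: unknown): Promise<string> {
      const bytes = new TextEncoder().encode(JSON.stringify(docJson));
      const digest = await crypto.subtle.digest('SHA-256', bytes);
      return Array.from(new Uint8Array(digest))
        .map((b) => b.toString(16).padStart(2, '0'))
        .join('');
    }

    // Sent alongside the patches, e.g. { patches, hash: await hashDoc(newJson) }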

Thanks for this awesome library, hopefully somebody will find this thread useful!

How do you pass the editor state to that worker? As far as I’m aware, that involves serializing and deserializing the whole thing.

    const newJson = newDoc.toJSON();
    const oldJson = oldDoc.toJSON();
    docReference.current = null;

    const patches = await calculatePatches(oldJson, newJson);

You can see here that I convert the state to JSON and pass both JSONs to the method. The calculatePatches method does all the heavy lifting: it sends the two JSONs to the worker and gets the result back.
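
Roughly like this (a simplified sketch; the file name and message shape are illustrative, and I’m assuming a diff library like fast-json-patch inside the worker):

    // diff.worker.ts: the diff runs off the main thread
    import { compare } from 'fast-json-patch';

    self.onmessage = (e: MessageEvent<{ oldJson: object; newJson: object }>) => {
      // postMessage structured-clones the patch array back to the main thread
      self.postMessage(compare(e.data.oldJson, e.data.newJson));
    };

    // main.ts: wrap the worker round-trip in a Promise
    import type { Operation } from 'fast-json-patch';

    const worker = new Worker(new URL('./diff.worker.ts', import.meta.url), { type: 'module' });

    function calculatePatches(oldJson: object, newJson: object): Promise<Operation[]> {
      return new Promise((resolve) => {
        // Reassigning onmessage is fine here: the debounce keeps one request in flight at a time
        worker.onmessage = (e: MessageEvent<Operation[]>) => resolve(e.data);
        worker.postMessage({ oldJson, newJson });
      });
    }

So yes, the two JSON trees are copied into the worker via structured clone on postMessage; that cost is included in the timing in the edit below.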

Edit: I just did a test on these few lines of code, including waiting for the worker to respond, the toJSON conversion, etc.:

    const startTime = performance.now();
    const newJson = newDoc.toJSON();
    const oldJson = oldDoc.toJSON();
    docReference.current = null;

    const patches = await calculatePatches(oldJson, newJson);
    const endTime = performance.now();
    console.log(`Call to calculate took ${endTime - startTime} milliseconds`)

Call to calculate took 171.2999999988824 milliseconds

I did a test on a JSON that is 4.17 MB big.
