Best approach to save documents from Prosemirror?

plutonik · April 18, 2024, 10:29am

Hi all,

We have built an XML editor using ProseMirror, and we are now trying to round-trip a document: load XML → edit → save it back as XML.

One question that came up is how to get the edited XML back out. We found two approaches in previous discussions:

XML → build the schema → load the XML → edit → state.toJSON() → serialize → XML
XML → build the schema → load the XML → edit → DOMSerializer → serialize → XML

Which of the two is recommended, state.toJSON() or DOMSerializer? Has anyone gone through this before, or is there a better approach that we haven’t considered yet?

You can find our project at https://github.com/evolvedbinary/prosemirror-jdita.

Thank you!

marijn · April 18, 2024, 11:25am

It sounds like a custom XML serializer would be easier to build on top of the JSON (or even the direct Node/Fragment) representation than the DOM serializer’s output.
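As a rough illustration of that suggestion, here is a minimal sketch of an XML serializer that walks the plain object returned by a ProseMirror node's toJSON(). It assumes, purely for illustration, that every node and mark type name in the schema is also the XML element name to emit; the names PMNode, PMMark, escapeXml, attrsToXml and toXml are hypothetical and not part of ProseMirror.

```typescript
// Shape of the plain objects produced by a ProseMirror node's toJSON().
interface PMMark {
  type: string;
  attrs?: Record<string, unknown>;
}
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  marks?: PMMark[];
  text?: string;
}

// Escape the characters that are unsafe in XML text and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render attrs as XML attributes, skipping null/undefined values.
function attrsToXml(attrs: Record<string, unknown> = {}): string {
  return Object.keys(attrs)
    .filter((key) => attrs[key] !== null && attrs[key] !== undefined)
    .map((key) => ` ${key}="${escapeXml(String(attrs[key]))}"`)
    .join("");
}

// ASSUMPTION: node/mark type names map one-to-one onto element names.
function toXml(node: PMNode): string {
  if (node.type === "text") {
    // Wrap the text in one element per mark, first mark outermost.
    return (node.marks ?? []).reduceRight(
      (inner, mark) =>
        `<${mark.type}${attrsToXml(mark.attrs)}>${inner}</${mark.type}>`,
      escapeXml(node.text ?? "")
    );
  }
  const children = (node.content ?? []).map(toXml).join("");
  const open = `<${node.type}${attrsToXml(node.attrs)}`;
  return children === "" ? `${open}/>` : `${open}>${children}</${node.type}>`;
}
```

With an editor view in hand, this would be called as toXml(view.state.doc.toJSON()). Because each text node is wrapped on its own, adjacent text nodes that share a mark come out as separate sibling elements rather than one merged element.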

adamretter · April 18, 2024, 11:31am

I am working with @plutonik, so let me try to clarify some points in her question…

We haven’t really built an XML editor with ProseMirror… The input to our application is XML (XDITA), which we transform into JSON (JDITA). ProseMirror has no awareness of our XML. We provide ProseMirror with two things:

a JSON document that is structured to our own specification called JDITA.
a ProseMirror Schema that tells ProseMirror which of our JSON (i.e. JDITA) entities are Blocks, Groups, or Marks.

From the JSON (JDITA) document and the ProseMirror Schema, ProseMirror is able to correctly render and edit our document.

As the ProseMirror schema seems to give ProseMirror a mapping from our JSON document to ProseMirror’s data model for the purpose of rendering and editing, we are wondering if ProseMirror is able to use the same schema in the opposite direction, i.e. after editing, we want to get back a JSON (JDITA) document from ProseMirror. Is this possible?

marijn · April 18, 2024, 12:15pm

I’m not sure how a ProseMirror schema is converting your JSON data to ProseMirror’s document objects. If you want to serialize to a custom format, you’re going to have to write a serializer yourself.

adamretter · April 18, 2024, 2:47pm

Hi @marijn when you say “I’m not sure how a ProseMirror schema is converting your JSON data to ProseMirror’s document objects”, either I don’t understand what you mean, or it makes me nervous.

We are using ProseMirror’s own API to do the conversion from JSON (JDITA) and a ProseMirror Schema to (presumably) ProseMirror’s document objects. A sample of our code:

import { Node } from "prosemirror-model";
import { EditorState } from "prosemirror-state";
import { EditorView } from "prosemirror-view";
import { history } from "prosemirror-history";

// schemaObject, jsonDoc, shortcuts and menu are defined elsewhere in our project
const domEl = document.querySelector("#editor") as HTMLElement;

const doc = Node.fromJSON(schemaObject, jsonDoc);

const state = EditorState.create({
  doc,
  plugins: [
    // history plugin comes from prosemirror-history
    history(),
    shortcuts(schemaObject),
    menu(schemaObject, {
      end: [[]],
      start: [[]],
    }),
  ]
})

// create a new EditorView with the DOM element and the state
new EditorView(domEl, {
  state,
});

Does that make sense?

massi · April 18, 2024, 3:04pm

I had to solve a similar problem.

I found the back-translation of Marks to be the most difficult part, because of their flat structure.

I’ve seen you have a few Marks.

Here’s an example of the problem. Say you load this content in your editor:

<p><i>Emphasized and <b>strongly emphasized</b> content</i>.</p>

Then you want to serialize it back from Prosemirror to a file.

With the simplest approach, you would get something like this:

<p><i>Emphasized and </i><i><b>strongly emphasized</b></i><i> content</i>.</p>

or

<p><i>Emphasized and </i><b><i>strongly emphasized</i></b><i> content</i>.</p>

Getting back the initial structure of inline tags is possible, but rather complex.
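To give an idea of what that involves, here is one possible sketch, not a complete solution. It takes the flat list of text nodes inside a paragraph and, at each position, opens the mark that covers the longest run of following nodes, then recurses into that run. It assumes marks are identified by name only (no attributes) and that mark names double as element names; InlineText, runLength and serializeInline are hypothetical names.

```typescript
// A flat inline text node: its text plus the names of the marks applied to it.
interface InlineText {
  text: string;
  marks: string[];
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Number of consecutive nodes, starting at `start`, that carry `mark`.
function runLength(nodes: InlineText[], start: number, mark: string): number {
  let end = start;
  while (end < nodes.length && nodes[end].marks.indexOf(mark) !== -1) end++;
  return end - start;
}

// `open` lists the marks whose elements are already open around `nodes`.
function serializeInline(nodes: InlineText[], open: string[] = []): string {
  let out = "";
  let i = 0;
  while (i < nodes.length) {
    const pending = nodes[i].marks.filter((m) => open.indexOf(m) === -1);
    if (pending.length === 0) {
      // Every mark on this node is already open: emit the text itself.
      out += escapeXml(nodes[i].text);
      i += 1;
      continue;
    }
    // Greedy choice: open the pending mark that spans the longest run.
    let best = pending[0];
    let bestLen = runLength(nodes, i, best);
    for (const mark of pending.slice(1)) {
      const len = runLength(nodes, i, mark);
      if (len > bestLen) {
        best = mark;
        bestLen = len;
      }
    }
    const inner = serializeInline(nodes.slice(i, i + bestLen), [...open, best]);
    out += `<${best}>${inner}</${best}>`;
    i += bestLen;
  }
  return out;
}

// The paragraph content from the example, as flat text nodes.
const flat: InlineText[] = [
  { text: "Emphasized and ", marks: ["i"] },
  { text: "strongly emphasized", marks: ["b", "i"] },
  { text: " content", marks: ["i"] },
  { text: ".", marks: [] },
];
const nested = serializeInline(flat);
// nested === "<i>Emphasized and <b>strongly emphasized</b> content</i>."
```

The greedy choice recovers the original nesting for this example regardless of the order in which the marks are stored on the middle node. When two marks cover exactly the same run, the original nesting order is not recoverable from the flat marks alone, so the sketch simply picks the first one.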

Even more complex is representing something like this: <i>Emphasis and <i>nested</i> emphasis</i>, first in Prosemirror, and then back with a serializer.

That’s because a Mark can be applied or not applied to a text node, but it can’t be applied twice.

You could differentiate them with an attribute (e.g. an integer depth) and set excludes: "" in the MarkSpec, so that two emphasis Marks with different attribute values can coexist on the same text even though they share the same type name.

But the problem gets even more complex.

marijn · April 18, 2024, 3:22pm

What you’re doing here is loading a regular ProseMirror JSON document. I’m not sure what JDITA is in this.

adamretter · April 18, 2024, 3:49pm

@marijn I wonder if I have missed something, or I am confusing you… As I understand it ProseMirror can operate on ANY JSON document, as long as you give it a ProseMirror Schema describing the JSON format. If that is the case, then I would explain that JDITA is our expression of a document language (LwDITA - Lightweight DITA) into JSON. Does that make more sense?

marijn · April 18, 2024, 4:09pm

No, that is not at all how it works. ProseMirror defines one specific JSON format that it uses to serialize/deserialize its documents.

adamretter · April 19, 2024, 3:08pm

@marijn thanks for the clarification and your patience. I was indeed mistaken!

I have now discovered that we have a transformation function (prosemirror-jdita/prosemirror-jdita/src/document.ts at feature/import-and-save-file, in evolvedbinary/prosemirror-jdita on GitHub) that takes our JDITA JSON document and transforms it into a ProseMirror JSON document before we call Node.fromJSON(schemaObject, jsonDoc).

So our workflow currently looks like:

XML (XDITA) → JSON (JDITA) → JSON (ProseMirror)

We then edit the document in ProseMirror.

So I think then to save our document from ProseMirror we need to:

Read state.doc from ProseMirror’s editor state, which gives us the updated ProseMirror document.
Write our own serializer function that converts the ProseMirror document back to our JSON (JDITA) format.
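A minimal sketch of what step 2 might look like, working on the plain object returned by state.doc.toJSON(). The target shape used here (nodeName, attributes, children, and a text leaf with content) is a hypothetical stand-in and may not match the real JDITA specification; TargetNode and fromProseMirror are likewise illustrative names. Marks are turned back into wrapper elements around each text node.

```typescript
// Shape of the plain objects produced by a ProseMirror node's toJSON().
interface PMMark {
  type: string;
  attrs?: Record<string, unknown>;
}
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  marks?: PMMark[];
  text?: string;
}

// HYPOTHETICAL target format, used only for illustration.
interface TargetNode {
  nodeName: string;
  attributes?: Record<string, unknown>;
  children?: TargetNode[];
  content?: string;
}

function fromProseMirror(node: PMNode): TargetNode {
  if (node.type === "text") {
    const leaf: TargetNode = { nodeName: "text", content: node.text ?? "" };
    // Each mark becomes a wrapper element; the first mark ends up outermost.
    return (node.marks ?? []).reduceRight<TargetNode>(
      (child, mark) => ({
        nodeName: mark.type,
        attributes: mark.attrs ?? {},
        children: [child],
      }),
      leaf
    );
  }
  return {
    nodeName: node.type,
    attributes: node.attrs ?? {},
    children: (node.content ?? []).map(fromProseMirror),
  };
}
```

Usage would be along the lines of fromProseMirror(view.state.doc.toJSON()). If node names were renamed on the way into ProseMirror, the reverse renaming would have to be applied here as well.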

Do I understand that correctly now?

marijn · April 19, 2024, 3:22pm

Yes, that should work.

adamretter · April 23, 2024, 7:09am

Thanks very much for your time @marijn and @massi