Best approach to save documents from ProseMirror?

Hi all,

We have built an XML editor using ProseMirror, and now we are trying to do a round trip: loading XML → editing → saving it back as XML.

A question that came up was how to return the edited XML. We found two approaches in previous discussions:

  1. XML → build the schema → load the XML → edit → state.toJSON() → serialize → XML
  2. XML → build the schema → load the XML → edit → DOMSerializer → serialize → XML

Which of the two is recommended, state.toJSON() or DOMSerializer? Has anyone gone through this before? Or is there a better approach that we haven’t considered yet?

You can check our project at https://github.com/evolvedbinary/prosemirror-jdita.

Thank you!

It sounds like a custom XML serializer would be easier to build on top of the JSON (or even the direct Node/Fragment) representation than the DOM serializer’s output.
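
As a rough sketch of that idea (not code from the project): the function below walks the plain JSON shape that Node.toJSON() produces and emits an XML string. It assumes, purely for illustration, that node and mark type names can be used directly as XML element names and that attrs map one-to-one onto XML attributes; a real schema will probably need its own mapping. The names PMNode, PMMark, escapeXml, attrsToXml and toXml are made up for this example.

```typescript
// Shape of the JSON produced by ProseMirror's Node.toJSON().
interface PMMark {
  type: string;
  attrs?: Record<string, unknown>;
}
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
  marks?: PMMark[];
}

// Escape characters that are not allowed verbatim in XML text or attributes.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render an attrs object as XML attributes, skipping null/undefined values.
function attrsToXml(attrs?: Record<string, unknown>): string {
  if (!attrs) return "";
  return Object.entries(attrs)
    .filter(([, value]) => value !== null && value !== undefined)
    .map(([key, value]) => ` ${key}="${escapeXml(String(value))}"`)
    .join("");
}

// Assumption: type names double as element names (hypothetical mapping).
function toXml(node: PMNode): string {
  if (node.type === "text") {
    // Wrap the text in its marks; the first mark ends up outermost.
    return (node.marks ?? []).reduceRight(
      (inner, mark) =>
        `<${mark.type}${attrsToXml(mark.attrs)}>${inner}</${mark.type}>`,
      escapeXml(node.text ?? "")
    );
  }
  const children = (node.content ?? []).map(toXml).join("");
  const openTag = `<${node.type}${attrsToXml(node.attrs)}`;
  return children ? `${openTag}>${children}</${node.type}>` : `${openTag}/>`;
}
```

In an editor this would be called with something like view.state.doc.toJSON(). Note that each text node is wrapped in its marks separately, so adjacent text nodes that share a mark are not merged into one element.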

I am working with @plutonik, so let me try to clarify some points in her question…

We haven’t really built an XML Editor with ProseMirror… The input to our application is XML (XDITA), which we transform into JSON (JDITA). ProseMirror has no awareness of our XML. We provide to ProseMirror two things:

  1. a JSON document that is structured to our own specification called JDITA.
  2. a ProseMirror Schema that tells ProseMirror which of our JSON (i.e. JDITA) entities are Blocks, Groups, or Marks.

From the JSON (JDITA) document and the ProseMirror Schema, ProseMirror is able to correctly render and edit our document.

As the ProseMirror schema seems to give ProseMirror a mapping from our JSON document to ProseMirror’s data model for the purpose of rendering and editing, we are wondering if ProseMirror is able to use the same schema in the opposite direction, i.e. after editing, we want to get back a JSON (JDITA) document from ProseMirror. Is this possible?

I’m not sure how a ProseMirror schema is converting your JSON data to ProseMirror’s document objects. If you want to serialize to a custom format, you’re going to have to write a serializer yourself.

Hi @marijn when you say “I’m not sure how a ProseMirror schema is converting your JSON data to ProseMirror’s document objects”, either I don’t understand what you mean, or it makes me nervous.

We are using ProseMirror’s own API to do the conversion from JSON (JDITA) and ProseMirror Schema to (presumably) ProseMirror’s document objects. A sample of our code:

import { Node } from "prosemirror-model";
import { EditorState } from "prosemirror-state";
import { EditorView } from "prosemirror-view";
import { history } from "prosemirror-history";

// schemaObject, jsonDoc, shortcuts and menu are defined elsewhere in our project
const domEl = document.querySelector("#editor") as HTMLElement;

// build the ProseMirror document from the JSON, using our schema
const doc = Node.fromJSON(schemaObject, jsonDoc);

const state = EditorState.create({
  doc,
  plugins: [
    // history plugin comes from prosemirror-history
    history(),
    shortcuts(schemaObject),
    menu(schemaObject, {
      end: [[]],
      start: [[]],
    }),
  ],
});

// create a new EditorView with the DOM element and the state
new EditorView(domEl, {
  state,
});

Does that make sense?

I had to solve a similar problem.

I found the back-translation of Marks to be the most difficult part, because of their flat structure.

I’ve seen you have a few Marks.

Here’s an example of the problem. Say you load this content in your editor:

<p><i>Emphasized and <b>strongly emphasized</b> content</i>.</p>

Then you want to serialize it back from ProseMirror to a file.

With the simplest approach, you would get something like this:

<p><i>Emphasized and </i><i><b>strongly emphasized</b></i><i> content</i>.</p>

or

<p><i>Emphasized and </i><b><i>strongly emphasized</i></b><i> content</i>.</p>

Getting back the initial structure of inline tags is possible, but rather complex.
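
For what it’s worth, here is a minimal sketch of one way to merge adjacent runs, under stated assumptions: every text run lists its marks in the same order, outermost first (ProseMirror keeps mark sets sorted in schema order, so this is usually true), and mark names are used directly as tag names. The names Run, escapeText and serializeInline are made up for this example. It keeps a stack of open tags and only closes the ones that the next run does not share.

```typescript
// One text run with the names of its marks, outermost first.
interface Run {
  text: string;
  marks: string[];
}

function escapeText(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function serializeInline(runs: Run[]): string {
  const openTags: string[] = []; // currently open tags, outermost first
  let out = "";
  for (const run of runs) {
    // Count how many of the open tags this run shares, from the outside in.
    let keep = 0;
    while (
      keep < openTags.length &&
      keep < run.marks.length &&
      openTags[keep] === run.marks[keep]
    ) {
      keep++;
    }
    // Close the tags that are not shared, innermost first.
    while (openTags.length > keep) {
      out += `</${openTags.pop()}>`;
    }
    // Open the marks of this run that are not open yet.
    for (const mark of run.marks.slice(keep)) {
      out += `<${mark}>`;
      openTags.push(mark);
    }
    out += escapeText(run.text);
  }
  // Close whatever is still open at the end of the paragraph.
  while (openTags.length > 0) {
    out += `</${openTags.pop()}>`;
  }
  return out;
}

const runs: Run[] = [
  { text: "Emphasized and ", marks: ["i"] },
  { text: "strongly emphasized", marks: ["i", "b"] },
  { text: " content", marks: ["i"] },
  { text: ".", marks: [] },
];
const inlineXml = serializeInline(runs);
// inlineXml: <i>Emphasized and <b>strongly emphasized</b> content</i>.
```

This only recovers the nesting that the fixed mark order implies. If the source had the b outside the i, the output would still put the i outside, so it is not a faithful round trip in every case.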

Even more complex is representing something like this: <i>Emphasis and <i>nested</i> emphasis</i>, first in ProseMirror, and then back with a serializer.

That’s because a Mark can be applied or not applied to a text node, but it can’t be applied twice.

You may differentiate them with an attribute (e.g. an integer depth) and set excludes: "" in the MarkSpec, so that two emphasis Marks with different attributes can coexist on the same text node, although they share the same type name.
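
A minimal sketch of such a mark spec, written as a plain object in the shape of a prosemirror-model MarkSpec. The attribute name depth, the data-depth output attribute and the variable name emphasisSpec are assumptions for illustration only; how depth gets assigned when parsing the source is left out.

```typescript
// Shaped like a prosemirror-model MarkSpec (illustrative names throughout).
const emphasisSpec = {
  // Each emphasis mark records how deeply it was nested in the source.
  attrs: { depth: { default: 1 } },
  // By default a mark excludes other marks of its own type. An empty string
  // lifts that, so two emphasis marks with different attrs can coexist.
  excludes: "",
  // Render as <i data-depth="...">; the trailing 0 marks the content hole.
  toDOM: (mark: { attrs: { depth: number } }) => [
    "i",
    { "data-depth": String(mark.attrs.depth) },
    0,
  ],
};
```

A serializer can then sort the emphasis marks on a text node by depth to decide which one is the outer tag.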

But the problem gets even more complex.

What you’re doing here is loading a regular ProseMirror JSON document. I’m not sure what JDITA is in this.

@marijn I wonder if I have missed something, or I am confusing you… As I understand it ProseMirror can operate on ANY JSON document, as long as you give it a ProseMirror Schema describing the JSON format. If that is the case, then I would explain that JDITA is our expression of a document language (LwDITA - Lightweight DITA) into JSON. Does that make more sense?

No, that is not at all how it works. ProseMirror defines one specific JSON format that it uses to serialize/deserialize its documents.

@marijn thanks for the clarification and your patience. I was indeed mistaken!

I have now discovered that we have a transformation function (prosemirror-jdita/prosemirror-jdita/src/document.ts on the feature/import-and-save-file branch of evolvedbinary/prosemirror-jdita on GitHub) that takes our JDITA JSON document and transforms it into a ProseMirror JSON document before we call Node.fromJSON(schemaObject, jsonDoc).

So our workflow currently looks like:

  1. XML (XDITA) → JSON (JDITA) → JSON (ProseMirror)

We then edit the document in ProseMirror.

So I think that, to save our document from ProseMirror, we need to:

  1. Read state.doc from the ProseMirror editor state, which gives us the updated ProseMirror document
  2. Write our own serializer function that takes the ProseMirror document and converts it back to our JSON (JDITA) format.
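
The second step might look roughly like the sketch below. The target shape here (nodeName, attributes, children, content) is only a hypothetical stand-in and not the actual JDITA format, and the names PMNode, PMMark, TargetNode and fromProseMirror are made up for this example. The input is the plain JSON that state.doc.toJSON() returns.

```typescript
// Shape of the JSON produced by ProseMirror's Node.toJSON().
interface PMMark {
  type: string;
  attrs?: Record<string, unknown>;
}
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
  marks?: PMMark[];
}

// Hypothetical target format; the real JDITA structure may differ.
interface TargetNode {
  nodeName: string;
  attributes?: Record<string, unknown> | undefined;
  children?: TargetNode[];
  content?: string;
}

function fromProseMirror(node: PMNode): TargetNode {
  if (node.type === "text") {
    const leaf: TargetNode = { nodeName: "text", content: node.text ?? "" };
    // Marks are flat in ProseMirror; here each one becomes a wrapping
    // element around the text, with the first mark outermost.
    return (node.marks ?? []).reduceRight<TargetNode>(
      (child, mark) => ({
        nodeName: mark.type,
        attributes: mark.attrs,
        children: [child],
      }),
      leaf
    );
  }
  return {
    nodeName: node.type,
    attributes: node.attrs,
    children: (node.content ?? []).map(fromProseMirror),
  };
}
```

It would be called as fromProseMirror(view.state.doc.toJSON()), where view is the EditorView, and the result could then go through the existing JDITA → XDITA step.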

Do I understand that correctly now?

Yes, that should work.

Thanks very much for your time @marijn and @massi