We have built an XML editor using ProseMirror, and now we are trying to do a round trip: loading XML → editing → saving it back as XML.
A question that came up was how to return the edited XML. We found two approaches in previous discussions:
XML → build the schema → load the XML → edit → state.toJSON() → serialize → XML
XML → build the schema → load the XML → edit → DOMSerializer → serialize → XML
The question is which of the two methods is recommended, state.toJSON() or DOMSerializer. Has anyone gone through this before?
Or is there a better approach that we haven’t considered yet?
It sounds like a custom XML serializer would be easier to build on top of the JSON (or even the direct Node/Fragment) representation than the DOM serializer’s output.
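To make that suggestion concrete, here is a minimal sketch of a serializer that walks the JSON shape produced by `toJSON()` and emits XML. It assumes that node and mark type names map one-to-one onto XML element names, which may not hold for your schema. The names `PMNode`, `PMMark`, `escapeXml`, `attrsToXml`, and `toXml` are hypothetical helpers, not part of ProseMirror. Marks are handled naively here: each text node is wrapped in its own set of tags.

```typescript
// Shape of the JSON that ProseMirror's Node.toJSON() produces.
interface PMMark { type: string; attrs?: Record<string, unknown>; }
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  marks?: PMMark[];
  text?: string;
}

// Escape the characters that are unsafe in XML text and attribute values.
function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Render an attrs object as XML attributes, skipping null/undefined values.
function attrsToXml(attrs?: Record<string, unknown>): string {
  if (!attrs) return "";
  let out = "";
  for (const key of Object.keys(attrs)) {
    const value = attrs[key];
    if (value !== null && value !== undefined) {
      out += ` ${key}="${escapeXml(String(value))}"`;
    }
  }
  return out;
}

// Assumption: every node/mark type name is also a valid XML element name.
function toXml(node: PMNode): string {
  if (node.type === "text") {
    // Wrap the text in its marks; the first mark becomes the outermost tag.
    return (node.marks || []).reduceRight(
      (inner, mark) =>
        `<${mark.type}${attrsToXml(mark.attrs)}>${inner}</${mark.type}>`,
      escapeXml(node.text || "")
    );
  }
  const children = (node.content || []).map(toXml).join("");
  const open = `<${node.type}${attrsToXml(node.attrs)}`;
  return children ? `${open}>${children}</${node.type}>` : `${open}/>`;
}
```

With an editor view in hand, this would be called as something like `toXml(view.state.doc.toJSON())`. Walking the `Node` objects directly would follow the same recursion.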
I am working with @plutonik, so let me try to clarify some points in her question…
We haven’t really built an XML editor with ProseMirror… The input to our application is XML (XDITA), which we transform into JSON (JDITA). ProseMirror has no awareness of our XML. We provide ProseMirror with two things:
a JSON document that is structured to our own specification called JDITA.
a ProseMirror Schema that tells ProseMirror which of our JSON (i.e. JDITA) entities are Blocks, Groups, or Marks.
From the JSON (JDITA) document and the ProseMirror Schema, ProseMirror is able to correctly render and edit our document.
As the ProseMirror schema seems to give ProseMirror a mapping from our JSON document to ProseMirror’s data model for the purpose of rendering and editing, we are wondering if ProseMirror is able to use the same schema in the opposite direction, i.e. after editing, we want to get back a JSON (JDITA) document from ProseMirror. Is this possible?
I’m not sure how a ProseMirror schema is converting your JSON data to ProseMirror’s document objects. If you want to serialize to a custom format, you’re going to have to write a serializer yourself.
Hi @marijn when you say “I’m not sure how a ProseMirror schema is converting your JSON data to ProseMirror’s document objects”, either I don’t understand what you mean, or it makes me nervous.
We are using ProseMirror’s own API to do the conversion from JSON (JDITA) and ProseMirror Schema to (presumably) ProseMirror’s document objects. A sample of our code:
import { Node } from "prosemirror-model";
import { EditorState } from "prosemirror-state";
import { EditorView } from "prosemirror-view";
import { history } from "prosemirror-history";

// schemaObject, jsonDoc, shortcuts and menu are defined elsewhere in our application
const domEl = document.querySelector("#editor") as HTMLElement;
// build a ProseMirror document from our JSON (JDITA) using our schema
const doc = Node.fromJSON(schemaObject, jsonDoc);
const state = EditorState.create({
  doc,
  plugins: [
    // undo/redo support from prosemirror-history
    history(),
    shortcuts(schemaObject),
    menu(schemaObject, {
      end: [[]],
      start: [[]],
    }),
  ],
});
// create a new EditorView with the DOM element and the state
new EditorView(domEl, {
  state,
});
Here’s an example of the problem. Say you load this content in your editor:
<p><i>Emphasized and <b>strongly emphasized</b> content</i>.</p>
Then you want to serialize it back from ProseMirror to a file.
With the simplest approach, you would get something like this:
<p><i>Emphasized and </i><i><b>strongly emphasized</b></i><i> content</i>.</p>
or
<p><i>Emphasized and </i><b><i>strongly emphasized</i></b><i> content</i>.</p>
Getting back the initial structure of inline tags is possible, but rather complex.
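As a rough sketch of how the merged form could be recovered, a serializer can keep a stack of open tags and only close or open what differs between one text node and the next. The names `TextRun`, `InlineMark`, and `serializeInline` are hypothetical, and attributes are ignored for brevity. This only reproduces the original nesting when each text node lists its marks in the same outer-to-inner order as the source XML. ProseMirror orders a node's marks by their order in the schema, so the second variant above (with b outside i) would still come out split.

```typescript
interface InlineMark { type: string; }
interface TextRun { text: string; marks: InlineMark[]; }

// Escape characters that are unsafe in XML text content.
function escapeText(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serialize adjacent text runs, keeping shared outer marks open between runs.
function serializeInline(runs: TextRun[]): string {
  let out = "";
  let open: string[] = []; // currently open tags, outermost first
  for (const run of runs) {
    const wanted = run.marks.map((m) => m.type);
    // Count how many of the open tags this run can keep (shared prefix).
    let keep = 0;
    while (keep < open.length && keep < wanted.length &&
           open[keep] === wanted[keep]) {
      keep++;
    }
    // Close tags that no longer apply, innermost first.
    for (let i = open.length - 1; i >= keep; i--) out += `</${open[i]}>`;
    // Open the tags this run needs in addition to the kept ones.
    for (let i = keep; i < wanted.length; i++) out += `<${wanted[i]}>`;
    open = wanted;
    out += escapeText(run.text);
  }
  // Close whatever is still open at the end of the inline content.
  for (let i = open.length - 1; i >= 0; i--) out += `</${open[i]}>`;
  return out;
}
```

For the paragraph above, feeding in the four text runs ("Emphasized and " with i, "strongly emphasized" with i and b, " content" with i, and "." with no marks) gives back the single outer `<i>` element.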
Even more complex is representing something like this: <i>Emphasis and <i>nested</i> emphasis</i>, first in ProseMirror, and then back with a serializer.
That’s because a Mark can be applied or not applied to a text node, but it can’t be applied twice.
You may differentiate it with an attribute (e.g. a depth integer) and set excludes: "" in the MarkSpec, so that two emphasis Marks can coexist on the same text when their attributes are different, although they share the same type name.
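A mark spec along those lines might look roughly like the sketch below. It is a plain object in the shape of a prosemirror-model MarkSpec; the `depth` attribute and the `data-depth` output attribute are hypothetical names chosen for illustration, and how `depth` gets assigned while parsing nested elements is left out.

```typescript
// Hypothetical emphasis mark that can be stacked when `depth` differs.
const nestedEmphasis = {
  // Nesting level of this emphasis; 1 is the outermost <i>.
  attrs: { depth: { default: 1 } },
  // An empty string means the mark excludes nothing, not even its own type,
  // so two emphasis marks with different attrs can sit on the same text.
  excludes: "",
  // Output spec: an <i> element carrying the depth, with 0 as the content hole.
  toDOM(mark: { attrs: { depth: number } }) {
    return ["i", { "data-depth": String(mark.attrs.depth) }, 0];
  },
};
```

This object would go under a key such as `emphasis` in the `marks` part of the schema spec. A serializer could then sort the emphasis marks on a text node by `depth` to decide which tag is outermost.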
@marijn I wonder if I have missed something, or if I am confusing you… As I understand it, ProseMirror can operate on ANY JSON document, as long as you give it a ProseMirror Schema describing the JSON format. If that is the case, then I would explain that JDITA is our expression of a document language (LwDITA - Lightweight DITA) in JSON. Does that make more sense?