From the docs:
In HTML, a paragraph with markup is represented as a tree, like this:
…
Whereas in ProseMirror, the inline content is modeled as a flat sequence, with the markup attached as metadata to the nodes:
…
This more closely matches the way we tend to think about and work with such text. It allows us to represent positions in a paragraph using a character offset rather than a path in a tree, and makes it easier to perform operations like splitting or changing the style of the content without performing awkward tree manipulation.
This also means each document has one valid representation. Adjacent text nodes with the same set of marks are always combined together, and empty text nodes are not allowed. The order in which marks appear is specified by the schema.
I’ve been writing an implementation which parses ProseMirror’s data structure and lets me do transformations, etc., before spitting out React elements. (If there’s a well-maintained library for this already which I’ve missed, please do point it out.)
When making input to test it with, I found that a structure which is common in my content causes problems: longish link text which has some spans (“marks”) in it, such as bold, italic, or (more commonly in my content) monospaced font (“code” tag). For example, a link such as “see the `whateverFunction` documentation”.
The data structure which ProseMirror gives is three bits of text. All three have the link mark, and the middle one also has the code mark.
This then gets output as three separate links. This is a bad thing: it means there are three separate hit zones and three tab stops.
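For concreteness, here is roughly what that paragraph looks like in the JSON shape produced by `Node.toJSON()`, written as a typed TypeScript constant. The interfaces are simplified local stand-ins, not prosemirror-model’s own types; the `href` is a made-up placeholder, and the exact mark attributes and mark order depend on the schema.

```typescript
// Simplified stand-ins for ProseMirror's JSON output (not prosemirror-model types).
interface MarkJSON {
  type: string;
  attrs?: Record<string, unknown>;
}

interface TextNodeJSON {
  type: "text";
  text: string;
  marks?: MarkJSON[];
}

interface ParagraphJSON {
  type: "paragraph";
  content: TextNodeJSON[];
}

// Placeholder href: the real link target is not part of the example.
const link: MarkJSON = { type: "link", attrs: { href: "https://example.com/docs" } };

// "see the `whateverFunction` documentation" as a flat sequence of three text
// nodes. Each node repeats the same link mark; only the middle one also has code.
const paragraph: ParagraphJSON = {
  type: "paragraph",
  content: [
    { type: "text", text: "see the ", marks: [link] },
    { type: "text", text: "whateverFunction", marks: [link, { type: "code" }] },
    { type: "text", text: " documentation", marks: [link] },
  ],
};
```

A renderer that turns each text node into an element and wraps its marks around it will therefore emit one `<a>` per node.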
I have worked around this by looking for runs of text nodes which share a common link mark and wrapping them in a custom node type. It’s pretty hairy code, and I wish it wasn’t necessary.
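A minimal sketch of that kind of grouping pass, assuming the `Node.toJSON()` shape and a mark named `link` with an `href` attribute. `groupLinkedText`, `LinkedTextGroup` and `render` are hypothetical names, not the actual workaround code, and to stay dependency-free it renders an HTML string where a real implementation would create React elements. Consecutive nodes whose link marks share an `href` go into one wrapper, and the link mark is stripped from the children so only the wrapper produces an `<a>`.

```typescript
interface MarkJSON { type: string; attrs?: Record<string, unknown>; }
interface TextNodeJSON { type: "text"; text: string; marks?: MarkJSON[]; }

// Hypothetical wrapper created by the grouping pass; not a ProseMirror type.
interface LinkedTextGroup { type: "linkedText"; href: string; children: TextNodeJSON[]; }
type InlineItem = TextNodeJSON | LinkedTextGroup;

// Returns the href of the node's link mark, if it has one.
function linkHref(node: TextNodeJSON): string | undefined {
  const href = node.marks?.find((m) => m.type === "link")?.attrs?.href;
  return typeof href === "string" ? href : undefined;
}

// Collects consecutive linked nodes with the same href into one group.
function groupLinkedText(nodes: TextNodeJSON[]): InlineItem[] {
  const out: InlineItem[] = [];
  for (const node of nodes) {
    const href = linkHref(node);
    if (href === undefined) {
      out.push(node);
      continue;
    }
    // The wrapper now represents the link, so drop the mark from the child.
    const child: TextNodeJSON = {
      ...node,
      marks: (node.marks ?? []).filter((m) => m.type !== "link"),
    };
    const last = out[out.length - 1];
    if (last !== undefined && last.type === "linkedText" && last.href === href) {
      last.children.push(child);
    } else {
      out.push({ type: "linkedText", href, children: [child] });
    }
  }
  return out;
}

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Wraps the escaped text in an element for each known non-link mark.
function renderText(node: TextNodeJSON): string {
  let html = escapeHtml(node.text);
  for (const mark of node.marks ?? []) {
    if (mark.type === "code") html = `<code>${html}</code>`;
    else if (mark.type === "strong") html = `<strong>${html}</strong>`;
    else if (mark.type === "em") html = `<em>${html}</em>`;
  }
  return html;
}

// Each group becomes exactly one <a>, however many marks its children have.
function render(items: InlineItem[]): string {
  return items
    .map((item) =>
      item.type === "linkedText"
        ? `<a href="${escapeHtml(item.href)}">${item.children.map(renderText).join("")}</a>`
        : renderText(item))
    .join("");
}

// Placeholder href for the example link text.
const link: MarkJSON = { type: "link", attrs: { href: "https://example.com/docs" } };
const nodes: TextNodeJSON[] = [
  { type: "text", text: "see the ", marks: [link] },
  { type: "text", text: "whateverFunction", marks: [link, { type: "code" }] },
  { type: "text", text: " documentation", marks: [link] },
];

console.log(render(groupLinkedText(nodes)));
// <a href="https://example.com/docs">see the <code>whateverFunction</code> documentation</a>
```

Comparing only `href` means two adjacent links to the same URL are merged; ProseMirror already treats marks with equal attributes as the same mark, so those runs are indistinguishable in the document anyway. A stricter version could compare every attribute of the link mark.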
I suppose I have a couple of purposes here:
- Discuss the data structure. For my money, “this more closely matches the way we tend to think about and work with such text” is incorrect in the case of links. It’s never “here’s a link, then here’s a monospace link, then here’s a link, and they all happen to go to the same place”; it’s always “here’s a link, of which this part is monospace”. I wonder if a wrapper node for such cases (I called it `linkedText`) would be a good change for ProseMirror to adopt.
- Find out if there’s an existing ProseMirror → React implementation which allows me to customize the output (i.e. substitute my own components for particular elements) and which works around this issue. This is really hard to search for, because there are so many projects and discussions about using the ProseMirror editor with React, and I have no interest in doing that.
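The `linkedText` wrapper from the first point could in principle be an ordinary inline node with text content. The object below is a sketch of such a node spec, not something ProseMirror ships: the mark names assume prosemirror-schema-basic, and the local types stand in for prosemirror-model’s `NodeSpec` and `DOMOutputSpec` so the snippet runs without any packages installed.

```typescript
// Local stand-ins so this runs without prosemirror-model; in a real schema
// the object below would be typed as NodeSpec.
type DOMOutputSpec = [string, { [attr: string]: string }, 0];

interface LinkedTextNode {
  attrs: { href: string };
}

// Hypothetical wrapper node: one link whose content is ordinary marked text.
const linkedText = {
  inline: true,
  group: "inline",
  // Holds text nodes, which can still carry their own marks.
  content: "text*",
  // Mark names assume prosemirror-schema-basic; "link" is left out so a
  // link cannot be nested inside another link.
  marks: "em strong code",
  attrs: { href: {} },
  // Renders the whole run as a single <a>; 0 marks where the content goes.
  toDOM(node: LinkedTextNode): DOMOutputSpec {
    return ["a", { href: node.attrs.href }, 0];
  },
};
```

In a real schema this node would also need a `parseDOM` rule that is reconciled with the existing `link` mark, since both would want to claim `<a href>` elements.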