Link text with some bold/italic/code in it

bjn · May 4, 2022, 8:56pm

In HTML, a paragraph with markup is represented as a tree, like this:

…

Whereas in ProseMirror, the inline content is modeled as a flat sequence, with the markup attached as metadata to the nodes:

…

This more closely matches the way we tend to think about and work with such text. It allows us to represent positions in a paragraph using a character offset rather than a path in a tree, and makes it easier to perform operations like splitting or changing the style of the content without performing awkward tree manipulation.

This also means each document has one valid representation. Adjacent text nodes with the same set of marks are always combined together, and empty text nodes are not allowed. The order in which marks appear is specified by the schema.

I’ve been writing an implementation which parses ProseMirror’s data structure and lets me do transformations etc. before spitting out React elements. (If there’s a well-maintained library for this already which I’ve missed, please do point it out.)

When making input to test it with, I found that a structure which is common in my content causes problems. That structure is longish link text which has some spans (“marks”) in it, such as bold, italic, or (more commonly in my content) monospaced font (“code” tag). For example, a link such as “see the whateverFunction documentation”.

The data structure which ProseMirror gives is three text nodes. All three carry the link mark, and the middle one also carries the code mark.
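To make that concrete, here is a rough sketch of what those three text nodes look like in ProseMirror's JSON form. The `href` value is a made-up placeholder, and the order of marks within a node depends on the schema that produced the document, so treat the details as assumptions.

```typescript
// Shape of marks and text nodes in ProseMirror's JSON form.
interface MarkJSON {
  type: string;
  attrs?: Record<string, unknown>;
}

interface TextNodeJSON {
  type: "text";
  text: string;
  marks?: MarkJSON[];
}

// Hypothetical link target, for illustration only.
const link: MarkJSON = { type: "link", attrs: { href: "https://example.com/docs" } };

// "see the whateverFunction documentation" as a flat sequence of text nodes.
const linkText: TextNodeJSON[] = [
  { type: "text", text: "see the ", marks: [link] },
  { type: "text", text: "whateverFunction", marks: [link, { type: "code" }] },
  { type: "text", text: " documentation", marks: [link] },
];

// A naive renderer that wraps each text node in its own marks
// would emit one <a> element per node that carries a link mark.
const anchorCount = linkText.filter((node) =>
  (node.marks ?? []).some((m) => m.type === "link"),
).length;

console.log(anchorCount); // 3
```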

This then gets output as three separate links. This is a bad thing – it means there are three separate hit zones, three tab stops.

I have worked around this, by looking for strings of text nodes which share a common link mark, and wrapping them in a custom node type. It’s pretty hairy code and I wish it wasn’t necessary.
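A minimal sketch of that kind of workaround, not my actual code: it walks the inline nodes in order and collects each run of adjacent text nodes with the same link target into one wrapper. The `LinkedText` type and the `groupLinks` and `linkHref` helpers are hypothetical names, and the sketch compares only `href`, ignoring any other link attributes.

```typescript
interface MarkJSON {
  type: string;
  attrs?: Record<string, unknown>;
}

interface TextNodeJSON {
  type: "text";
  text: string;
  marks?: MarkJSON[];
}

// Hypothetical wrapper: a run of text nodes that share one link target.
interface LinkedText {
  type: "linkedText";
  href: string;
  content: TextNodeJSON[]; // children with the link mark removed
}

type Inline = TextNodeJSON | LinkedText;

// Returns the href of the node's link mark, if it has one.
function linkHref(node: TextNodeJSON): string | undefined {
  const mark = (node.marks ?? []).find((m) => m.type === "link");
  const href = mark?.attrs?.href;
  return typeof href === "string" ? href : undefined;
}

function groupLinks(nodes: TextNodeJSON[]): Inline[] {
  const out: Inline[] = [];
  for (const node of nodes) {
    const href = linkHref(node);
    if (href === undefined) {
      out.push(node); // not linked: pass through unchanged
      continue;
    }
    // Strip the link mark; the wrapper now carries it.
    const child: TextNodeJSON = {
      ...node,
      marks: (node.marks ?? []).filter((m) => m.type !== "link"),
    };
    const last = out[out.length - 1];
    if (last !== undefined && last.type === "linkedText" && last.href === href) {
      last.content.push(child); // extend the current run
    } else {
      out.push({ type: "linkedText", href, content: [child] });
    }
  }
  return out;
}
```

Each `linkedText` wrapper can then be rendered as a single link element with its children inside, which gives one hit zone and one tab stop.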

I suppose I have a couple of purposes here:

1. Discuss the data structure. For my money, “this more closely matches the way we tend to think about and work with such text” is incorrect in the case of links. It’s never “here’s a link, then here’s a monospace link, then here’s a link, and they all happen to go to the same place”; it’s always “here’s a link, of which this part is monospace”. I wonder if a wrapper node for such cases (I called it linkedText) would be a good change for ProseMirror to adopt.
2. Find out if there’s an existing ProseMirror → React implementation which allows me to customize the output (i.e. substitute my own components for particular elements) and which works around this issue. This is really hard to search for, because there are so many projects and discussions about using the ProseMirror editor with React, which I have no interest in doing.

marijn · May 5, 2022, 6:31am

The way marks are rendered depends on their order in your schema definition. You’ll see that, for example, on https://prosemirror.net, links wrap code spans, because links come first in the basic schema. This sounds like you put code marks before links.

bjn · May 5, 2022, 8:22am

The docs I quoted say that no span wraps any other span. That is the way they are represented in the data model.

marijn · May 5, 2022, 10:18am

Have you tried doing what I suggested?

bjn · May 5, 2022, 7:31pm

I don’t know anything about schema definitions in this context.

I suppose you’re telling me that there is a renderer component which turns the data model back into HTML, and that it does handle the issue I’m highlighting, if configured correctly. I’m not using whatever component this is. Can you point to it? None of the modules listed in the ProseMirror reference manual sound right.

I am consuming the ProseMirror data model, which comes as a JSON dump from a 3rd-party API.

That 3rd-party API also provides it rendered as HTML, which exhibits the same issue (<a href="x">some </a><code><a href="x">monospace</a></code><a href="x"> text</a>), but even if that’s because of misconfiguration, fixing that wouldn’t change the data model I need to consume.

marijn · May 5, 2022, 9:58pm

Both the DOMSerializer in prosemirror-model and the in-editor renderer in prosemirror-view will combine the mark elements for adjacent nodes with shared marks, if they have no higher-precedence marks that differ. If you can’t control the schema and the serializer, I suppose that’s not much help for you, though.
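For someone consuming the JSON without the original schema, the same idea can be approximated by hand. The following is a simplified sketch of the merging technique, not the actual prosemirror-model code: it re-sorts each node's marks by an assumed precedence list (`MARK_ORDER`, hypothetical), keeps the marks shared with the previous node open, and closes or opens only the ones that differ. It emits an HTML string and uses the mark type as the tag name (except `link`, which becomes `a`), both of which are assumptions made for brevity.

```typescript
interface MarkJSON {
  type: string;
  attrs?: Record<string, unknown>;
}

interface TextNodeJSON {
  type: "text";
  text: string;
  marks?: MarkJSON[];
}

// Assumed precedence: earlier entries wrap later ones (link outside code).
// Mark types missing from this list sort first, i.e. outermost.
const MARK_ORDER = ["link", "em", "strong", "code"];

// Crude equality check; it relies on attrs having the same key order.
const sameMark = (a: MarkJSON, b: MarkJSON): boolean =>
  a.type === b.type &&
  JSON.stringify(a.attrs ?? {}) === JSON.stringify(b.attrs ?? {});

const escapeHtml = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;");

const openTag = (mark: MarkJSON): string =>
  mark.type === "link"
    ? `<a href="${escapeHtml(String(mark.attrs?.href ?? ""))}">`
    : `<${mark.type}>`;

const closeTag = (mark: MarkJSON): string =>
  mark.type === "link" ? "</a>" : `</${mark.type}>`;

function renderInline(nodes: TextNodeJSON[]): string {
  let html = "";
  const open: MarkJSON[] = []; // currently open marks, outermost first
  for (const node of nodes) {
    const marks = [...(node.marks ?? [])].sort(
      (a, b) => MARK_ORDER.indexOf(a.type) - MARK_ORDER.indexOf(b.type),
    );
    // Count how many outer marks this node shares with what is already open.
    let keep = 0;
    while (
      keep < open.length &&
      keep < marks.length &&
      sameMark(open[keep]!, marks[keep]!)
    ) {
      keep++;
    }
    // Close the marks that no longer apply, innermost first.
    while (open.length > keep) html += closeTag(open.pop()!);
    // Open the marks that are new for this node.
    for (const mark of marks.slice(keep)) {
      html += openTag(mark);
      open.push(mark);
    }
    html += escapeHtml(node.text);
  }
  while (open.length > 0) html += closeTag(open.pop()!);
  return html;
}
```

With `link` ahead of `code` in the precedence list, a run of linked text nodes produces a single `<a>` element even when the incoming JSON lists the code mark first.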

bjn · May 5, 2022, 10:34pm

As I said in my original message, I’ve already implemented a serializer which takes the JSON and outputs React nodes, which is what I need.

One of my questions there was whether there is an existing package which does this, since I don’t much want to maintain the hairy code I wrote, and someone else might have done it better. Your latest reply gave me the keyword I needed to search for it properly, and I came across BlueMona/prosemirror-react-renderer on GitHub (“An alternative to ProseMirror's DOMSerializer that converts documents into React elements instead of DOM fragments”). It looks like they hit the same issue as I did (see the last caveat in the readme) and didn’t go to the effort of solving it. It also looks abandoned.

Now that you’ve pointed out the built-in DOMSerializer class, I wondered whether it can be extended or customized to return things other than native DOM nodes (i.e. React nodes). I don’t think it can; otherwise that’s how the above package would have done things, and a quick scan over the code seems to confirm that.

I suppose that concludes this topic: serializers are expected to deal with merging/nesting adjacent groups of marks, and there’s no currently-maintained serializer project for React.