Rationale for marks

I’m looking into schema design and trying to decide the best way to represent documents in my application. I often need to choose between a node and a mark, and it’s occurred to me that I’m not really sure what the rationale for marks is, since it seems plausible that you could represent any document using nodes alone.

Having marks as a concept distinct from nodes makes the data structure more complex, and I wonder whether it wouldn’t be better to just have nodes everywhere, plus a set of functions for doing “mark style” operations on documents.

I feel like I’m missing something, so I’m keen to hear the use cases where a mark is more valuable than representing the same thing as a node.

Emphasis isn’t hierarchical. It is an extra attribute added to a stretch of content, and making it hierarchical (like HTML does) makes many operations on the document much more awkward due to the conceptual mismatch: for example, ensuring that something isn’t emphasized twice, ensuring that a given document has a single canonical representation, representing document positions, or splitting a text block. I had a purely hierarchical model at one point, and the code got a lot better when I introduced marks.
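
To make the "emphasized twice" and "single canonical representation" points concrete, here is a minimal sketch of a flat model. It is not ProseMirror's actual API: `TextRun`, `addMark`, `normalize` and `markAll` are hypothetical names, and the model assumes marks are stored as a sorted, duplicate-free list on each run of text.

```typescript
// Hypothetical flat model (an assumption for illustration, not ProseMirror's API).
type Mark = "strong" | "em" | "code";

interface TextRun {
  text: string;
  marks: readonly Mark[]; // kept sorted and free of duplicates
}

// Adding a mark is idempotent: the set cannot hold "strong" twice.
function addMark(run: TextRun, mark: Mark): TextRun {
  if (run.marks.indexOf(mark) !== -1) return run;
  return { text: run.text, marks: [...run.marks, mark].sort() };
}

function sameMarks(a: readonly Mark[], b: readonly Mark[]): boolean {
  return a.length === b.length && a.every((m, i) => m === b[i]);
}

// Merge adjacent runs with identical mark sets and drop empty runs, so that
// the same visible content always ends up as the same sequence of runs.
function normalize(runs: readonly TextRun[]): TextRun[] {
  const out: TextRun[] = [];
  for (const run of runs) {
    if (run.text === "") continue;
    const last = out[out.length - 1];
    if (last && sameMarks(last.marks, run.marks)) {
      out[out.length - 1] = { text: last.text + run.text, marks: last.marks };
    } else {
      out.push(run);
    }
  }
  return out;
}

// Apply a mark to every run, then normalize.
function markAll(runs: readonly TextRun[], mark: Mark): TextRun[] {
  return normalize(runs.map((r) => addMark(r, mark)));
}

const runs: TextRun[] = [
  { text: "Hello ", marks: [] },
  { text: "big", marks: ["strong"] },
  { text: " world", marks: [] },
];
const once = markAll(runs, "strong");
const twice = markAll(once, "strong");
```

In this sketch `once` and `twice` are the same single run, "Hello big world" with `strong`. In a purely hierarchical model the equivalent operation could produce a `strong` nested inside another `strong`, which is a different tree for the same visible content and needs extra code to detect and flatten.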

for example, ensuring that something isn’t emphasized twice

You raise an interesting point. When thinking about marks, though, I don’t see why it wouldn’t be possible for a node with a mark to also have an ancestor with the same mark. It seems like this type of restriction could only be enforced through the node mutation APIs.

However, I can imagine that the operations could be simpler with a mark concept than with nodes everywhere.

I was wondering the same as @bradleyayers today. In HTML, a strong can wrap around all kinds of content and applies to all the elements within it (where possible). For example, I can have something like this:

<strong>
This is some text <img src="smiley.png">. This text is
still bold. So is this <a href="foo">link</a>
</strong>

When using the basic schema, this would attach the strong mark to the individual text and link elements, making it nearly impossible to restore the original markup from it. This is somewhat important for me, as my goal is to implement a wiki editor where I would do a round trip of wiki syntax -> JSON -> ProseMirror -> JSON -> wiki syntax, and I would want to preserve the original document as much as possible. For users it would be weird to go from this:

**This is some text {{smiley.png}}. This text is still bold. So is this [[link]]**

to this:

**This is some text **{{smiley.png}}**. This text is still bold. So is this ****[[link]]**

So I wonder whether it would be possible to use nodes instead. What would help with that is a way to define exceptions in the content expressions, so that one could make sure that nothing is emphasized twice.

E.g. something like this for a strong node:

strong: {
    group: 'formatting',
    content: 'inline* formatting* ^strong'
}

Am I missing something? Is that a bad idea?

I think you may be overestimating this difficulty, since both the DOM serializer and the Markdown serializer do just this. They don’t preserve the nesting order of the original HTML tags, of course, but that is intentional – it means documents are normalized to a single representation, which is on the whole a desirable property.
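
As a rough illustration of how a serializer can rebuild the wrapping from flat marks, here is a minimal sketch. It is not the code of the DOM or Markdown serializer: the `Inline` type, the `serializeWiki` function and the `//` delimiter for emphasis are assumptions made for this example, and it assumes that inline items such as images can carry marks just like text does. The idea is simply to leave a mark open for as long as consecutive inline items share it.

```typescript
// Hypothetical flat inline model: every inline item carries its own marks.
type InlineMark = "strong" | "em";

type Inline =
  | { kind: "text"; text: string; marks: InlineMark[] }
  | { kind: "image"; src: string; marks: InlineMark[] }
  | { kind: "link"; target: string; marks: InlineMark[] };

// Wiki delimiters assumed for this sketch.
const DELIMITER: Record<InlineMark, string> = { strong: "**", em: "//" };

function renderItem(item: Inline): string {
  switch (item.kind) {
    case "text":
      return item.text;
    case "image":
      return `{{${item.src}}}`;
    case "link":
      return `[[${item.target}]]`;
  }
}

function serializeWiki(items: Inline[]): string {
  let out = "";
  let open: InlineMark[] = []; // marks currently open, outermost first
  for (const item of items) {
    // Keep the longest prefix of open marks that the next item also has.
    let keep = 0;
    while (keep < open.length && item.marks.indexOf(open[keep]) !== -1) keep++;
    // Close everything after that prefix, innermost first.
    for (let i = open.length - 1; i >= keep; i--) out += DELIMITER[open[i]];
    open = open.slice(0, keep);
    // Open the marks the item has that are not open yet.
    for (const mark of item.marks) {
      if (open.indexOf(mark) === -1) {
        out += DELIMITER[mark];
        open.push(mark);
      }
    }
    out += renderItem(item);
  }
  // Close whatever is still open at the end of the inline content.
  for (let i = open.length - 1; i >= 0; i--) out += DELIMITER[open[i]];
  return out;
}

const wikiDoc: Inline[] = [
  { kind: "text", text: "This is some text ", marks: ["strong"] },
  { kind: "image", src: "smiley.png", marks: ["strong"] },
  { kind: "text", text: ". This text is still bold. So is this ", marks: ["strong"] },
  { kind: "link", target: "link", marks: ["strong"] },
];
const wikiText = serializeWiki(wikiDoc);
```

With this input, `wikiText` is `**This is some text {{smiley.png}}. This text is still bold. So is this [[link]]**`, i.e. a single pair of `**` around the whole stretch rather than one pair per item. When marks overlap only partially, the nesting order in the output is decided by the serializer and not by the original markup, which is the normalization described above.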

No. That ship has sailed.

I was under the impression that “Discussion: Inline nodes with content” (post #11 by bradleyayers) would make this possible?

Replace marks with a tree shape? No, it certainly won’t. Note that my conclusion in that thread is mostly ‘this isn’t going to work’.