Custom empty element (used as pagebreak, visual tab)

mroeling · September 9, 2020, 2:30pm

Case: A custom element is used as a visual page break in the editor. It needs to be an inline element: it is displayed as a block in the editor, but should be inserted inline inside the content. A similar (and perhaps better) inline example is a visual tab character. (The green bar marks the active paragraph in the editor.)

Current: ProseMirror adds the custom inline element inside a <p> element, as expected.

<p>content1<pagebreak></pagebreak>content2</p>

When the content is saved, the XMLDocument is serialized using new XMLSerializer().serializeToString(content); to

<p>content1<pagebreak/>content2</p>

which is still fine XML.

Problem:

When this content is passed back to the editor to be parsed by the schema's DOMParser, the self-closed element “eats” the content right after it:

<p>
    content1
    <pagebreak>content2</pagebreak>
</p>

The schema:

import {NodeSpec} from "prosemirror-model";

// NOTE: the id / data-id handling in this schema is not used in the example;
// I've tested both with and without this extra functionality.
export const pageBreak: NodeSpec = {
    attrs: {
        id: {default: null}
    },
    inline: true,
    content: "inline*",
    group: "inline",
    atom: true,
    parseDOM: [
        {
            tag: "pagebreak",
            getAttrs(dom) {
                // Typed so that assigning attrs["id"] compiles under strict TypeScript.
                const attrs: {id?: string | null} = {};

                if (dom instanceof HTMLElement) {
                    attrs["id"] = dom.getAttribute("data-id");
                }
                return attrs;
            }
        },
    ],
    toDOM(node) {
        return ["pagebreak", {"data-id": node.attrs["id"]}];
    }
};

Is it possible to use this kind of empty element in ProseMirror?
Why does the empty element “eat” the text right after it? It looks as if the element is not allowed to be empty, so it consumes the following content until a well-known element (like a text node or </p>) is encountered.

Extreme example

Exported content

<p>content1<pagebreak></pagebreak><pagebreak></pagebreak><pagebreak></pagebreak>content2</p>

Is parsed into the editor as

<p>
    content1
    <pagebreak>
        <pagebreak>
            <pagebreak>
                content2
            </pagebreak>
        </pagebreak>
    </pagebreak>
</p>

marijn · September 9, 2020, 6:06pm

At a glance, it looks like you’re parsing XML as HTML at some point in your process (HTML does not have self-closing elements).
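
To make the effect concrete, here is a toy model of it. This is only a sketch: `parseLikeHtml` and `serialize` are hypothetical helpers written for this illustration, not ProseMirror or browser APIs, and the code is nowhere near the real HTML tree-construction algorithm (no attributes, no void elements, no error recovery). It only mimics the one relevant rule, namely that the trailing "/" on an unknown element's start tag is ignored, so the element stays open until an enclosing end tag closes it.

```typescript
// Toy model only: a drastically simplified stand-in for an HTML parser.
interface ToyNode {
  tag: string;
  children: (ToyNode | string)[];
}

// Hypothetical helper: parse a tiny, attribute-free markup string while
// ignoring the "/" in "<pagebreak/>", as an HTML parser does for
// non-void elements.
function parseLikeHtml(source: string): ToyNode {
  const root: ToyNode = { tag: "#root", children: [] };
  const stack: ToyNode[] = [root];
  const tokens = source.match(/<[^>]+>|[^<]+/g) ?? [];

  for (const token of tokens) {
    const top = stack[stack.length - 1];
    if (!token.startsWith("<")) {
      top.children.push(token); // plain text goes into the innermost open element
    } else if (token.startsWith("</")) {
      const name = token.slice(2, -1).trim();
      // Close everything up to and including the matching open element.
      const index = stack.map((n) => n.tag).lastIndexOf(name);
      if (index > 0) stack.length = index;
    } else {
      // Start tag: the trailing "/" is dropped, so the element stays open.
      const name = token.slice(1, -1).replace(/\/$/, "").trim();
      const node: ToyNode = { tag: name, children: [] };
      top.children.push(node);
      stack.push(node);
    }
  }
  return root;
}

// Write the toy tree back out with explicit end tags.
function serialize(node: ToyNode | string): string {
  if (typeof node === "string") return node;
  const inner = node.children.map(serialize).join("");
  return node.tag === "#root" ? inner : `<${node.tag}>${inner}</${node.tag}>`;
}

const nested = serialize(parseLikeHtml("<p>content1<pagebreak/>content2</p>"));
console.log(nested);
// → <p>content1<pagebreak>content2</pagebreak></p>
```

Feeding it three consecutive `<pagebreak/>` tags produces the same three-level nesting as in the extreme example above, because each start tag opens a new element inside the previous one and only `</p>` closes them.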

mroeling · September 10, 2020, 6:55am

The self-closed tag <p>content1<pagebreak/>content2</p> was generated by passing the ProseMirror XML Document through the XMLSerializer. The only other solution I found is to use the innerHTML property, but that also exports self-closing elements. Is there another way to extract the text from the XMLDocument without the self-closing elements?
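
One possible workaround, if the serialized string has to pass through an HTML parser later, is to expand the self-closing tags into explicit open/close pairs first. The sketch below is an assumption-laden illustration, not an established API: `expandSelfClosingTags` and `VOID_ELEMENTS` are hypothetical names, and the regex assumes there is no ">" inside attribute values and no CDATA sections or comments that contain tags.

```typescript
// HTML void elements, which must stay without an end tag.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "source", "track", "wbr",
]);

// Hypothetical helper (not part of ProseMirror or the DOM API):
// rewrite "<name attrs/>" as "<name attrs></name>" so that an HTML parser
// sees an explicit end tag for every non-void element.
function expandSelfClosingTags(xml: string): string {
  return xml.replace(
    /<([A-Za-z][\w:.-]*)([^<>]*?)\s*\/>/g,
    (match: string, name: string, attrs: string) =>
      VOID_ELEMENTS.has(name.toLowerCase()) ? match : `<${name}${attrs}></${name}>`,
  );
}

const expanded = expandSelfClosingTags(
  '<p>content1<pagebreak data-id="a"/>content2<br/></p>',
);
console.log(expanded);
// → <p>content1<pagebreak data-id="a"></pagebreak>content2<br/></p>
```

A string-level rewrite like this is fragile by nature. Where the pipeline allows it, parsing the saved string as XML rather than as HTML would avoid the mismatch at the source.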

marijn · September 10, 2020, 7:37am

It most certainly does not.

mroeling · September 10, 2020, 7:48am

Ah, sorry, I stand corrected. The innerHTML problem was the other side of the problem: