Getting a hold of the `document` used by ProseMirror from `toDOM()`

There are two givens:

  1. ProseMirror takes a document parameter to serializeNode(). (It seems to be optional, but I believe it’s required in SSR.)
  2. ProseMirror allows toDOM() to return a DOM node (a result of document.createElement()).

My assumption is that somewhere in the calling context of toDOM() a document is always present, whether it was passed explicitly as in (1) or ProseMirror is using the global document because it runs in a browser.

Thus the question: wouldn’t it make sense to wire it up so that toDOM()’s body can get a hold of whichever document ProseMirror obtained during DOM serialization (or, more precisely, the createElement() that comes with it)? It seems suboptimal that if toDOM() wants to createElement() it has to come up with its own way of accessing a document that was already passed to PM.

Example:

During SSR my build invokes serializeNode() outside of a browser environment. It rolls with (1) and passes it a document stub from JSDOM. JSDOM is a beast I don’t want to lug around at all, but it works (I recall trying some alternatives, but they did not work for PM).

At the same time, there are a couple of places where my schema implementation outputs a DOM element in toDOM(). To make those work, I have to either A) (in SSR) patch in a global document where I define the schema spec, or B) find a creative way of creating schemas on the fly from functions to which I pass an explicit document (either JSDOM in SSR, or the global one in the browser).

I currently do (A), but it feels wrong.
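
For reference, option (B) could look roughly like the sketch below. The names `DocumentLike`, `ElementLike`, and `makeBadgeSpec` are made up for illustration and are not part of ProseMirror; the returned object only mimics the general shape of a node spec.

```typescript
// Hypothetical minimal shapes; these are not ProseMirror or DOM lib types.
interface ElementLike {
  setAttribute(name: string, value: string): void;
}
interface DocumentLike {
  createElement(tag: string): ElementLike;
}

// Builds a node-spec-like object whose toDOM() closes over an explicit
// document, so the spec never touches a global `document`.
function makeBadgeSpec(doc: DocumentLike) {
  return {
    inline: true,
    group: "inline",
    attrs: { label: { default: "" } },
    toDOM(node: { attrs: { label: string } }): ElementLike {
      const el = doc.createElement("span");
      el.setAttribute("data-label", node.attrs.label);
      return el;
    },
  };
}
```

In SSR the factory would be called with the JSDOM document, and in the browser with the global one, before the schema is built from the result.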

What I have read:

Relevant, but not helping:

On the one hand, yes, having this would be a little cleaner. On the other hand, this is a browser library, and running it outside of that context is not a very important use case. I suppose you’re already running some of your tests in the browser. If I were starting from scratch I’d probably also run the prosemirror-model tests in a headless browser (as you already mention, JSDOM isn’t perfect). As such, I don’t really feel comfortable adding complexity to the library for this specific situation, and as you found, it’s not hard to work around this.

JSDOM is not perfect

Actually, it seems to be pretty close in terms of feature completeness. The problem is size. If anything, “what is the minimum I need to give ProseMirror for it to work” is the big question. Surely it’s not using the entire API surface of browser’s Document… If document object(s) need to be passed around while generating the site, it would at least help to pass strictly what’s needed.

I can see how it would make PM itself more testable, too (in addition to sparing some users a massive dependency if they need to build in Node or, say, in a browser worker environment that has no access to the DOM).

If one were to contribute at least on the documentation front (maybe even on narrowing down type signatures), what’d be the best way to go about it?

That really doesn’t seem like a reasonable thing to do. You can create Nodes just fine in a DOM-less environment. But you cannot render them. And since you cannot display them either, that seems like it isn’t much of a problem.

You can create Nodes just fine in a DOM-less environment. But you cannot render them. And since you cannot display them either, that seems like it isn’t much of a problem.

The context is rendering static HTML deliverables. PM is called “off the DOM” to convert its JSON content tree to a string. (There are many benefits to using PM here, including stable document appearance to human readers and editors, single DRY code path for obtaining content representation while maintaining fast initial page opening times, graceful degradation without JS [sans node views], etc.)

This part seemed obvious enough that I didn’t explain it in the original post—I must be suffering from a bit of tunnel vision.

If I’m missing some way to accomplish this without a document global and without having to pass document to PM, I’d love to know what it is…

  1. Getting a hold of the DOM serializer’s document in toDOM() seems to involve changes to:

    • serializeNodeInner() (pass document to node spec’s toDOM()),
    • serializeNode() (pass document to serializeMark()),
    • serializeMark() (pass document to mark spec’s toDOM()), and
    • updating node/mark spec’s toDOM() TypeScript signature (backwards compatible).

    For example, in the case of serializeNodeInner(), just this call (lines split for legibility) would change from

    renderSpec(
      doc(options),
      this.nodes[node.type.name](node),
      null,
      node.attrs)
    

    to

    const dom = doc(options)
    renderSpec(
      dom,
      this.nodes[node.type.name](node, dom),
      null,
      node.attrs)
    
  2. Regarding what document entails, for ProseMirror’s purposes (meaning both what callers must pass to serializeNode() and what it’d pass to spec’s toDOM() per (1) above) it seems to be a tiny subset of Document with only the following:

    • .createTextNode()
    • .createDocumentFragment()
    • .createElement[NS]() (called by the serializer, and presumably by any toDOM() that wishes to return a DOM node)

    As for the HTMLElement or DocumentFragment returned from the above functions, for PM’s purposes it seems they must implement only:

    • .nodeType property (only applicable to HTMLElement)
    • .setAttribute[NS]() (only applicable to HTMLElement)
    • .appendChild()

    This is based on prosemirror-model’s to_dom.ts; let me know if there are other modules of concern.

    Narrowing down the type can be done easily with TypeScript’s built-in Pick type:

    type BareDocument = Pick<Document,
      | "createTextNode"
      | "createDocumentFragment"
      | ...>;
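
Spelled out without the DOM lib types, the subset listed above might look like the following sketch. The interface names (`BareNode`, `BareElement`, `BareDocumentLike`) and the string-building stub classes are assumptions for illustration. The sketch shows only that the listed members are enough to build and print a small tree; it does not prove that ProseMirror itself runs against such a stub.

```typescript
// Hypothetical structural types covering only the members listed above.
interface BareNode {
  appendChild(child: BareNode): BareNode;
}
interface BareElement extends BareNode {
  readonly nodeType: number;
  setAttribute(name: string, value: string): void;
  setAttributeNS(ns: string | null, name: string, value: string): void;
}
interface BareDocumentLike {
  createTextNode(text: string): BareNode;
  createDocumentFragment(): BareNode;
  createElement(tag: string): BareElement;
  createElementNS(ns: string | null, tag: string): BareElement;
}

const escapeHTML = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");

// Toy nodes that record a tree and print it as HTML text.
// Void elements, namespaces, and many other details are ignored.
class StubNode implements BareNode {
  children: StubNode[] = [];
  appendChild(child: BareNode): BareNode {
    this.children.push(child as StubNode);
    return child;
  }
  toHTML(): string {
    return this.children.map((c) => c.toHTML()).join("");
  }
}

class StubText extends StubNode {
  constructor(readonly text: string) {
    super();
  }
  toHTML(): string {
    return escapeHTML(this.text);
  }
}

class StubElement extends StubNode implements BareElement {
  readonly nodeType = 1; // same value as Node.ELEMENT_NODE
  private attrs: Record<string, string> = {};
  constructor(readonly tag: string) {
    super();
  }
  setAttribute(name: string, value: string): void {
    this.attrs[name] = value;
  }
  setAttributeNS(_ns: string | null, name: string, value: string): void {
    this.attrs[name] = value; // the namespace is dropped in this toy version
  }
  toHTML(): string {
    const attrs = Object.keys(this.attrs)
      .map((k) => " " + k + '="' + escapeHTML(this.attrs[k]) + '"')
      .join("");
    return "<" + this.tag + attrs + ">" + super.toHTML() + "</" + this.tag + ">";
  }
}

const stubDocument: BareDocumentLike = {
  createTextNode: (text: string) => new StubText(text),
  createDocumentFragment: () => new StubNode(),
  createElement: (tag: string) => new StubElement(tag),
  createElementNS: (_ns: string | null, tag: string) => new StubElement(tag),
};
```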
    

Notes:

  • The above changes are backward compatible and don’t affect existing users.

  • The changes are independent of each other, though I would say it’d be better to narrow down Document first, if it’s to be added to toDOM()’s signature.

  • In fact, instead of giving toDOM() access to document, even a narrowed-down version, it could be worth passing just an element constructor function:

    toDOM(
      node: Node,
      createDOMElement: BareDocument["createElement"],
    ): ...
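
To illustrate why the extra argument would be backward compatible, here is a toy stand-in for the serializer. `serializeWith`, `ToDOM`, `El`, and `PMNodeLike` are hypothetical and much simpler than ProseMirror’s real code; the point is only that a one-argument toDOM() keeps working when the caller starts passing a second argument.

```typescript
// Hypothetical shapes for illustration only.
interface El {
  tag: string;
  attrs: Record<string, string>;
}
type CreateElement = (tag: string) => El;
interface PMNodeLike {
  type: string;
  attrs: Record<string, string>;
}
type ToDOM = (node: PMNodeLike, createDOMElement: CreateElement) => El;

// Toy serializer step: looks up the spec's toDOM() and hands it
// the element constructor as a second argument.
function serializeWith(
  specs: Record<string, ToDOM>,
  createElement: CreateElement,
  node: PMNodeLike,
): El {
  return specs[node.type](node, createElement);
}

const specs: Record<string, ToDOM> = {
  // Existing style: ignores the second argument entirely.
  legacy: (node) => ({ tag: "div", attrs: node.attrs }),
  // New style: builds its element through the injected constructor.
  modern: (node, createDOMElement) => {
    const el = createDOMElement("section");
    el.attrs.id = node.attrs.id;
    return el;
  },
};
```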
    

Aren’t you much better served with the array-style DOMOutputSpec for this situation, since almost anything you can do on a DOM node that you cannot do in such an array will be lost when serializing to HTML text again?
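
For comparison, a toDOM() that returns the array form needs no document at all. The sketch below uses a simplified local `ArraySpec` type instead of importing prosemirror-model’s `DOMOutputSpec`, and the `calloutToDOM` function and its `kind` attribute are invented for the example.

```typescript
// Simplified stand-in for the array form of DOMOutputSpec:
// [tagName, optional attrs object, ...children], where 0 marks the content hole.
type Attrs = Record<string, string>;
type ArraySpec = [string, ...(Attrs | ArraySpec | string | 0)[]];

// Describes the output as plain data; the serializer (in the browser or in SSR)
// is the only code that needs a document to turn this into DOM nodes.
function calloutToDOM(node: { attrs: { kind: string } }): ArraySpec {
  return [
    "aside",
    { class: "callout callout-" + node.attrs.kind },
    ["div", { class: "callout-body" }, 0],
  ];
}
```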

Running DOM creation code outside of the browser just seems like a messy approach. It blurs the lines of what capabilities a piece of code has—authors could be tempted, for example, to try and hook into other client-side APIs in this code.

This situation is kind of on me, for designing the output type of this method this way. But I don’t want to add further oddness to the interface for this use case (I really hope pulling in JSDOM for server-side rendering isn’t something a lot of people are doing). If you’re committed to that approach, it is just as easy to use your own mechanism to smuggle a reference to a Document instance into your methods, so I don’t want to add extra parameters to the library for this.