Different Default Block for Parsing vs Editing

jordandh · August 9, 2024, 11:27pm

When ProseMirror’s DOMParser parses the DOM, it automatically inserts context nodes to make the document valid. ProseMirror will also pick default block nodes for some commands. I’d like to use different default block nodes for parsing and editing. In my case I’d like to keep paragraphs as the default block node, which is what I have now based on my schema, but when parsing I want text and inline nodes to be wrapped in a different block node.

I don’t see a way to do this with the ProseMirror API. I’ve been trying different things, like using different schemas for parsing and editing, but the docs produced by both do not seem directly compatible with each other. I’ve also tried modifying the node order on the schema before and after parsing, but that doesn’t work either (perhaps because the content matching is cached). That’s also getting hackier than I’d like.

Is there a way to get the desired behavior?

prosed · August 9, 2024, 11:57pm

I’m not really following your request, but if you’re asking if you can parse HTML and then serialize it to a different format and parse it back in identically, I think the answer is no.

jordandh · August 10, 2024, 12:16am

That’s not what I’m trying to do here. I’m parsing HTML → DOM → ProseMirror doc. The DOM → ProseMirror doc conversion uses ProseMirror’s DOMParser, which inserts new nodes to make the document valid. I want a div inserted as the context node when it’s parsed. Then that ProseMirror doc is loaded in the editor. When the user makes an edit, such as hitting Enter at the end of a paragraph, I want a paragraph to be inserted by the splitBlock command and not a div.

Right now ProseMirror uses the same logic to pick the inserted node in both cases, which means it picks a paragraph in both.

heyainsleymae · August 10, 2024, 7:24am

You’ll need to create a custom command that performs the desired behaviour and bind it to Enter; see this similar question with that exact solution: Handle enter press but with different node type
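
A minimal sketch of that approach, assuming `splitBlockAs` from prosemirror-commands and a hypothetical schema in which a `parsingContainer` block (the name used later in this thread) is listed first, so it is the schema-wide default block. When the split happens at the end of a block, the callback asks for a `paragraph`; elsewhere it returns `null`, which keeps the type of the block being split, as plain `splitBlock` does.

```typescript
import { Schema } from "prosemirror-model";
import { EditorState, TextSelection } from "prosemirror-state";
import { splitBlockAs } from "prosemirror-commands";

// Hypothetical schema: parsingContainer comes first, so it is the default
// block ProseMirror picks whenever it has to create one.
const schema = new Schema({
  nodes: {
    doc: { content: "block+" },
    parsingContainer: { group: "block", content: "inline*", parseDOM: [{ tag: "div" }] },
    paragraph: { group: "block", content: "inline*", parseDOM: [{ tag: "p" }] },
    text: { group: "inline" },
  },
});

// Like splitBlock, but a split at the end of a block produces a paragraph
// instead of the schema default. Returning null elsewhere keeps the type
// of the block being split.
const splitToParagraph = splitBlockAs((_node, atEnd) =>
  atEnd ? { type: schema.nodes.paragraph } : null
);

// Run the command once; returns the new state, or the old one if it failed.
function pressEnter(state: EditorState): EditorState {
  let result = state;
  splitToParagraph(state, (tr) => {
    result = state.apply(tr);
  });
  return result;
}

const doc = schema.node("doc", null, [
  schema.node("parsingContainer", null, [schema.text("hello")]),
]);
// "hello" occupies positions 1..6 inside the container; 6 is its end.
const atEnd = pressEnter(EditorState.create({ doc, selection: TextSelection.create(doc, 6) }));
// Position 3 is between "he" and "llo".
const inMiddle = pressEnter(EditorState.create({ doc, selection: TextSelection.create(doc, 3) }));
console.log(atEnd.doc.toString(), inMiddle.doc.toString());
```

To bind it, it could take `splitBlock`’s place at the end of an Enter chain such as `chainCommands(newlineInCode, createParagraphNear, liftEmptyBlock, splitToParagraph)` passed to `keymap`. Commands like `createParagraphNear` still pick the schema default on their own, so this only covers the split.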

Your re-ordering of the schema’s nodes is right on track with what is described in the library guide, specifically the penultimate paragraph of the Content Expressions section:

The order in which your nodes appear in an or-expression is significant. When creating a default instance for a non-optional node, for example to make sure a document still conforms to the schema after a replace step[,] the first type in the expression will be used. If that is a group, the first type in the group (determined by the order in which the group’s members appear in your nodes map) is used.

Re-ordering the schema so <div> elements get inserted during parsing will give you half of the behaviour, and binding a custom command to Enter should give you the other half.
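
To see the ordering rule in isolation, here is a small sketch with two hypothetical schemas built from the same node specs, differing only in whether `paragraph` or `parsingContainer` is listed first. `ContentMatch.defaultType` is the type used when a block has to be created, and, as far as I can tell from prosemirror-model’s source, the context nodes DOMParser inserts come from `ContentMatch.findWrapping`. Both follow the group order.

```typescript
import { Schema, NodeSpec } from "prosemirror-model";

// Shared node specs; parsingContainer is a hypothetical <div>-backed block.
const doc: NodeSpec = { content: "block+" };
const paragraph: NodeSpec = { group: "block", content: "inline*", parseDOM: [{ tag: "p" }] };
const parsingContainer: NodeSpec = { group: "block", content: "inline*", parseDOM: [{ tag: "div" }] };
const text: NodeSpec = { group: "inline" };

// Same specs, different key order: the first member of "block" wins.
const editSchema = new Schema({ nodes: { doc, paragraph, parsingContainer, text } });
const parseSchema = new Schema({ nodes: { doc, parsingContainer, paragraph, text } });

// Block type created when doc needs a child it doesn't have yet.
function defaultBlock(schema: Schema): string | undefined {
  return schema.topNodeType.contentMatch.defaultType?.name;
}

// Wrapper chain needed to put a text node directly inside doc.
function textWrapping(schema: Schema): string[] | undefined {
  return schema.topNodeType.contentMatch
    .findWrapping(schema.nodes.text)
    ?.map((type) => type.name);
}

console.log(defaultBlock(editSchema), textWrapping(editSchema));
console.log(defaultBlock(parseSchema), textWrapping(parseSchema));
```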

jordandh · August 10, 2024, 4:21pm

You are right that I can make a custom command. But there are lots of places where ProseMirror uses this default-node behavior, and rewriting all of them seems like a bad approach.

heyainsleymae · August 10, 2024, 10:28pm

Using different schemas for parsing and editing seems like the way to go, then, unless you can get by with only overriding the behaviour in a few places.

When you say that the docs produced by both “do not seem directly compatible with each other”, where are the incompatibilities coming from? Your desired behaviour (wrapping all inline nodes in the desired node type) is pretty broad, so I can’t imagine the documents produced by both schemas differing in many ways other than the ones you want. If you mean that the documents also differ in undesirable ways, maybe adding the context property to the wrapping node’s TagParseRules would help?

jordandh · August 10, 2024, 10:42pm

I don’t know exactly why it’s not working, but applying a transaction with nodes from a different schema resulted in the transaction making no changes. I think I’ll play around with it a bit more. At the very least I know the node types differ, because each belongs to a different schema, and the content matching is likely different too because of the different node order.

heyainsleymae · August 11, 2024, 4:47pm

Try using the <div>-first schema when parsing the initial document, and then passing a DOMParser made from the <p>-first schema as the editor view’s domParser property.
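
The “no changes” symptom above is consistent with each Schema building its own NodeType objects, so a `parsingContainer` from the parsing schema is not the same type as the editing schema’s. One way to bridge that, which is an assumption on my part rather than something stated in this thread, is to round-trip the parsed document through JSON (`Node.toJSON` and `Schema.nodeFromJSON`) so it is rebuilt with the editing schema’s types. The schemas and the hand-built `parsed` document are hypothetical; in practice `parsed` would come from `DOMParser.fromSchema(parseSchema).parse(dom)`, which needs a DOM and is left out here.

```typescript
import { Schema, NodeSpec, Node as PMNode } from "prosemirror-model";

// Hypothetical shared specs; parsingContainer is a <div>-backed block.
const doc: NodeSpec = { content: "block+" };
const paragraph: NodeSpec = { group: "block", content: "inline*", parseDOM: [{ tag: "p" }] };
const parsingContainer: NodeSpec = { group: "block", content: "inline*", parseDOM: [{ tag: "div" }] };
const text: NodeSpec = { group: "inline" };

// <div>-first schema for parsing, <p>-first schema for editing.
const parseSchema = new Schema({ nodes: { doc, parsingContainer, paragraph, text } });
const editSchema = new Schema({ nodes: { doc, paragraph, parsingContainer, text } });

// Rebuild a document with the editing schema's node types. Only works if
// every node and mark name used in `node` also exists in editSchema.
function toEditSchema(node: PMNode): PMNode {
  return editSchema.nodeFromJSON(node.toJSON());
}

// Stand-in for DOMParser.fromSchema(parseSchema).parse(dom).
const parsed = parseSchema.node("doc", null, [
  parseSchema.node("parsingContainer", null, [parseSchema.text("hello")]),
]);

const editable = toEditSchema(parsed);
editable.check(); // throws if the content is invalid under editSchema
```

An EditorState created from `editable` then uses the `<p>`-first schema, so `splitBlock` and similar commands fall back to `paragraph`, and `DOMParser.fromSchema(editSchema)` can serve as the view’s `domParser`.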

jordandh · August 12, 2024, 4:56pm

I’ve gotten it to work by re-ordering the nodes before and after the parse. The missing piece was rebuilding the content matches.

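// Presumably a method on a Schema subclass (it reads this.nodes).
// Recomputes every node type's content match from the current order of
// this.nodes, the way the Schema constructor does. ContentMatch.parse is
// internal prosemirror-model API, hence the cast.
rebuildContentMatches() {
  const contentExprCache = Object.create(null);

  for (const prop in this.nodes) {
    const type = this.nodes[prop];
    const contentExpr = type.spec.content ?? '';

    // Node types with the same content expression share one ContentMatch.
    type.contentMatch =
      contentExprCache[contentExpr] ||
      (contentExprCache[contentExpr] = (ContentMatch as any).parse(contentExpr, this.nodes));
  }
}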

...

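// Move parsingContainer to the front of the nodes map so it becomes the
// first member of its group, i.e. the default block while parsing.
const originalNodes = schema.nodes;
const parsingContainerNode = originalNodes.parsingContainer;
let newNodes = {
  parsingContainer: parsingContainerNode,
  ...omit(originalNodes, ['parsingContainer'])
};
schema.nodes = newNodes;
schema.rebuildContentMatches();

const pmDoc = pmDOMParser.fromSchema(schema).parse(xmlDoc);

// Restore the original order so editing commands default to paragraphs again.
schema.nodes = originalNodes;
schema.rebuildContentMatches();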

The Schema constructor builds the content matches; I just copied out the bits needed for this. I’m still not happy with this solution since it’s a bit hacky, but I’ll likely use it if I don’t find another way.
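
The same trick can be packaged as a standalone helper, so it does not need a Schema subclass. This is a sketch: `withNodeFirst` and the `parsingContainer` type are hypothetical names, and it relies on `ContentMatch.parse`, which is internal to prosemirror-model and could change between versions. Restoring in a `finally` block keeps the schema from staying reordered if parsing throws.

```typescript
import { Schema, ContentMatch, NodeType } from "prosemirror-model";

// Recompute each node type's content match from the current order of
// schema.nodes, mirroring the Schema constructor. ContentMatch.parse is
// internal API, hence the `any` cast.
function rebuildContentMatches(schema: Schema): void {
  const cache: Record<string, ContentMatch> = Object.create(null);
  for (const name in schema.nodes) {
    const type: NodeType = schema.nodes[name];
    const expr = type.spec.content ?? "";
    type.contentMatch = cache[expr] || (cache[expr] = (ContentMatch as any).parse(expr, schema.nodes));
  }
}

// Hypothetical helper: move node type `first` to the front of schema.nodes,
// run `fn`, then restore the original order even if `fn` throws.
function withNodeFirst<T>(schema: Schema, first: string, fn: () => T): T {
  const original = schema.nodes;
  if (!original[first]) throw new RangeError(`Unknown node type: ${first}`);
  const reordered: Record<string, NodeType> = Object.create(null);
  reordered[first] = original[first];
  for (const name in original) if (name !== first) reordered[name] = original[name];

  (schema as any).nodes = reordered; // `nodes` is declared readonly
  rebuildContentMatches(schema);
  try {
    return fn();
  } finally {
    (schema as any).nodes = original;
    rebuildContentMatches(schema);
  }
}

// Usage with a <p>-first schema; parsingContainer is a hypothetical <div> block.
const schema = new Schema({
  nodes: {
    doc: { content: "block+" },
    paragraph: { group: "block", content: "inline*", parseDOM: [{ tag: "p" }] },
    parsingContainer: { group: "block", content: "inline*", parseDOM: [{ tag: "div" }] },
    text: { group: "inline" },
  },
});
const defaultBlock = () => schema.topNodeType.contentMatch.defaultType?.name;

// In real code the callback would be
// () => DOMParser.fromSchema(schema).parse(xmlDoc)
const whileParsing = withNodeFirst(schema, "parsingContainer", defaultBlock);
const afterParsing = defaultBlock();
```

Because the node types themselves are never replaced, the parsed document can be loaded into the editor as-is; only the order used to pick defaults changes while `fn` runs.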