Nodes as more complex Objects?

Short: Can a node have multiple contentDOMs? No. But then how do you express an object such as an album with a title, description and tracks?

Long: I am looking into transferring the core of my XML-based editor to PM (ProseMirror), but I am facing the caveat that PM does not have “objects”, only “simple nodes” with a single entry point for children/content. Let’s say this is what’s in my system:


<album title="Twoism" artist="Board of Canada">
	<desc>Very <b>cool</b> stuff</desc>
	<tracks title="Sixtyniner" />
	<tracks title="Oirectine" />
</album>


<div x:list="albums/album">
<div class="album">
	<div class="header">
		<h1><x:literal path="@title" /></h1>
		<p><x:content path="@title" /></p>
	</div>
	<ol x:list="tracks">
		<li x:template="track" />
	</ol>
</div>
</div>

Output: pretty obvious, the template(s) rendered without the x namespace etc., just like Vue and similar.

This makes for both a server render engine and a client side editor to edit albums in wysiwyg style with some inline helpers like “+” to insert new albums and tracks (and much much more).

So, trying to “transfer” this to PM, I bump into the following: a PM node should be the album, and that PM node needs to map to @title + desc. This will be flat JSON, which I can then map to XML, but the problem is that a PM node can only have 1 content entry point… I tried with a nodeView, but no dice. And I can’t break it into pieces, as e.g. <div class="header"> is not an object, and PM does not seem to insert child nodes if they are forced to exactly 1 element.

Is this possible somehow? Am I missing something?

Another, maybe simpler, HTML-ish sample would be a fieldset node/control: it has 1 legend at the top and multiple content elements as siblings:

<fieldset>
	<legend>The Legend</legend>
	<p>Hello <b>World</b></p>
</fieldset>

Give it a fixed set of child nodes, each of which has its own content.

Thanks! But 3 issues here:

  1. It does not solve the <div class="header"> issue (see the long version). In short, the toDOM output is a tree structure, and I cannot break that down into fragments unless each DOM element becomes a node by itself, and then it would get really crazy…

Also, a few things I bumped into when doing it like this:

"big_object": <NodeSpec> {
	group: "block",
	content: "big_child_1 big_child_2",
	toDOM: (node: Node) => ["div", { class: "big-object" }, 0],
	parseDOM: [{ tag: "big-object" }]
},
"big_child_1": <NodeSpec> {
	content: "inline+",
	toDOM: (node: Node) => ["div", { class: "big-child-1" }, 0],
	parseDOM: [{ tag: "big-child-1" }]
},
// ... same for 2

… with this content passed to the parser:


… will match the p into big_child_1?!

  2. So for the above: how do I make the schema really “strict”? I can understand that the p element does not yield an error unless I maybe create my own group (I hope it does then?), but right now it makes the p become a big_child_1, which is a bit odd.

  3. Also the inline+ for the big_child_1 object: that will insert a “dino” (I added dino from the sample before img) into the inline content even though the dino is not the first inline object declared. How is the default forced content picked?

Not sure why that would be crazy.

Anyway, I still don’t really see the problem.

The library will look at the nodes in the inline group, in the order they are declared in the schema, and pick the first one that isn’t a text node and doesn’t have required attributes.
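The selection rule described above can be sketched in plain TypeScript. This is a simplified model of the rule as stated in the reply, not ProseMirror’s actual implementation: SimpleNodeType, hasRequiredAttrs and pickDefaultType are made-up names, and the node list only imitates a basic schema with a dino node added before the image node.

```typescript
// Simplified stand-in for a schema node type (illustrative, not a ProseMirror class).
interface SimpleNodeType {
  name: string;
  group: string;
  isText?: boolean;
  // An attribute spec without a `default` key counts as required.
  attrs?: Record<string, { default?: unknown }>;
}

// True if at least one attribute has no default value.
function hasRequiredAttrs(type: SimpleNodeType): boolean {
  const attrs = type.attrs || {};
  return Object.keys(attrs).some(name => !("default" in attrs[name]));
}

// Walk the types in declaration order and return the first member of `group`
// that is neither a text node nor in need of required attributes.
function pickDefaultType(types: SimpleNodeType[], group: string): SimpleNodeType | null {
  for (const type of types) {
    if (type.group !== group) continue;
    if (type.isText) continue;
    if (hasRequiredAttrs(type)) continue;
    return type;
  }
  return null;
}

// Assumed declaration order: text first, then dino, then image.
const declared: SimpleNodeType[] = [
  { name: "text", group: "inline", isText: true },
  { name: "dino", group: "inline", attrs: { type: { default: "brontosaurus" } } },
  { name: "image", group: "inline", attrs: { src: {}, alt: { default: null } } },
  { name: "hard_break", group: "inline" },
];

const picked = pickDefaultType(declared, "inline");
console.log(picked && picked.name); // "dino"
```

Under these assumptions, text is skipped because it is a text node, so dino wins even though it is not the first inline node declared. Without dino, image would be skipped as well (its src has no default) and hard_break would be picked.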

  1. Ok thanks. The problem is that the header/dummy element (see <div class="header"> from the original question), as a node, then becomes part of the data model… I tried some parser rules like context, but no dice:

I added and changed this:

doc: {
	content: "(block | big)+"
},
"big_object": <NodeSpec> {
	group: "big",
	content: "big_dummy",
	toDOM: (node: Node) => ["div", { class: "big-object" }, 0],
	parseDOM: [{ tag: "big-object" }]
},
"big_dummy": <NodeSpec> {
	group: "big",
	content: "big_child_1 big_child_2",
	toDOM: (node: Node) => ["div", { class: "big-dummy" }, ["b", "Literal Text"], ["span", 0]]
},

… but that then makes the children of big-object match nothing, because they get “rejected” by the “dummy DOM as a node”:


The header/dummy fragment should not be part of the data, the parsing, the schema, the toJSON output etc., as it’s really just a wysiwyg HTML UI thing…

Are there any samples out there of someone who has done something similar? Maybe this object stuff gets too hacky for its original intent…

  1. Also: is it possible to catch mismatched content during parse? It does not seem so from the parser class docs. I really need this to be very strict, as in: follow the schema or yield an error / remove the data.
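One way to get that kind of strictness, independent of what the built-in parser does, is to check the source tree yourself before handing it over. The following is only a sketch under stated assumptions: XmlElement, expectedChildren and findMismatches are hypothetical names, the rule table simply mirrors content: "big_child_1 big_child_2", and none of this is a ProseMirror API.

```typescript
// Hypothetical, simplified element tree built from the XML/HTML source.
interface XmlElement {
  tag: string;
  children: XmlElement[];
}

// Exact child sequence each listed container tag must have (illustrative rules).
const expectedChildren: Record<string, string[]> = {
  "big-object": ["big-child-1", "big-child-2"],
};

// Returns one message per container whose children deviate from the rules.
function findMismatches(el: XmlElement, path: string = el.tag): string[] {
  const errors: string[] = [];
  const expected = expectedChildren[el.tag];
  if (expected !== undefined) {
    const actual = el.children.map(child => child.tag);
    if (actual.join(",") !== expected.join(",")) {
      errors.push(
        `${path}: expected [${expected.join(", ")}] but found [${actual.join(", ")}]`
      );
    }
  }
  // Recurse so nested containers are checked as well.
  for (const child of el.children) {
    errors.push(...findMismatches(child, `${path}/${child.tag}`));
  }
  return errors;
}

const good: XmlElement = {
  tag: "big-object",
  children: [
    { tag: "big-child-1", children: [] },
    { tag: "big-child-2", children: [] },
  ],
};

// A stray p where big-child-1 is required.
const bad: XmlElement = {
  tag: "big-object",
  children: [
    { tag: "p", children: [] },
    { tag: "big-child-2", children: [] },
  ],
};

console.log(findMismatches(good)); // empty array
console.log(findMismatches(bad));  // one message for big-object
```

The caller can then decide whether a non-empty result means “throw” or “drop the offending element” before any parsing into the editor model happens.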

Are you trying to use some fixed XML/HTML format? If so, that’s not always possible: ProseMirror needs a recognizable node for each node type for its internal DOM representation, and you’ll have to make sure those exist.

I think you may want to write your own parser, rather than rely on DOMParser to parse your external format, and use a different, more explicit DOM serialization inside the editor.

Yes, thanks! Good idea: write my own parser (XML to PM model) plus a serializer for the reverse (PM model to XML), so the dummy nodes should not be “in the way”, and use nodeViews rather than toDOM for more advanced in-control rendering.
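A minimal sketch of what such a parser/serializer pair could look like, working on plain data rather than DOM. Everything here is an assumption for illustration: the Album shape, the node names (album, album_title, album_desc, album_tracks, track) and the choice to keep artist as an attr are invented, and PMJson is only a small subset of the JSON shape ProseMirror documents serialize to.

```typescript
// Hypothetical in-memory form of one <album> element after XML parsing.
interface TextRun { text: string; bold?: boolean }
interface Album {
  title: string;
  artist: string;
  desc: TextRun[];
  tracks: string[];
}

// Small subset of the ProseMirror document JSON shape.
interface PMJson {
  type: string;
  attrs?: Record<string, string>;
  content?: PMJson[];
  text?: string;
  marks?: { type: string }[];
}

// Build a text node, optionally carrying a "strong" mark.
const text = (value: string, bold = false): PMJson =>
  bold
    ? { type: "text", text: value, marks: [{ type: "strong" }] }
    : { type: "text", text: value };

// XML side -> editor model: one child node per editable region.
function albumToPM(album: Album): PMJson {
  return {
    type: "album",
    attrs: { artist: album.artist },
    content: [
      { type: "album_title", content: [text(album.title)] },
      { type: "album_desc", content: album.desc.map(run => text(run.text, run.bold)) },
      {
        type: "album_tracks",
        content: album.tracks.map(t => ({ type: "track", content: [text(t)] })),
      },
    ],
  };
}

// Concatenate the text of a node's direct text children.
const plainText = (node: PMJson): string =>
  (node.content || []).map(child => child.text || "").join("");

// Editor model -> XML side. Throws instead of guessing on unexpected input.
function pmToAlbum(node: PMJson): Album {
  const parts = node.content || [];
  if (node.type !== "album" || parts.length !== 3) {
    throw new Error("not a well-formed album node");
  }
  const [title, desc, tracks] = parts;
  return {
    title: plainText(title),
    artist: node.attrs ? node.attrs.artist : "",
    desc: (desc.content || []).map((child): TextRun => {
      const bold = (child.marks || []).some(mark => mark.type === "strong");
      return bold ? { text: child.text || "", bold: true } : { text: child.text || "" };
    }),
    tracks: (tracks.content || []).map(plainText),
  };
}

const twoism: Album = {
  title: "Twoism",
  artist: "Board of Canada",
  desc: [{ text: "Very " }, { text: "cool", bold: true }, { text: " stuff" }],
  tracks: ["Sixtyniner", "Oirectine"],
};

const roundTripped = pmToAlbum(albumToPM(twoism));
console.log(JSON.stringify(roundTripped) === JSON.stringify(twoism)); // true
```

Wrapper markup such as <div class="header"> never appears in this model; in this approach it would live only in the rendering (for example a nodeView). One caveat: ProseMirror does not allow empty text nodes, so a real converter would have to omit the text node for an empty title or track name.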

Hey Marijn,

I’m curious if this is the sort of use case a decoration widget might also be useful for?

Probably not. Decorations aren’t part of the document itself, so I don’t really see how they can help with schema modeling.

I was looking at the original post and thinking that the title and artist attributes are really attrs on the album node (which may also be true of the desc, now that andersmad is writing their own parser and serializer). Where we have similar attrs, we mostly edit those via a modal dialog right now, which is a bit high-friction, and we would love to find another way to represent them in the UI. I’ve assumed the Widget system is one way to do so, though I ran into an issue with using widgets on inline nodes the first time I tried them.

The thing is that I already have the properties/attributes for a ton of other stuff in a non-modal window (I have observers, so changing them changes the rendering too), but I’d really like some of it to be content-editable in the normal “flow”… That also means that I’ll need to extend the parser, not rewrite it, as my schema will sit on top of the existing one.

A few questions here:

  1. In from_dom.js @ addElement it goes if (rule && rule.skip.nodeType) dom = rule.skip. When does the bool skip become an element? It’s a wee bit hard for me to read non-TypeScript code; a bit of a guessing game.
  2. The order of the nodes seems to matter when using “content” in the schema (see sample below).
  3. Should from_dom not use the group? See the P below: it gets a match even though it’s in another group.

schema extending the basic:

doc: { content: "(block|x_grp)+" },

"x_object": <NodeSpec> {
	group: "x_grp",
	content: "x_dummy",
	toDOM: (node: Node) => ["div", { class: "x-object" }, 0],
	parseDOM: [{ tag: "object" }]
},

"x_dummy": <NodeSpec> {
	group: "x_grp",
	content: "x_child_1 x_child_2",
	toDOM: (node: Node) => ["div", { class: "x-dummy" }, ["b", "Stuff"], ["span", 0]],
	parseDOM: [{ tag: "dummy", skip: true }]
},

"x_child_1": <NodeSpec> {
	group: "x_grp",
	content: "inline*",
	toDOM: (node: Node) => ["div", { class: "x-child-1" }, 0],
	parseDOM: [{ tag: "child-1" }]
},

"x_child_2": <NodeSpec> {
	group: "x_grp",
	content: "inline*",
	toDOM: (node: Node) => ["div", { class: "x-child-2" }, 0],
	parseDOM: [{ tag: "child-2" }]
},

then this (order - child 2 then 1 does not compute):

	<child-2>Ok 2</child-2>
	<child-1>Ok 1</child-1>

becomes: [image]

and this (bad match? P becomes a part of the group etc.):

	<child-1>Ok 1</child-1>
	<child-2>Ok 2</child-2>

becomes: [image]

It seems like it’s very close to being able to handle all this; there are just a few caveats, or logic that I don’t get.

NOTE: ignore this one - see: Nodes as more complex Objects?

And here’s another one: I added this to the basic schema so that an <image src="..."> element is matched alongside the standard img node already there (this is for a special image type). Here is the very basic version:

"x_image": <NodeSpec> {
	toDOM: (node: Node) => ["img", { src: "/img/variable.svg" }],
	parseDOM: [{ tag: "image" }]
},

source for parser:

<image src="..."  />

But that will always become the std. img node!? Whuut? So does it ignore the parseDOM now, or does it use the output from toDOM, or what?

I’m not sure your example with x_dummy needs it.

You can use an extended parseDOM rule to include a contentElement function that picks out the dummy tag:

parseDOM: [{
    tag: "heading",
    getAttrs (dom) { return headingAttrs(dom, dom.dataset.level) },
    // Parse the node's content from the inner 'content' element only
    contentElement (dom) {
      return dom.getElementsByClassName('content')[0]
    }
}]

Here’s an example from our custom heading node which has a node view that adds buttons and such around the ‘content’ element that are not part of the node proper.

Regarding that image node: I think you’d need a second parseDOM rule and you’d need your custom x_image node to show up before the regular image one. Alternatively if you never have any other kind of image you could remove the original image node from the base-schema before adding your own.

Something like:

"x_image": <NodeSpec> {
	toDOM: (node: Node) => ["img", { class: "special-image", src: node.attrs.src }],
	parseDOM: [{ tag: "image", getAttrs() {...} }, { tag: "img.special-image", getAttrs() {...} }]
}

OK, ignore the image thing. It turns out that Chrome translates image to img :(

let a = document.createElement("div"); 
a.innerHTML = "<image src='.' />"; 
console.log(a.innerHTML) // shows <img src=".">

I guess if you’re writing your own parser you won’t need the parse rules anymore. I was confused by this bit of the manual.

I’m not going to write a new parser (maybe extend the current one), but first I need to make sense of the existing one: why elements get matched to the wrong schema nodes, why it bleeds content into other elements, etc… I need to know from @marijn whether this is by design or a bug… If it is by design, I’m going to make some kind of “strict” rule property that makes a rule match only if the group (or another “tag”, or the context) also matches… and maybe a rule “lookup” property that makes the content matching ignore the order of the nodes…

I, very hackishly, managed to remove the content bleeding and to match unordered elements… I’d still very much like to hear any “official” comment on these topics. Thanks…

I don’t really know what you’re talking about here. Concrete examples, with the schema nodes that are matched and that you expected to match, would be helpful.

hi, that will be this post: Nodes as more complex Objects?

need more details etc.?