Nodes as more complex Objects?

Thanks! But 3 issues here:

  1. It does not solve the <div class="header"> issue - see long version. In short, the toDOM is a tree structure - I cannot break that down into fragments unless each DOM element becomes a node by itself - and then it would get really crazy…

Also, a few things I bumped into while doing this:

"big_object": <NodeSpec> {
	group: "block",
	content: "big_child_1 big_child_2",
	toDOM: (node: Node) => ["div", { class: "big-object" }, 0],
	parseDOM: [{ tag: "big-object", }]
},
"big_child_1": {
	content: "inline+",
	toDOM: (node: Node) => ["div", { class: "big-child-1" }, 0],
	parseDOM: [{ tag: "big-child-1", }]
},
// ... same for 2

… with this content passed to the parser:

<big-object>
	<p>whUt?</p>
</big-object>

… will match the p into big_child_1?!

  1. So for above - how to make the schema really “strict”? I can understand that the p element does not yield an error unless I maybe create my own group (hope it then does?), but right now it makes the p become a big_child_1 - that’s a bit odd.

  2. Also the inline+ for the big_child_1 object - that will insert a “dino” (i added dino from sample before img) into the inline content even though the dino is not the first inline object declared. How is the default forced content picked?

Not sure why that would be crazy.

Anyway, I still don’t really see the problem.

The library will look at the nodes in the inline group, in the order they are declared in the schema, and pick the first one that isn’t a text node and doesn’t have required attributes.
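The rule described above can be modelled in a few lines. This is a simplified sketch, not ProseMirror's actual source: `InlineTypeModel`, `hasRequiredAttrs` and `pickFillerType` are hypothetical names, and the attribute specs are assumed to mirror the basic schema with the dino node added before the image node.

```typescript
// Simplified stand-in for a schema node type (hypothetical, not ProseMirror's NodeType).
interface InlineTypeModel {
  name: string;
  isText: boolean;
  // Attribute name -> spec. An attribute without a default counts as "required".
  attrs: { [name: string]: { default?: unknown } };
}

function hasRequiredAttrs(type: InlineTypeModel): boolean {
  return Object.keys(type.attrs).some(name => !("default" in type.attrs[name]));
}

// Walk the candidates in schema declaration order and return the first one
// that is not a text node and has no required attributes.
function pickFillerType(candidates: InlineTypeModel[]): InlineTypeModel | null {
  for (const type of candidates) {
    if (!type.isText && !hasRequiredAttrs(type)) return type;
  }
  return null;
}

// Assumed declaration order of the inline group: text, dino, image, hard_break.
const inlineTypes: InlineTypeModel[] = [
  { name: "text", isText: true, attrs: {} },
  { name: "dino", isText: false, attrs: { type: { default: "brontosaurus" } } },
  { name: "image", isText: false, attrs: { src: {}, alt: { default: null }, title: { default: null } } },
  { name: "hard_break", isText: false, attrs: {} },
];

const filler = pickFillerType(inlineTypes);
```

Under these assumptions the text node is skipped and the dino is picked, which matches what was observed with inline+: the dino is the first declared inline node that can be created without any input.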

  1. Ok thanks, the problem is that then the header/dummy element (see <div class="header"> from org. question) as node then becomes a part of the data model… tried with some parser rules like context but no dice:

I added and changed this:

doc: {
	content: "(block | big)+ "
},
"big_object": <NodeSpec> {
	group: "big",
	content: "big_dummy",
	toDOM: (node: Node) => ["div", { class: "big-object" }, 0],
	parseDOM: [{ tag: "big-object", }]
},
"big_dummy": <NodeSpec> {
	group: "big",
	content: "big_child_1 big_child_2",
	toDOM: (node: Node) => ["div", { class: "big-dummy" }, ["b", "Literal Text"], ["span", 0] ],
},

… but that then makes the children of big-object match nothing - because they get “rejected” by the “dummy DOM as a node”:

<big-object>
	<p>whUt?</p>
	<big-child-2>Ok!</big-child-2>
</big-object>

The header/dummy fragment should not be part of the data, the parsing, the schema or the toJSON output etc., as it’s really just a WYSIWYG HTML UI thing…

Any samples out there of someone that has done something similar? This object stuff gets too hacky for its original intent maybe…

  1. Also - is it possible to catch mismatched content during parse? It does not seem so from the parser class doc. I really need this to be very strict - like follow the schema or yield an error / remove data.

Are you trying to use some fixed XML/HTML format? If so, that’s not always possible: ProseMirror needs a recognizable node for each node type for its internal DOM representation, and you’ll have to make sure those exist.

I think you may want to write your own parser, rather than rely on DOMParser to parse your external format, and use a different, more explicit DOM serialization inside the editor.
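As a rough illustration of that suggestion, here is a minimal sketch of a strict converter from an already-parsed external element tree to ProseMirror's JSON document shape. Everything in it is an assumption made for illustration: the `SourceElement` type, the tag-to-node mapping tables, and the simplification that x_object holds x_child_1 and x_child_2 directly with no dummy node in between.

```typescript
// Hypothetical, simplified view of one element of the external XML format.
interface SourceElement {
  tag: string;
  text?: string;
  children?: SourceElement[];
}

// The JSON shape ProseMirror uses for documents (type, content, text).
interface PMNodeJSON {
  type: string;
  content?: PMNodeJSON[];
  text?: string;
}

// Assumed mapping: container tag -> node type plus the exact child tags, in order.
const containers: { [tag: string]: { nodeType: string; children: string[] } } = {
  "object": { nodeType: "x_object", children: ["child-1", "child-2"] },
};

// Assumed mapping: tags that may only hold text -> node type.
const textLeaves: { [tag: string]: string } = {
  "child-1": "x_child_1",
  "child-2": "x_child_2",
};

// Own-property lookup, so keys like "constructor" never match by accident.
function own<T>(table: { [key: string]: T }, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Convert one element, throwing as soon as the input deviates from the mapping.
function parseStrict(el: SourceElement): PMNodeJSON {
  const kids = el.children ?? [];
  const container = own(containers, el.tag);
  if (container) {
    const got = kids.map(k => k.tag).join(" ");
    const want = container.children.join(" ");
    if (got !== want) throw new Error(`<${el.tag}> expects "${want}" but got "${got}"`);
    return { type: container.nodeType, content: kids.map(k => parseStrict(k)) };
  }
  const leafType = own(textLeaves, el.tag);
  if (leafType) {
    if (kids.length > 0) throw new Error(`<${el.tag}> may only contain text`);
    // ProseMirror does not allow empty text nodes, so omit content when there is no text.
    return el.text ? { type: leafType, content: [{ type: "text", text: el.text }] } : { type: leafType };
  }
  throw new Error(`Unknown element <${el.tag}>`);
}

const parsed = parseStrict({
  tag: "object",
  children: [
    { tag: "child-1", text: "Ok 1" },
    { tag: "child-2", text: "Ok 2" },
  ],
});
```

With this approach a stray p element or children in the wrong order raise an error instead of being fitted into some other node. The resulting JSON could then be handed to Node.fromJSON with the editor schema, and node.check() can be used to confirm that the result conforms to it.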

yes - thanks! good idea - write my own parser (xml to pm model) + a serializer for the reverse (pm model to xml) - then the dummy nodes should not be “in the way”… + use nodeViews for more advanced, in-control rendering rather than toDOM
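The reverse direction (pm model to xml) can be sketched the same way, working on the JSON that node.toJSON() produces. The `tagFor` table and the `toXml` and `escapeXml` helpers are hypothetical names for this illustration, and marks and attributes are left out to keep it short.

```typescript
// The JSON shape ProseMirror uses for documents (type, content, text).
interface PMNodeJSON {
  type: string;
  content?: PMNodeJSON[];
  text?: string;
}

// Assumed mapping from node type names to the external format's tag names.
const tagFor: { [nodeType: string]: string } = {
  x_object: "object",
  x_child_1: "child-1",
  x_child_2: "child-2",
};

// Escape the characters that are unsafe in XML text content.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serialize one node and its children; UI-only wrappers never appear here
// because they are not part of the document JSON in the first place.
function toXml(node: PMNodeJSON): string {
  if (node.type === "text") return escapeXml(node.text ?? "");
  if (!Object.prototype.hasOwnProperty.call(tagFor, node.type)) {
    throw new Error(`No XML tag configured for node type "${node.type}"`);
  }
  const tag = tagFor[node.type];
  const inner = (node.content ?? []).map(child => toXml(child)).join("");
  return `<${tag}>${inner}</${tag}>`;
}

const xml = toXml({
  type: "x_object",
  content: [
    { type: "x_child_1", content: [{ type: "text", text: "Ok 1" }] },
    { type: "x_child_2", content: [{ type: "text", text: "a < b" }] },
  ],
});
```

Because the header/dummy markup only exists in toDOM or a node view, it never reaches this serializer, so the external format stays free of it.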

Hey Marijn,

I’m curious if this is the sort of use case a decoration widget might also be useful for?

Probably not. Decorations aren’t part of the document itself, so I don’t really see how they can help with schema modeling.

I was looking at the original post and thinking that the title and artist attributes are really attrs on the album node. (Also maybe true of the desc, now that andersmad is writing their own parser and serializer.) Where we have similar attrs, we mostly edit those via a modal dialog right now, which is a bit high-friction, and we would love to find another way to represent them in the UI. I’ve assumed the Widget system is one way to do so, though I ran into an issue with using widgets on inline nodes the first time I tried them.

the thing is that I got the properties/attributes too for a ton of other stuff in a non-modal window (I got observers so changing them will change the rendering too) - but I’d really like some of it to be content editable in the normal “flow”… that also means that I’ll need to extend the parser - not rewrite it - as my schema will sit on top of the existing one.

a few question here:

  1. in from_dom.js @ addElement it goes if (rule && rule.skip.nodeType) dom = rule.skip - when does the boolean skip become an element? it’s a wee bit hard for me to read non-TypeScript code - a bit of a guessing game.
  2. the order of the nodes seems to matter when using “content” in the schema - see sample below.
  3. should from_dom not use the group? see the P below - it gets a match even though it’s in another group.

schema extending the basic:

doc: { content: "(block|x_grp)+ " },

"x_object": <NodeSpec> {
	group: "x_grp",
	content: "x_dummy",
	toDOM: (node: Node) => ["div", { class: "x-object" }, 0],
	parseDOM: [{ tag: "object", }]
},

"x_dummy": <NodeSpec> {
	group: "x_grp",
	content: "x_child_1 x_child_2",
	toDOM: (node: Node) => ["div", { class: "x-dummy" }, ["b", "Stuff"], ["span", 0] ],
	parseDOM: [{ tag: "dummy", skip: true }]
},

"x_child_1": <NodeSpec> {
	group: "x_grp",
	content: "inline*",
	toDOM: (node: Node) => ["div", { class: "x-child-1" }, 0],
	parseDOM: [{ tag: "child-1", }]
},

"x_child_2": <NodeSpec> {
	group: "x_grp",
	content: "inline*",
	toDOM: (node: Node) => ["div", { class: "x-child-2" }, 0],
	parseDOM: [{ tag: "child-2", }]
},

then this (order - child 2 then 1 does not compute):

<object>
	<p>whUt?</p>
	<child-2>Ok 2</child-2>
	<child-1>Ok 1</child-1>
</object>

becomes: [screenshot of the parsed result]

and this (bad match? P becomes a part of the group etc.):

<object>
	<child-1>Ok 1</child-1>
	<p>whUt?</p>
	<child-2>Ok 2</child-2>
</object>

becomes: [screenshot of the parsed result]

it seems like its very close to being able to handle all this - theres just a few caveats - or logic that I don’t get.

NOTE: ignore this one - see: Nodes as more complex Objects?

and here’s another one - I added this to the basic schema to have an <image src="..> element match in conjunction with the std img node already there (this is for a special image type), so here is the very basic version:

"x_image": <NodeSpec>{
	toDOM: (node: Node) => ["img", { src : "/img/variable.svg" }],
	parseDOM: [{ tag: "image" }]
}

source for parser:

<image src="..."  />

but - that will always become the std. img tag!? whuut? so does it ignore the parseDOM now - or does it use the output from the toDOM, or what’s that?

I’m not sure your example with x_dummy needs it.

You can use an extended parseDOM rule to include a contentElement function that picks out the dummy tag

parseDOM: [
  {
    tag: "heading",
    getAttrs (dom) { return headingAttrs(dom, dom.dataset.level) },
    // Parse the node's children from the inner 'content' element only
    contentElement (dom) {
      return dom.getElementsByClassName('content')[0]
    }
  },
]

Here’s an example from our custom heading node which has a node view that adds buttons and such around the ‘content’ element that are not part of the node proper.
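Applied to the x_object case from this thread, the same idea might look roughly like the sketch below. It is an assumption-laden illustration rather than a tested schema: the x-content class name is made up, the x_dummy node is dropped so that x_object holds the two children directly, and `ElementLike` is a minimal stand-in for the DOM element type so the snippet does not depend on a browser.

```typescript
// Minimal structural stand-in for the part of the DOM element API used below.
interface ElementLike {
  querySelector(selector: string): ElementLike | null;
}

const xObjectSpec = {
  group: "x_grp",
  content: "x_child_1 x_child_2",
  // The <b> header is rendered but is not a node; the span holds the content hole (0).
  toDOM: () => ["div", { class: "x-object" }, ["b", "Stuff"], ["span", { class: "x-content" }, 0]],
  parseDOM: [
    {
      tag: "div.x-object",
      // Read the children from the inner span, falling back to the element itself.
      contentElement: (dom: ElementLike): ElementLike => dom.querySelector("span.x-content") ?? dom,
    },
  ],
};
```

The header then lives only in the rendered DOM: when the editor's own output is parsed again (for example on paste), the parser descends into the span and the literal header text is never treated as content.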

Regarding that image node: I think you’d need a second parseDOM rule and you’d need your custom x_image node to show up before the regular image one. Alternatively if you never have any other kind of image you could remove the original image node from the base-schema before adding your own.

Something like:

"x_image": <NodeSpec>{
	toDOM: (node: Node) => ["img", { class: "special-image", src: node.attrs.src }],
	parseDOM: [{ tag: "image", getAttrs() {...} }, {tag: "image.special-image", getAttrs() {...} }]
}

ok, ignore the image thing - it turns out that Chrome translates image to img :frowning: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/image

let a = document.createElement("div"); 
a.innerHTML = "<image src='.' />"; 
console.log(a.innerHTML) // shows <img src=".">

I guess if you’re writing your own parser you won’t need the parse rules anymore. I was confused by this bit of the manual.

I’m not going to write a new parser - maybe extend current - but first I need to make sense of the existing one and why elements get matched in wrong schema nodes and why it bleeds content into other elements etc… I need to know from @marijn if this is by design or a bug… if by design - I’m gonna make some kind of “strict” rule property that makes it only match if group (or other “tag” or using the context) is also a match… and maybe a rule “lookup” property that makes the content way ignore the order of the nodes…

I, very hackishly, managed to remove the content bleeding and to match un-ordered elements… I’d still very much like to hear any “official” comment on these topics - thanks…

I don’t really know what you’re talking about here. Concrete examples, with the schema nodes that are matched and that you expected to match, would be helpful.

hi, that will be this post: Nodes as more complex Objects?

need more details etc.?

You can set it to a node to skip through to some inner node, ignoring the outer structure.

The order of nodes in the schema is significant, yes.

It is not clear to me what you’re expecting, but no, groups don’t seem to be relevant here.

If you have a predictable, structured input format, writing your own parser is going to be much more productive than trying to make the parser for arbitrary DOM input do what you want.

ok, I’m just a bit concerned about future schema changes and then “old” content getting dumped into the wrong elements, as my sample shows: the content with the two children does not allow for any P element, but it gets added to the x_child_2… shouldn’t it look at the parent content description to see what is allowed? and maybe also the group? because both “rules” are enforced while editing.

sure, but the idea is to mix yours with my stuff - but I see the main issue is that in my system the schema is the ruler, whereas in yours/prosemirror it is the data - and my gut feeling tells me that if I do strict schema parsing I will get too far away from the prosemirror basics, scaffolding and community additions etc.

so instead of having an object system with content blocks (mine as is) - the goal was mixing it up totally (content and complex objects mixed up in a tree) - but I can see now that might be too complex with data being more dominating than schema in prosemirror… might still use it to kick standard html editor(s) and add some of the objects like rows/cells, smart images etc.