Sibling text nodes

I’m working on writing a JSON Schema to describe the shape of documents that we’ll be storing and sharing between systems. We have a code block node, based on schema-basic’s:

  code_block: {
    content: "text*",
    group: "block",
    code: true,
    defining: true,
    parseDOM: [{tag: "pre", preserveWhitespace: true}],
    toDOM() { return ["pre", ["code", 0]] }
  },

I’m particularly interested in why content is defined as text* rather than just text.

In another thread about marks, it was stated that a goal of the document was to be normalized, i.e. to have a single representation.

To my mind, text* seems to deviate from this goal, since the same content can have multiple representations. For example:

{
  "type": "text",
  "text": "foo"
},
{
  "type": "text",
  "text": "bar"
}

and

{
  "type": "text",
  "text": "foobar"
}

Is there a reason to not just use text instead of text*?

To allow the block to be empty. In this case it could also have been text{0,1}, but with textblocks that allow marks and non-text inline nodes, you’d need the * to allow breaking the content into pieces.

All document-manipulation functions, as well as Fragment.fromArray, make sure to merge adjacent identically-styled text nodes, so no, you should never end up with a non-canonical encoding of a stretch of text, unless you go out of your way to construct it yourself, by constructing a non-canonical JSON object and deserializing it for example.
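
If you produce JSON outside of ProseMirror and want it canonical before it is deserialized, the merge rule described above can be sketched in plain JavaScript. This is only an illustration: `sameMarks` and `mergeAdjacentText` are hypothetical helper names, not ProseMirror APIs, and the mark comparison assumes that equal mark sets are always serialized identically.

```javascript
// Hypothetical helpers, not part of ProseMirror. They operate on the
// "content" array of a node in ProseMirror-style JSON.

// Structural comparison of two mark arrays. Assumes equal mark sets are
// serialized in the same order with the same key order.
function sameMarks(a = [], b = []) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Merge runs of adjacent text nodes that carry identical marks.
function mergeAdjacentText(content) {
  const merged = [];
  for (const node of content) {
    const last = merged[merged.length - 1];
    if (
      last &&
      last.type === "text" &&
      node.type === "text" &&
      sameMarks(last.marks, node.marks)
    ) {
      // Same styling: fold this node's text into the previous text node.
      merged[merged.length - 1] = { ...last, text: last.text + node.text };
    } else {
      merged.push(node);
    }
  }
  return merged;
}

const canonical = mergeAdjacentText([
  { type: "text", text: "foo" },
  { type: "text", text: "bar" },
]);
console.log(JSON.stringify(canonical));
// prints [{"type":"text","text":"foobar"}]
```

Text nodes with different marks are left as separate nodes, which is the case where the * in a content expression is actually needed.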

That all makes sense — great info!

Another question comes to mind:

How is a code block with an empty text node conceptually different to a code block with no text node (from the perspective of normalisation)?

edit: actually I have a suspicion it’s not possible to have empty text nodes?

Yes, empty text nodes are verboten. The library won’t construct them, and you get an error if you try to manually create one.
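
For the JSON Schema side, the two constraints from this thread (no empty text nodes, and at most one text node in a code block) could be expressed roughly as follows. This is a sketch written as JavaScript objects using standard JSON Schema keywords. The names `textNodeSchema` and `codeBlockSchema` are made up here, and the `maxItems: 1` rule assumes the code block carries no marks, which is what the text{0,1} remark above implies.

```javascript
// Hypothetical JSON Schema fragments for the stored document format.

const textNodeSchema = {
  type: "object",
  properties: {
    type: { const: "text" },
    // Empty text nodes are not allowed, so require at least one character.
    text: { type: "string", minLength: 1 },
  },
  required: ["type", "text"],
};

const codeBlockSchema = {
  type: "object",
  properties: {
    type: { const: "code_block" },
    // "content" is optional: an empty block simply omits it.
    // Requiring exactly one item when it is present is a choice made here
    // to keep a single representation; it assumes no marks in code blocks.
    content: {
      type: "array",
      items: textNodeSchema,
      minItems: 1,
      maxItems: 1,
    },
  },
  required: ["type"],
};

console.log(JSON.stringify(codeBlockSchema, null, 2));
```

A textblock that allows marks or other inline nodes would need a looser `content` rule, since adjacent text nodes with different marks are legitimate there.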

I’ve been living mostly in the JSON side of ProseMirror for my project. From that view a “result” node containing an empty text node looks like this:

{"type":"result","content":[{"text":"","type":"text"}]}

while one with no text node looks like:

{"type":"result"}

The first is illegal, and loading that JSON with schema.nodeFromJSON() will fail.

Since the schema looks like

result: { content: "text?", …

the second is legal, and the HTML in the PM editor becomes '<p><br></p>'.
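
If you want to catch the illegal form before handing JSON to ProseMirror, a small pre-check along these lines might help. It is a plain JavaScript sketch; `findEmptyTextNodes` is a hypothetical helper, not a ProseMirror function, and it only looks at the `type`, `text`, and `content` keys used in the examples above.

```javascript
// Hypothetical pre-check, not a ProseMirror API: walks a ProseMirror-style
// JSON node and returns the paths of all empty text nodes it finds.
function findEmptyTextNodes(node, path = "root") {
  const problems = [];
  if (node.type === "text" && !node.text) {
    problems.push(path);
  }
  // Nodes without children have no "content" key at all.
  (node.content || []).forEach((child, i) => {
    problems.push(...findEmptyTextNodes(child, `${path}.content[${i}]`));
  });
  return problems;
}

const illegal = { type: "result", content: [{ text: "", type: "text" }] };
const legal = { type: "result" };

console.log(findEmptyTextNodes(illegal)); // reports root.content[0]
console.log(findEmptyTextNodes(legal)); // reports nothing (empty array)
```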