Sibling text nodes

bradleyayers · March 3, 2017, 5:39am

I’m working on writing a JSON Schema to describe the shape of documents that we’ll be storing and sharing between systems. We have a code block node, based on schema-basic’s:

  code_block: {
    content: "text*",
    group: "block",
    code: true,
    defining: true,
    parseDOM: [{tag: "pre", preserveWhitespace: true}],
    toDOM() { return ["pre", ["code", 0]] }
  },

I’m particularly interested in why content is defined as text* rather than just text.

In another thread about marks, it was stated that a goal of the document was to be normalized, i.e. to have a single representation.

In my mind text* seems to deviate from this goal, as it’s possible to have multiple representations of the same content:

For example:

{
  "type": "text",
  "text": "foo"
},
{
  "type": "text",
  "text": "bar"
}

and

{
  "type": "text",
  "text": "foobar"
}

Is there a reason to not just use text instead of text*?

marijn · March 3, 2017, 8:39am

To allow the block to be empty. In this case it could also have been text{0,1}, but with textblocks that allow marks and non-text inline nodes, you’d need the * to allow breaking the content into pieces.

All document-manipulation functions, as well as Fragment.fromArray, merge adjacent identically-styled text nodes. So no, you should never end up with a non-canonical encoding of a stretch of text unless you go out of your way to construct it yourself, for example by building a non-canonical JSON object and deserializing it.
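For JSON produced outside the editor, the same canonical form can be approximated before deserializing. The following is only a rough sketch of the merging idea, not ProseMirror's own implementation: mergeAdjacentText and sameMarks are hypothetical helper names, and the mark comparison assumes both nodes serialize their marks in the same order.

```javascript
// Hypothetical helpers, not part of ProseMirror.
// Assumption: two mark lists are "identical" when their JSON serializations match.
function sameMarks(a, b) {
  return JSON.stringify(a.marks || []) === JSON.stringify(b.marks || []);
}

// Merge adjacent text nodes that carry identical marks in a JSON content array.
function mergeAdjacentText(content) {
  const out = [];
  for (const node of content) {
    const last = out[out.length - 1];
    if (last && last.type === "text" && node.type === "text" && sameMarks(last, node)) {
      // Replace the previous entry with a merged copy; the inputs are not mutated.
      out[out.length - 1] = { ...last, text: last.text + node.text };
    } else {
      out.push(node);
    }
  }
  return out;
}

const merged = mergeAdjacentText([
  { type: "text", text: "foo" },
  { type: "text", text: "bar" },
]);
console.log(JSON.stringify(merged));
// [{"type":"text","text":"foobar"}]
```

Running this over each node's content array before loading would collapse the "foo" / "bar" pair from the first post into the single "foobar" form.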

bradleyayers · March 3, 2017, 8:59pm

That all makes sense — great info!

Another question comes to mind:

How is a code block with an empty text node conceptually different to a code block with no text node (from the perspective of normalisation)?

edit: actually I have a suspicion it’s not possible to have empty text nodes?

marijn · March 4, 2017, 8:21pm

Yes, empty text nodes are verboten. The library won’t construct them, and you get an error if you try to manually create one.

elgow · March 13, 2017, 5:40pm

I’ve been living mostly in the JSON side of ProseMirror for my project. From that view a “result” node containing an empty text node looks like this:

{"type":"result","content":[{"text":"","type":"text"}]}

while one with no text node looks like:

{"type":"result"}

The first is illegal, and loading that JSON with schema.nodeFromJSON() will fail.

Since the schema looks like

result: { content: "text?", …

the second is legal, and the HTML in the PM editor becomes ‘<p><br></p>’.
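When JSON like this is generated by another system, it may help to catch empty text nodes before handing the document to the schema. This is a minimal sketch under that assumption; findEmptyTextNodes is a hypothetical helper, not a ProseMirror API, and it only inspects the "type", "text", and "content" fields.

```javascript
// Hypothetical pre-check, not a ProseMirror API.
// Returns the child-index paths of text nodes whose "text" is missing or empty.
function findEmptyTextNodes(node, path = []) {
  const found = [];
  if (node.type === "text" && !node.text) found.push(path);
  (node.content || []).forEach((child, i) => {
    found.push(...findEmptyTextNodes(child, [...path, i]));
  });
  return found;
}

const withEmptyText = { type: "result", content: [{ text: "", type: "text" }] };
const withoutText = { type: "result" };

console.log(JSON.stringify(findEmptyTextNodes(withEmptyText)));
// [[0]]
console.log(JSON.stringify(findEmptyTextNodes(withoutText)));
// []
```

A non-empty result points at the nodes to drop, which turns the first form above into the second.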