How to traverse the node tree?

ccorcos · November 18, 2020, 9:00am

I’m using ProseMirror to represent a tree-like structure.

const doc: NodeSpec = {
	content: "block+",
}

const paragraph: NodeSpec = {
	content: "inline*",
	group: "block paragraph",
}

const branch: NodeSpec = {
	content: "paragraph block*",
	group: "block",
}

const text: NodeSpec = {
	group: "inline",
}

I want to create a keymap for the Tab key which will indent the current paragraph into a new branch. For now, I’m assuming the text selection is just a $cursor.

Basically, what I want to do is this:

If the current block has a previous sibling
  If the previous sibling is a branch
    Remove the block from its current parent and append it block to the end of the branch.
  Else if the previous sibling is a paragraph
    Replace the current block and previous sibling with a branch that has both of these blocks as children.

I’m having trouble understanding how I should write this logic: What is the recommended way to traverse the tree?

I can get the current block with $cursor.parent but I can’t get the next parent up using $cursor.parent.parent. Instead I have to use $cursor.node($cursor.depth - 1). I suppose that works, but now I need to get the previous sibling. My expectation would be to use something like $cursor.parent.parent[$cursor.parent.parentOffset - 1], but that’s not currently possible. Perhaps there is a better way to think about how this is supposed to work? It seems like once a function returns a ProsemirrorNode then I’ve reached a dead-end.

Thanks for the help!

Chet

ccorcos · November 18, 2020, 9:15am

Alright, to my surprise I was able to come up with something that works.

Tab: (state, dispatch) => {
	if (state.selection instanceof TextSelection) {
		const { $cursor } = state.selection as TextSelection
		if ($cursor) {
			// Get the previous node.
			const { node, index, offset } = $cursor
				.node($cursor.depth - 1)
				.childBefore($cursor.index($cursor.depth - 1))
			if (!node) {
				return false
			}
			// If the node is a paragraph, then lets wrap in a branch.
			if (node.type === state.schema.nodes.paragraph) {
				const range = $cursor.blockRange(
					state.doc.resolve($cursor.before($cursor.depth))
				)
				if (range) {
					if (dispatch) {
						dispatch(
							state.tr.wrap(range, [
								{ type: state.schema.nodes.branch },
							])
						)
					}
					return true
				}
			}
		}
	}
}

However, something that doesn’t feel right in that the way I determined if the previous parent is a paragraph is entirely unrelated to how I constructed the range that I want to wrap. This seems really error-prone. Any suggestions on a better way of doing this?

marijn · November 18, 2020, 10:50am

.childBefore($cursor.index($cursor.depth - 1))

This passes a child index to a method that expects an offset count. I think .maybeChild($cursor.index($cursor.depth - 1) - 1) is what you want here.

Why does the previous child have to be a paragraph if what you’re wrapping is the node with the cursor in it? (At least, that’s what the call to blockRange seems to be selecting.)

ccorcos · November 18, 2020, 5:33pm

I’m wrapping the node with the cursor and the one before.

- paragraph
- paragraph|
- paragraph

Suppose | is the cursor and I hit Tab. The final state should be:

- branch
  - paragraph
  - paragraph|
- paragraph

That’s because I’ve explicitly defined branch as its own node:

const branch: NodeSpec = {
	content: "paragraph block*",
	group: "block",
}

But ideally it would be more like:

- paragraph
  - paragraph|
- paragraph

With the paragraph content as something like this where it has text content as well as block children that are indented.

const paragraph: NodeSpec = {
	content: "inline* block*",
	group: "block paragraph",
}

Basically, I’m trying to build something similar to Notion’s editor where indenting blocks doesn’t just display them as indented, but also represents them as children in a tree structure:

ccorcos · November 18, 2020, 10:40pm

Ok. I think this is what I want:

const { $from, $to } = state.selection
const $before = state.doc.resolve(
	$from.posAtIndex(
		$from.index($from.depth - 1) - 1,
		$from.depth - 1
	)
)
const $after = state.doc.resolve($to.after())
const range = $before.blockRange($after)
if (range) {
	if (
		range.$from.nodeBefore?.type === state.schema.nodes.paragraph
	) {
		if (dispatch) {
			dispatch(
				state.tr.wrap(range, [
					{ type: state.schema.nodes.branch },
				])
			)
		}
		return true
	}
}

bZichett · November 28, 2020, 4:35pm

@ccorcos Why not utilize all the hard work @marijn did for list items and just display them on the UI as indented (hide the list symbol or whatever styling you need)? The collection of nested list wrappers (ordered_list or bullet_list which are implemented nearly identically) and list_items is a tree.

You seem to just be calling the “list wrapper” a branch? Or is there a more nuanced use case you are aiming for that I missed? Do you not want that extra list_item wrapping paragraphs?

ccorcos · November 28, 2020, 8:01pm

Yeah, the difference is subtle.

{
  type: "list-item", 
  text: "Subaru outback"
  content: [{
    type: "paragraph"
    text: "Has all-wheel drive"
  }]
}

vs

{
  type: "list-item", 
  content: [
  {
    type: "paragraph"
    text: "Subaru outback"
  },
  {
    type: "paragraph"
    text: "Has all-wheel drive"
  }]
}

In the first example, the “branch” of the tree has it’s own text as well as children. Whereas in @marijn’s implementation, only leaves have text.

When it comes to data modeling, I really want the first data structure because the second paragraph is hierarchically a child of the first one in some very subjective sense of how humans interpret documents. But also, this is how Notion and Roam work.

And I’m seeing now that the reason for his approach is because it models the DOM more closely which is a good thing when it comes to designing a library.

I think I’ve come up with a solution which is to parse through the Prosemirror Node tree and translate between these two data structures when I persist the data.

bZichett · November 28, 2020, 9:31pm

Ok after reviewing your needs, the list wrapper with one paragraph child and then a nested list wrapper as the second child seems to be essentially what you want?

I do indeed recognize the difference in final data structure, but it seems to me they can be rendered/used/managed/handled/queried quite similarly.

Btw I mean this in a supportive way as a developer to another developer: Claiming Roam or Notion does it one way is not a good defense of the architecture. It should be defended in some way other than some other solution architect / designer / developer chose to do it this way. That’s just my opinion though. Although it does seem to me that ProseMirror can do what they do (I have done it, possibly going beyond) with a slightly different tree design. So I ask, why implement new architecture and create a totally separate implementation where all the commands in prosemirror-schema-list would accomplish what you need? What exactly are you gaining? (Open ended question. I’d to like to hear a more detailed answer from you)

ccorcos · November 29, 2020, 12:13am

Btw I mean this in a supportive way as a developer to another developer

All good! I’m enjoying the conversation

I do indeed recognize the difference in final data structure, but it seems to me they can be rendered/used/managed/handled/queried quite similarly.

There are two different perspectives I can imagine here.

The goal is to be able to construct a document with certain features and appearance and so we want to find any data-structure that represents that well.
The goal is to be able to build a data-structure with the convenience of a familiar text-editor interface.

What exactly are you gaining?

So I think most people tend to be in the first camp where the final product is the document in its visual form. In that sense, you’re right, why not just use prosemirror-schema-list… you can work with that data structure to write queries and other kinds of things.

But for me, the goal is something a little closed to end-user programming – the goal is to actually construct a specific data structure because I want to be able to write graph queries across this data structure.

bZichett · November 29, 2020, 1:58pm

But for me, the goal is something a little closed to end-user programming – the goal is to actually construct a specific data structure because I want to be able to write graph queries across this data structure.

Ah, here is where we get to whether or not the “list wrapper” versus “branch” architecture might deviate from one another when attempting to query it. My claim was that graph queries would be basically the same.

So, on thinking about this a few minutes, I’d like to talk through the polymorphism of the two solutions, with a slight risk of being redundant - but I’ll take it.

When writing graph queries with the branch solution, the text node of interest is at the top level node. With the list wrapper solution, instead of top level node and given proper schema constraints, it is now in a specific location: the first child of the list wrapper. The extra complexity is that there needs to be a simple utility which asks for the text node at this location when required - it needs to check for a paragraph/text in the first child position of a list wrapper, rather than check for the existence of text in the “upper” branch.

I am not yet seeing why there would be much difference when writing graph queries. Unless, you are hoping to directly feed the proposed Branch JSON or Document instance into some vendor library? I somewhat doubt you are doing it that way and will need at least some data transformation / massaging before running some graph query on it. And if you are writing some custom implementation, then the altered location of the Branch text node to the first child should not matter? (Feel free to elaborate on the type of graph query you thinking of, but overall I expect this minor difference to be easily cast-able in either solution, especially with some parsing stage)

All that being said, I only decided to get involved because I would not like to see a fellow dev work on some solution (or spend too much time) when some other perfectly good solution is already implemented via the prosemirror-schema-list. If you are comfortable getting up and running with the branch solution and can work it out in a reasonable time schedule, power to you.

And of course, let me know if I missed anything with what your already stated requirements were. This might be even more nuanced that I currently think?

ccorcos · November 29, 2020, 8:20pm

If you’re doing this graph query in-memory, then you’re right, we can just write some code to traverse the tree however we want. But if you’re persisting this data to disk and want to be able to query it efficiently, then you’re going to want the data model to be properly represented on disk in the manner you want to query it.

Thus in SQLite, we can have a block table and write a recursive query to find all children fairly easily and this can be quite performant with a couple indexes.

select * from block as b1
where b1.title = 'Subaru Outback'
inner join block as b2
where b2.parent = b1.id

But if we used the flat structure, this query is going to be a little more complicated.

select * from block as b1
where b1.title = 'Subaru Outback'
inner join block as b2
where b2.parent = b1.parent
and b2.id != b1.id

The problem with the above query, though, is that we also need to have some assertion that b1 is the first child of its parent. The above query is looking for sibling of ‘Subaru Outback’, not children.

select * from block as b1
where b1.title = 'Subaru Outback'
inner join block as b2
inner join block as b3
where b2.parent = b1.parent
and b2.id != b1.id
and b3.id = b1.parent
and b3.firstChild = b1.id

I think that’s the equivalent query. Not only is that more gross, it’s an additional join which is going to be less performant. This also assumes we’re modeling the blocks as a linked list, but its totally possible that we’d use fractional indexing to maintain the order of children for collaborative editing which is going to involve an index scan for min the minimum index instead of a simple fistChild equality statement.

This whole explanation assumed SQLite just because that’s familiar to most people, but as soon as you enter the world of graph databases / RDF / datalog, then the differences in these queries is even more painful and less performant to index. Here’s some more background if you’re interested. Let me know what you think

bZichett · December 14, 2020, 8:39pm

Makes sense. Thank you very much for taking the time to explain!