Efficiently finding changed nodes

tslocke · December 20, 2021, 10:20am

Plugins often need to scan the entire document for changes. Given that the large majority of changes are likely much smaller than the whole document (e.g. typing) this could be rather inefficient on large documents.

I wanted to share an approach I’ve been using to do this more efficiently. Maybe others can see problems with it, or suggest something simpler. It plays a bit fast and loose with the immutability of nodes.

The idea is to take advantage of the fact that nodes not seen before are always newly created objects. Let’s say I want to disallow empty paragraphs, and I’m doing that with an appendTransaction plugin. I scan the document looking for empty paragraphs and removing them, but as I do this I add some benign marker property to the nodes (all of them), e.g. node['$$done'] = true. Technically I’m violating the immutability of the node, but as long as this change has no observable behaviour (other than making things more efficient) I think that should be OK.

On the next pass I can ignore nodes marked as $$done. They can’t have empty paragraphs inside. If my scan is a top down recursion, I get to skip entire sub-trees that have not been changed.

Typical ProseMirror documents might not have a great deal of hierarchical nesting so the benefit might be minimal, but in my app there is a lot of nesting, so the gains could be significant.

Thoughts? Problems? Not worth it? Better solution?

Extra credit: suppose I have a plugin that needs to appendTransaction and also wants to add decorations. How can I use this approach to find the changed nodes once, given these two must be handled in different callbacks, and (as far as I know) the API doesn’t promise to call them in a defined order?

marijn · December 20, 2021, 10:40am

You might be able to make this less hacky using a WeakSet to replace the property. Also this file has a somewhat different approach to the same problem that seems to work well too, and has the advantage that it doesn’t keep potentially problematic external state.

Tangling appendTransaction logic with logic that manages decorations seems like a bad idea. Just do the thing twice, if you have to.

tslocke · December 20, 2021, 11:29am

Thank you! Looks like “this file” mistakenly has the same link as WeakSet.

marijn · December 20, 2021, 1:20pm

Oops, this was the intended link.

PMGod · March 29, 2024, 6:52pm

Link no longer exists, I’m still working out an approach to find changed nodes.

              const oldTr = oldState.tr;
              const oldFrom = oldTr.selection.$from.pos;
              const oldTo = oldTr.selection.$to.pos;

              const newTr = newState.tr;
              const newFrom = newTr.selection.$from.pos;
              const newTo = newTr.selection.$to.pos;

              const oldNodes: Node[] = [];
              oldState.doc.nodesBetween(oldFrom, oldTo, (node) => {
                if(node.attrs.id) {
                  oldNodes.push(node);
                }
              });

              const newNodes: Node[] = [];
              newState.doc.nodesBetween(newFrom, newTo, (node) => {
                if(node.attrs.id) {
                  newNodes.push(node);
                }
              });

Above fails when multiple blocks are copy pasted.

appendTransaction(transactions, oldState, newState) {
          if(newState.doc === oldState.doc) return;

          transactions.map(tr => {
            tr.steps.map(step => {
              if (
                step instanceof ReplaceAroundStep 
                // || step instanceof ReplaceAroundStep 
                ) {
                // console.log("step", step)
                const newNode = newState.doc.nodeAt(step.from);
                console.log("newnode", newNode)
              }
            })
          })

          return null;
        },

This approach was not reliable either

marijn · March 29, 2024, 9:14pm

Here is a link that should still work.