Finding a changed node

Hi everyone,

I have the following case: text in my editor (TipTap) goes through processing and comes back to me with custom tags/Marks around some words.

Example: faultyword → faultyword (the second one wrapped in the custom Mark)

This is a simplification, of course, since my Mark has more attributes.

Anyway, I need to remove the Mark from a word if the user changes that word. I managed to do this, but I am questioning my approach.

I did it by creating a new ProseMirror plugin that does the update in appendTransaction, where I have access to the old and new states and the transactions.

I only look at transactions that have a single step. I did this because I noticed that user typing, even selecting text and then typing, creates one step, so I don't want to touch any other case and mess something up.
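A minimal sketch of that filter, assuming the plugin's appendTransaction receives the usual array of transactions dispatched in one round. `TransactionLike` is a hypothetical structural stand-in for prosemirror-state's `Transaction` (which does expose `steps` and `docChanged`), so the snippet runs without the library.

```typescript
// Hypothetical structural stand-in for the parts of a prosemirror-state
// Transaction that this filter reads.
interface TransactionLike {
  readonly steps: readonly unknown[];
  readonly docChanged: boolean;
}

// appendTransaction receives every transaction dispatched in this round.
// Return the only document-changing one if it holds exactly one step,
// otherwise null so the plugin leaves the change alone.
function singleStepTransaction<T extends TransactionLike>(
  transactions: readonly T[],
): T | null {
  const changing = transactions.filter((tr) => tr.docChanged);
  if (changing.length !== 1 || changing[0].steps.length !== 1) {
    return null;
  }
  return changing[0];
}
```

Filtering on `docChanged` first means selection-only transactions in the same batch do not disqualify a single typing step.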

Anyway, I take from and to from the step and work out whether it is an insertion, deletion or replacement, and whether it is at the last position in the document.
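The insertion/deletion/replacement distinction can be read directly off a replace step. A minimal sketch, where `ReplaceStepLike` is a hypothetical structural stand-in for the `from`, `to` and `slice` fields that prosemirror-transform's `ReplaceStep` exposes:

```typescript
// Hypothetical structural stand-in for the fields a ReplaceStep exposes:
// the replaced range in the old doc and the inserted slice.
interface ReplaceStepLike {
  readonly from: number;
  readonly to: number;
  readonly slice: { readonly size: number };
}

type ChangeKind = "insertion" | "deletion" | "replacement" | "none";

// from/to refer to the document the step was applied to (the old doc).
function classifyChange(step: ReplaceStepLike): ChangeKind {
  const removesContent = step.to > step.from;
  const insertsContent = step.slice.size > 0;
  if (removesContent && insertsContent) return "replacement";
  if (removesContent) return "deletion";
  if (insertsContent) return "insertion";
  return "none";
}
```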

Then I find the node with newState.doc.nodesBetween, and if it is a text node that isn't just a space, I remove the marks I need to remove. What I noticed is that when the last letter of a word/node is deleted, I can't find that node in newState, so I search the oldState nodes, which gives me the node in question.

So far I haven't found (m)any side effects, but this way of doing it feels off. Is there an easy way to get a handle on the node that a transaction modified between the old and new states?

Thanks for any inputs!

Since you’re interested in words touched, maybe what you’d need to do is check for word characters at the start and end of the change (in the old doc if anything was deleted, in the inserted content otherwise), and extend the range you look at to cover the rest of the words outside the (new-document) changed range when it inserts/deletes word characters.
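A minimal sketch of that range extension, working on a plain string so it runs standalone. It assumes the change lies inside a single textblock whose text has already been extracted and whose string offsets line up with document offsets; converting back to document positions is left out. The `isWordChar` rule (Unicode letters, digits, underscore) is an assumption; adjust it to your own notion of a word.

```typescript
// Assumed definition of a "word character": Unicode letters, digits,
// underscore. Built with the RegExp constructor so it works regardless
// of the TypeScript compile target.
const WORD_CHAR = new RegExp("[\\p{L}\\p{N}_]", "u");

function isWordChar(ch: string | undefined): boolean {
  return ch !== undefined && WORD_CHAR.test(ch);
}

// Widen [from, to) outward until both ends sit on a word boundary, so every
// word the change touches is fully covered. Offsets are UTF-16 indices,
// like ProseMirror positions inside a text node.
function expandToWords(
  text: string,
  from: number,
  to: number,
): { from: number; to: number } {
  let start = from;
  let end = to;
  while (start > 0 && isWordChar(text[start - 1])) start--;
  while (end < text.length && isWordChar(text[end])) end++;
  return { from: start, to: end };
}
```

Applied to the old text “wordone wordtwo wordthree” with the selection “ne wordtwo wo” (offsets 5 to 18), this widens the range to 0 to 25, i.e. all three words.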

I know, but for that I need to compare how the same node looks in the old and new states, and of course whether it is there at all.

For example, I am still fighting cases like “wordone wordtwo wordthree”, where all three words have the custom Mark. The user then selects “ne wordtwo wo” and replaces it by typing X, so I get “wordoXrdthree”. In this case I need to find all three nodes and remove the Marks from them, but I actually find only the first and third nodes, because the second one doesn't exist anymore. That causes issues, because taking the position and then traversing the nodes doesn't give the results I want.

So what I am struggling with is identifying nodes reliably. Silly question, but I assume nodes don't have IDs or anything similar?

If wordtwo is removed from the document, why would you need to modify it?

No, nodes do not have ids. They don’t even have identity in this system. If you replace a node with another node that has the same type and content, the document is considered the same, because these are treated as value types, not objects with a pointer identity.

No, I don't need to modify wordtwo, but I do need to modify the new state, which is “wordoXrdthree”. Right now, when I use the from and to positions from the step, the node I find is a paragraph with two children.

What I am going to do is make this a special case and remove the Marks from all text nodes in the content of the paragraph node I found.
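If you go with this special case, you may not need to iterate over the paragraph's children at all: one `removeMark` call over the paragraph's content range removes the mark from every inline node inside it. A minimal sketch, assuming you have the paragraph's position (as passed to a `nodesBetween` callback) and its `nodeSize`; `MarkRemover` is a hypothetical stand-in for a ProseMirror `Transaction`, whose real `removeMark(from, to, mark)` accepts a `Mark` or `MarkType`.

```typescript
// Hypothetical structural stand-in for Transaction.removeMark.
interface MarkRemover<M> {
  removeMark(from: number, to: number, mark?: M | null): unknown;
}

// `blockPos` is the position directly before the paragraph (what
// nodesBetween passes to its callback). Its content starts one position
// later, after the opening token, and ends one position before
// blockPos + nodeSize, before the closing token.
function stripMarkFromTextblock<M>(
  tr: MarkRemover<M>,
  blockPos: number,
  blockNodeSize: number,
  markType: M,
): void {
  tr.removeMark(blockPos + 1, blockPos + blockNodeSize - 1, markType);
}
```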

But that is why I am asking whether there is a better way: this looks overly complicated and will be hell to debug later if an error occurs. I will share my code below for reference.

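```typescript
// Inside appendTransaction(transactions, oldState, newState).
// `from`/`to` come from the step; TextCheckTagName is the custom mark's name.
// Assumed setup (omitted from the original snippet):
const newStateTransaction = newState.tr
let modified = false

newState.doc.nodesBetween(from, to, (node, pos) => {
  // Skip whitespace-only text nodes (not just a single " ").
  if (node.isText && /\S/.test(node.text ?? "")) {
    // Strip the mark from the whole text node, not only the changed range.
    newStateTransaction.removeMark(
      pos,
      pos + node.nodeSize,
      newState.schema.marks[TextCheckTagName],
    )
    modified = true
  }
})
```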

I removed some parts, but this is the crux of it. I do exactly the same traversal on the oldState, because I noticed that even when the node is found in both and removeMark is added twice, it does no harm. Maybe I just haven't noticed the bug in this yet 🙂

newState.doc is the result of applying the step to oldState.doc. More precisely, every transaction contains a list of the documents as they were before each corresponding step was applied. So for tr.steps[0], the document this step was applied to is tr.docs[0], and you need to inspect that document using step.from, step.to, etc., because those values refer to it. That's why you sometimes don't see what you are looking for in newState.doc.
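That step-to-document pairing can be sketched as follows. `TransactionLike` is a hypothetical structural stand-in for a ProseMirror transaction, which exposes `steps` and `docs` as parallel arrays (`docs[i]` is the document before `steps[i]`); plain strings stand in for real steps and documents in the test.

```typescript
// Hypothetical structural stand-in for Transaction.steps / Transaction.docs.
interface TransactionLike<S, D> {
  readonly steps: readonly S[];
  readonly docs: readonly D[];
}

// Pair each step with the document it was applied to. Positions read from
// a step (from, to, ...) must be resolved against `before`, not newState.doc.
function stepsWithBaseDocs<S, D>(
  tr: TransactionLike<S, D>,
): Array<{ index: number; step: S; before: D }> {
  return tr.steps.map((step, index) => ({ index, step, before: tr.docs[index] }));
}
```

With a real transaction you would then look things up in `before`, for example with `before.resolve(step.from)` or `before.textBetween(step.from, step.to)`.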

Generally, I would suggest relying on the step for the difference check, since the step you are inspecting is exactly the representation of the difference between the two documents (a single transaction can contain multiple steps, but each step produces a new document). Roughly, what I would do is get the text content around the step and move pointers left and right until I hit spaces or punctuation. I would also check for spaces inside the removed or added slice. That would tell me the positions of the word that was modified.
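The “spaces inside the removed or added slice” check could look like this sketch. It assumes the removed text has already been extracted from the old document (for instance with `tr.docs[i].textBetween(step.from, step.to, " ")`) and the inserted text from the step's slice (for instance with `step.slice.content.textBetween(0, step.slice.content.size, " ")`). Treating anything that is not a Unicode letter, digit or underscore as a word break is an assumption.

```typescript
// Assumed rule: any character that is not a Unicode letter, digit or
// underscore separates words (spaces, punctuation, block separators, ...).
const NON_WORD = new RegExp("[^\\p{L}\\p{N}_]", "u");

// True if the text contains at least one word break.
function hasWordBreak(text: string): boolean {
  return NON_WORD.test(text);
}

// A single replace step crosses word boundaries if either the removed
// or the inserted text contains a break.
function changeCrossesWords(removed: string, inserted: string): boolean {
  return hasWordBreak(removed) || hasWordBreak(inserted);
}
```

For the “ne wordtwo wo” to “X” replacement in this thread, the removed text contains spaces, so the change spans several words; typing a single letter inside a word does not.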

Additionally, there are two step types that can represent text modifications: ReplaceStep and ReplaceAroundStep. They behave a little differently, so I suggest checking the reference docs on them to avoid confusion 🙂

Edit: In this approach, we work with positions in the document before the step was applied. So once I have found the positions of the word, I would map them to positions in the latest document using tr.mapping.map().
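To see what that mapping does, here is a simplified, single-range model of how a step map moves a position, modelled on ProseMirror's step map behaviour: positions before the change stay, positions after it shift by the size difference, and positions inside a replaced range collapse to one side, chosen by `assoc`. It is an illustration only, not the library code. In practice call `tr.mapping.map(pos, assoc)`, or `tr.mapping.slice(i).map(...)` when the positions come from `tr.docs[i]` with `i > 0`.

```typescript
// Simplified model of mapping a position through ONE replaced range:
// [from, to) in the old document was replaced by `insertedSize` positions.
// assoc < 0 keeps a position at the change before the new content;
// assoc > 0 (the default, as in ProseMirror) puts it after.
function mapThroughReplace(
  pos: number,
  from: number,
  to: number,
  insertedSize: number,
  assoc: number = 1,
): number {
  if (pos < from) return pos; // before the change: unchanged
  if (pos > to) return pos - (to - from) + insertedSize; // after: shifted
  // On or inside the replaced range: collapse to its start or its end.
  const side = from === to ? assoc : pos === from ? -1 : pos === to ? 1 : assoc;
  return side < 0 ? from : from + insertedSize;
}
```

With the thread's example (old offsets 0 to 25 for “wordone wordtwo wordthree”, offsets 5 to 18 replaced by “X”), the widened word range maps to 0 to 13, which is exactly “wordoXrdthree” in the new text.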