Changed part of document

Hello,

Imagine you are highlighting words. If there is any change in the document, the example rescans the whole document every time.

This might be inefficient for large documents. It would make much more sense to scan only the changed part (or a region around the changed part).

Is there an easy / proper way to know which part of the document was changed by a transaction? Maybe some from/to positions for the transaction, so we can call highlighting directly on only the changed part of the document.

Thanks a lot.

The recommended way to do this is to compare the old document to the new one and only process changed nodes. You can see an example of that here.

An alternative is to inspect the maps for the steps in the transaction, calling forEach on each to get the range that they touched, computing a minimum/maximum from those. You’ll have to take care to map positions from previous steps forward as you process subsequent steps (in transactions that contain multiple steps).
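As a rough illustration of the step-map approach, here is a minimal sketch. It assumes tr is a ProseMirror Transaction and only touches tr.mapping; changedRange is a hypothetical helper name, not part of the library. Each range reported by a step map is mapped forward through the remaining steps, so the result is expressed in positions of the final document.

```javascript
// Sketch: compute one {from, to} range, in final-document positions,
// that covers everything the transaction's steps touched.
// `changedRange` is a hypothetical helper; `tr` is assumed to be a Transaction.
function changedRange(tr) {
  let from = Infinity
  let to = -Infinity
  tr.mapping.maps.forEach((stepMap, i) => {
    // Mapping made of all steps that come after step i
    const rest = tr.mapping.slice(i + 1)
    stepMap.forEach((oldStart, oldEnd, newStart, newEnd) => {
      // newStart/newEnd are positions right after step i; map them forward
      from = Math.min(from, rest.map(newStart, -1))
      to = Math.max(to, rest.map(newEnd, 1))
    })
  })
  // No step map reported a range: nothing in the document changed
  return from <= to ? {from, to} : null
}
```

The returned range could then be passed to newState.doc.nodesBetween(range.from, range.to, ...) so that only that region is rescanned.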


Thanks a lot!

Comparing documents sounds good. Because the document is a persistent immutable data structure, it is not that hard to implement. I used a Map as a diff checker, because nodes that do not change are the same objects and therefore act as the same Map key.

This is how I solved the problem with changed nodes (deleted, added). Maybe this might be useful for someone else, so I am putting my code here.

// returns the nodes removed from the old document and the nodes added in the new document
const compareDocuments = (oldDocument, newDocument) => {
  let diff = new Map()
  let removed = []

  newDocument.content.forEach((child, position) => {
    diff.set(child, position)
  })

  oldDocument.content.forEach((child, position) => {
    if (diff.has(child)) {
      diff.delete(child)
    }
    else {
      removed.push({ child: child, position: position})
    }
  })

  // whatever is still in the map exists only in the new document
  let added = Array.from(diff.entries()).map(([child, position]) => {
    return {child: child, position: position}
  })

  return {removed, added}
}

// calling in plugin
apply(tr, oldValue, oldState, newState) {
  let changed = compareDocuments(oldState.doc, newState.doc)
  // do something with results
}

My document schema is very trivial (similar to the example in the document guide), so I do not need to call the function recursively:

const trivialSchema = new Schema({
  nodes: {
    doc: {content: "paragraph+"},
    paragraph: {content: "text*"},
    text: {inline: true},
    /* ... and so on */
  }
})
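For a schema with nested blocks, the same identity check could be applied while walking the tree. The following is only a sketch under that assumption: changedTextblocks is a hypothetical helper, and it relies on unchanged nodes being shared as the same objects between the old and the new document. It still walks the old document once to collect its nodes, so it saves rescanning work rather than the walk itself.

```javascript
// Sketch: list textblocks of newDoc that are not shared with oldDoc.
// `changedTextblocks` is a hypothetical helper; both arguments are assumed
// to be ProseMirror document nodes.
function changedTextblocks(oldDoc, newDoc) {
  const seen = new Set()
  oldDoc.descendants(node => {
    seen.add(node)
    // No need to collect inline content: textblocks are compared as a whole
    return !node.isTextblock
  })

  const changed = []
  newDoc.descendants((node, pos) => {
    // Same object as before: the whole subtree is unchanged, skip it
    if (seen.has(node)) return false
    if (node.isTextblock) {
      changed.push({node, pos})
      return false
    }
    // A changed container: look for the changed textblocks inside it
    return true
  })
  return changed
}
```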

Hi.

Sorry to bring up this old topic.

I am currently in this situation. I can see that you came up with a solution, but it seems to scan the whole document on every update (newState.doc and oldState.doc both contain the full document). Am I missing something?

If you want to implement track changes you are in a bit of trouble, as detecting inserted/deleted content is very difficult. You can try using prosemirror-changeset for that; I made an example repo on GitHub. To just iterate over inserted/replaced content per transaction you can use stepMap’s fromB, toB with newDoc.nodesBetween. If you want to know what was deleted you can use oldDoc.nodesBetween(fromA, toA, .... Maybe that helps you optimize your scanning.
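To make the per-step iteration a bit more concrete, here is a hedged sketch. StepMap.forEach reports each touched range as (oldStart, oldEnd, newStart, newEnd), which play the role of fromA, toA, fromB, toB above. forEachTouchedNode, onInserted and onDeleted are hypothetical names introduced for illustration; tr is assumed to be a Transaction. Note that the positions passed to the callbacks refer to the document before or after that particular step, not necessarily to the final document.

```javascript
// Sketch: visit the nodes each step of a transaction removed or inserted.
// `forEachTouchedNode`, `onInserted` and `onDeleted` are hypothetical names.
function forEachTouchedNode(tr, onInserted, onDeleted) {
  tr.mapping.maps.forEach((stepMap, i) => {
    const before = tr.docs[i]              // document before step i
    const after = tr.docs[i + 1] || tr.doc // document after step i
    stepMap.forEach((oldStart, oldEnd, newStart, newEnd) => {
      if (oldEnd > oldStart) {
        // Content that step i replaced or deleted
        before.nodesBetween(oldStart, oldEnd, (node, pos) => { onDeleted(node, pos, i) })
      }
      if (newEnd > newStart) {
        // Content that step i put in its place
        after.nodesBetween(newStart, newEnd, (node, pos) => { onInserted(node, pos, i) })
      }
    })
  })
}
```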

A very rudimentary solution is to use findDiffStart and findDiffEnd to find the start and end of the changed region between two documents. This can be enough for most cases.

const from = oldState.doc.content.findDiffStart(newState.doc.content)
const to = oldState.doc.content.findDiffEnd(newState.doc.content)

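// findDiffStart returns null when nothing differs, but 0 is a valid position,
// so compare against null instead of relying on truthiness
if (from == null || to == null || from === to.b) {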
  return
}

newState.doc.nodesBetween(from, to.b, (node, pos) => {
  // ..
})

An edge case for this solution is, for example, a change at both the start and the end of a document within a single transaction, because the range then spans everything in between. To calculate exact changes you have to map over transaction steps like this.
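A minimal sketch of what mapping over the steps could look like, assuming tr is a ProseMirror Transaction (changedRanges is a hypothetical helper name). Instead of one big range it keeps the individual ranges, maps each one forward into the final document, and merges the ones that overlap, so edits at the start and the end of a document stay two small ranges.

```javascript
// Sketch: collect the ranges touched by each step, expressed in
// final-document positions, with overlapping ranges merged.
// `changedRanges` is a hypothetical helper; `tr` is assumed to be a Transaction.
function changedRanges(tr) {
  const ranges = []
  tr.mapping.maps.forEach((stepMap, i) => {
    // Mapping made of all steps that come after step i
    const rest = tr.mapping.slice(i + 1)
    stepMap.forEach((oldStart, oldEnd, newStart, newEnd) => {
      ranges.push({from: rest.map(newStart, -1), to: rest.map(newEnd, 1)})
    })
  })

  // Sort by start position and merge ranges that overlap or touch
  ranges.sort((a, b) => a.from - b.from)
  const merged = []
  for (const range of ranges) {
    const last = merged[merged.length - 1]
    if (last && range.from <= last.to) last.to = Math.max(last.to, range.to)
    else merged.push({from: range.from, to: range.to})
  }
  return merged
}
```

Each returned range could then be scanned separately, for example with newState.doc.nodesBetween(range.from, range.to, ...).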