Best way to loop through the differences between two fragments

I am basically trying to implement something similar to @marijn’s comment in the suggestion post. Let’s say there are two documents, one modified offline and the server version that may have been modified in the meantime. When the user reconnects I want to loop through all the differences between the documents and show in green the insertions, and red the deletions.

I’ve looked at using findDiffStart to find the differences between the two documents, and I am able to find the position of the first difference between the docs. Using findDiffEnd, I think I can infer whether content was added or removed. I was hoping I could continue to look for more differences by passing the last position in subsequent calls, i.e. local.findDiffStart(remote, pos), but it does not work as I expected: it returns the same position as before plus the number passed as the parameter. This pos argument is also not documented, so I guess it’s private and should not be relied upon.
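To make the limitation concrete, here is a minimal sketch of how the two calls combine. It uses plain strings as a stand-in for fragments, and `diffStart`, `diffEnd`, and `firstChange` are hypothetical helpers, not ProseMirror API. The only assumption about the real methods is their general shape: one returns the first differing position (or null), the other returns a pair of end positions, one per document.

```javascript
// Stand-in for findDiffStart: index of the first difference, or null if equal.
function diffStart(a, b) {
  const max = Math.min(a.length, b.length);
  for (let i = 0; i < max; i++) if (a[i] !== b[i]) return i;
  return a.length === b.length ? null : max;
}

// Stand-in for findDiffEnd: scan backwards and return the end of the
// differing region in each input, or null if the inputs are equal.
function diffEnd(a, b) {
  let i = a.length, j = b.length;
  while (i > 0 && j > 0 && a[i - 1] === b[j - 1]) { i--; j--; }
  return i === 0 && j === 0 ? null : { a: i, b: j };
}

// Combine both scans into one changed range and classify it.
function firstChange(a, b) {
  const start = diffStart(a, b);
  if (start === null) return null;
  let { a: endA, b: endB } = diffEnd(a, b);
  // The common prefix and suffix can overlap (e.g. repeated characters),
  // so push both ends forward until neither lies before the start.
  const overlap = start - Math.min(endA, endB);
  if (overlap > 0) { endA += overlap; endB += overlap; }
  const kind = endA === start ? "insert" : endB === start ? "delete" : "replace";
  return { kind, from: start, toA: endA, toB: endB };
}

console.log(firstChange("hello world", "hello big world"));
console.log(firstChange("a cat sat", "a dog sat"));
```

Note that this yields a single range running from the first difference to the last one. Two separate edits far apart come back as one large "replace", which is why these two calls alone cannot enumerate individual changes.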

What is the best way to achieve this? I guess I could cut the documents and continue looping using findDiffStart and findDiffEnd but that will make the code more complicated by having to manage multiple copies of each document (the original and the cut copy for diffing).

Is using findDiffStart and findDiffEnd even the correct approach to solve this? I looked at using Google diff-match-patch on the text/HTML content as well, but I’d rather stay within the ProseMirror domain if possible.

There’s a (crude) implementation of something like this in this repository. It loops over steps to find the areas to highlight, though, so if you only have documents, it might not be what you need. If so, look into tree diffing algorithms. You won’t get far on tree-shaped data structures with a text diffing algorithm like diff-match-patch.
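For readers who do have steps, the general idea of looping over them can be sketched roughly as follows. This is not the code from the linked repository. It models each step as a hypothetical plain object `{from, to, size}` (replace the range `from`..`to` with `size` positions of new content), and `mapPos` is a simplified stand-in for ProseMirror's position mapping, which handles more cases.

```javascript
// Map a position through one hypothetical replace step.
// assoc > 0 moves a position inside the replaced range to after the new
// content, assoc < 0 moves it to before the new content.
function mapPos(pos, step, assoc) {
  if (pos < step.from) return pos;
  if (pos > step.to) return pos + step.size - (step.to - step.from);
  return assoc < 0 ? step.from : step.from + step.size;
}

// Collect the ranges of the final document that were inserted by the steps.
function insertedRanges(steps) {
  let ranges = [];
  for (const step of steps) {
    // Move earlier ranges to where they sit after this step, and drop
    // the ones this step deleted completely.
    ranges = ranges
      .map(r => ({ from: mapPos(r.from, step, 1), to: mapPos(r.to, step, -1) }))
      .filter(r => r.from < r.to);
    // Record the content this step inserted (overlaps are not merged).
    if (step.size > 0) ranges.push({ from: step.from, to: step.from + step.size });
  }
  return ranges;
}

// Insert 3 positions at 5, then delete the range 0..2 in front of them.
console.log(insertedRanges([
  { from: 5, to: 5, size: 3 },
  { from: 0, to: 2, size: 0 },
]));
```

Deleted content no longer exists in the final document, so showing it in red would need the deleted slices to be kept separately; the sketch only tracks insertions.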

Hi @jtblin,

We have exactly the same use case as you and are probably trying to wrap our heads around the same concepts. These are pretty hard things to figure out.

We were wondering if you found a way through this and, if so, whether you could share some feedback on how you managed to do it.

Thanks in advance, Matthieu

Hey, has anyone else looked into this since? We are also in a situation where at times we will only have two documents (no steps) and need to compare them. Ideally the output would be an array of decorations on the newer document marking where things have been added and deleted.

If nothing exists so far, I would imagine the easiest approach is to use a general JSON-diffing mechanism that outputs RFC 6902 (JSON Patch) operations, and then write some custom code to turn these into decorations. Given that the diffs it finds are all in document order, and that this order is the same as the position order, the first position of these decorations in document A should be discoverable by using findDiffStart between documents A and B. Then one applies the first RFC 6902 change to document B, and now findDiffStart should find the position of the second decoration, and so on.

Any other ideas out there?

Actually, forget that. RFC 6902 diffing does not seem to do any text node diffing (it just replaces the entire text value), so that will give a different position value. One may be able to work around this by additionally adding a text diffing mechanism, but I am not even sure all the other position numbers will come out right…
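One way the position numbers might be kept consistent is to diff a flattened token stream instead of the JSON itself. The sketch below is only an illustration under several assumptions: `flatten` and `diffTokens` are hypothetical helpers, the set of leaf node types has to be supplied by hand (the JSON alone cannot tell an empty paragraph from a leaf), marks and attributes are ignored, and the diff is a plain quadratic LCS that is only practical for small documents. Each token stands for one document position: one per character, one per leaf node, and one each for entering and leaving a non-leaf node below the top level.

```javascript
// Flatten a ProseMirror-style JSON document so that token index i
// describes what sits between document positions i and i + 1.
function flatten(node, leafTypes, out = [], isTop = true) {
  if (typeof node.text === "string") {
    for (let i = 0; i < node.text.length; i++) out.push("char:" + node.text[i]);
  } else if (leafTypes.has(node.type)) {
    out.push("leaf:" + node.type);
  } else {
    if (!isTop) out.push("open:" + node.type);
    for (const child of node.content || []) flatten(child, leafTypes, out, false);
    if (!isTop) out.push("close:" + node.type);
  }
  return out;
}

// LCS-based diff of two token arrays. Returns changed ranges as
// { fromA, toA, fromB, toB }, with end indices exclusive.
function diffTokens(a, b) {
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--)
    for (let j = b.length - 1; j >= 0; j--)
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
  const changes = [];
  let i = 0, j = 0;
  while (i < a.length || j < b.length) {
    if (i < a.length && j < b.length && a[i] === b[j]) { i++; j++; continue; }
    const fromA = i, fromB = j;
    // Consume one maximal run of tokens that are not part of the LCS.
    while (i < a.length || j < b.length) {
      if (i < a.length && j < b.length && a[i] === b[j]) break;
      if (j >= b.length || (i < a.length && lcs[i + 1][j] >= lcs[i][j + 1])) i++;
      else j++;
    }
    changes.push({ fromA, toA: i, fromB, toB: j });
  }
  return changes;
}

const para = text => ({ type: "doc", content: [{ type: "paragraph", content: [{ type: "text", text }] }] });
const leaves = new Set(["image", "hard_break"]); // assumed leaf types
const oldTokens = flatten(para("hello world"), leaves);
const newTokens = flatten(para("hello big world"), leaves);
console.log(diffTokens(oldTokens, newTokens));
```

A range with `fromA === toA` is a pure insertion in the newer document, and one with `fromB === toB` is a pure deletion. Since the indices are meant to line up with document positions, the B ranges could in principle be used as the bounds of inline decorations, though I have not verified this against a real schema.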