Tracked Changes with Strict Document Format


#1

@marijn, in an RFC pull request (https://github.com/ProseMirror/rfcs/pull/3) you suggested that tracked changes be done by storing steps, as in https://github.com/prosemirror/prosemirror-changeset. I did look at that and originally wanted to go in that direction, but given my requirements it seemed like the wrong approach.

I am working with an external system that already creates documents with tracked changes. I need to support that document format and be able to output the same format. This existing format wraps inline text in elements and places attributes on nodes, then inserts corresponding metadata nodes into the document’s <head> element. There are more than just insert and delete operations, too: wrapping/unwrapping content, attribute changes, and replaces are all forms of tracked changes. I was uncertain how, or even whether, prosemirror-changeset could be used under these constraints.

So I took the approach of using marks for the inline text changes. I am still working out a method for applying attributes to nodes that have a tracked change. I am applying these marks by hooking into input handlers such as handlePaste, handleDrop, handleTextInput, and handleDOMEvents.cut. I am worried that I won’t catch every input type. I’d like to be making changes at a lower level like in prosemirror-changeset.

Does my approach lead to madness? Do you think there is still a way to leverage prosemirror-changeset? FYI these documents are also edited in a collaborative environment.


#2

Probably, yes, but I’ve never worked with a system like what you describe (a document format that encodes various types of changes), so trying to do it differently might still lead to madness.

But in general, I think you really want to work on the step or transaction level, not the UI level. As you already hint, there are just too many different ways in which people might interact with your documents, and there’s a non-trivial amount of interpretation going on in methods like Transform.replace when you paste or delete arbitrary selections: it might have to introduce nodes or unwrap nodes to conform to document schema constraints. So if you want relatively regular, reliable data, the transaction level is what you want.

Whether prosemirror-changeset is a good fit here I don’t know—it reduces a set of steps to a series of insertions and deletions, but if you have other types of changes that you want to track, it may not be easy to apply.


#3

In my editor I show tracked deletions as struck-through text and insertions in a different color from the rest of the document.

Because this is a collaborative document, I need to display those changes identically in all peer editors. To do this, I believe the tracked changes need to be applied directly to the document so that they are sent to the peers. That being the case, I think appendTransaction could be the best place to apply the tracked-change marks and node attributes, but I am not entirely sure how best to go about it. I think I would have to go through each transaction’s steps to figure out whether an insertion or deletion happened and then apply the marks to the ranges of those steps. Correct me if I am wrong, but I think I could use prosemirror-changeset here to calculate the inserted and deleted spans I need, and then apply the marks/attributes to those spans’ ranges.
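
A minimal, untested-against-the-real-library sketch of that idea follows. The names trackInsertionsSpec and insertionMark are hypothetical, ChangeSet is passed in as a parameter rather than imported, and the inserted span shape ({ from, to }) is assumed to be the one shown in this thread, which may differ in other versions of prosemirror-changeset. The returned object is meant to be handed to new Plugin(...).

```javascript
// Sketch only: builds a plugin spec that marks every inserted range.
// `ChangeSet` and `insertionMark` are supplied by the caller (assumptions).
function trackInsertionsSpec(ChangeSet, insertionMark) {
  return {
    appendTransaction(transactions, oldState, newState) {
      // Nothing to track for selection-only transactions.
      if (!transactions.some((tr) => tr.docChanged)) return null;

      // Collect the step maps of every transaction in this batch, in order.
      const maps = transactions.flatMap((tr) => tr.mapping.maps);

      // Same call shape as the snippets elsewhere in this thread.
      const changeSet = ChangeSet.create(oldState.doc).addSteps(newState.doc, maps);
      if (changeSet.inserted.length === 0) return null;

      // Inserted span positions refer to newState.doc.
      const tr = newState.tr;
      for (const span of changeSet.inserted) {
        tr.addMark(span.from, span.to, insertionMark);
      }
      return tr;
    },
  };
}
```

Mark steps do not move positions, so when appendTransaction runs again for the appended transaction the change set should contain no inserted spans and the function returns null instead of looping. Deletions and node attributes are left out here on purpose.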

Does that sound feasible?


#4

The “changes” are implicit in the steps you’re already sending to peers—i.e. if you derive the deleted sections from the set of steps since a given point in time, all peers should be seeing the same steps and compute the same deletions.


#5

I’ve been moving forward with this method and it has been working well so far. I came upon something that I do not quite understand though. When lifting an empty block the changeset’s insertion position range appears to be off by 1.

In my example I have a list with 3 items. I hit enter to create a 4th item. Then hit enter again. This causes the lifting of an empty block and ultimately a ReplaceAroundStep. The resulting doc is a list with 3 items and a paragraph after the list.

Before:

<ol>
  <li><p>Item 1</p></li>
  <li><p>Item 2</p></li>
  <li><p>Item 3</p></li>
  <li><p></p></li>
</ol>

After:

<ol>
  <li><p>Item 1</p></li>
  <li><p>Item 2</p></li>
  <li><p>Item 3</p></li>
</ol>
<p></p>

The new document is correct but the span in changeSet.inserted points to the <ol>. So if I were to run newState.doc.slice(span.from, span.to) it would give me the <ol> instead of the <p>.

I am grabbing the change set in Plugin.appendTransaction like this:

var changeSet = pm.changeset.ChangeSet.create(oldState.doc).addSteps(newState.doc, transaction.mapping.maps);

I’m not sure whether my assumptions about the change set are incorrect here or whether the insertion span’s range is incorrect. Other transactions have matched my assumptions up to this point.


#6

I’m not entirely sure what your assumptions are, but what lift will do, in this case, is create a replace-around step that replaces the opening list item token (<li>) before the empty paragraph with a closing list token (</ol>), and deletes the two closing tokens after the empty paragraph (</li></ol>). That way, the paragraph itself isn’t changed or moved, but only the tokens around it are updated to reflect the new structure.
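
To make the token view concrete, here is a toy model in plain JavaScript. It does not use ProseMirror; the array of tag strings and the replaceAround helper are illustrative assumptions, with one array entry standing in for one document position. The list is shortened to one filled item plus the empty one.

```javascript
// Toy model: one array entry per document position (token).
const listBefore = [
  "<ol>", "<li>", "<p>", "A", "</p>", "</li>",
  "<li>", "<p>", "</p>", "</li>", "</ol>",
];

// Replace tokens [from, gapFrom) with `head` and [gapTo, to) with `tail`,
// leaving the gap [gapFrom, gapTo) untouched. This is the general shape
// of a replace-around.
function replaceAround(tokens, from, gapFrom, gapTo, to, head, tail) {
  return [
    ...tokens.slice(0, from),
    ...head,
    ...tokens.slice(gapFrom, gapTo),
    ...tail,
    ...tokens.slice(to),
  ];
}

// The empty paragraph is tokens 7..9. The <li> that opens the last item
// (token 6) becomes </ol>, and the trailing </li></ol> (tokens 9..11) go away.
const lifted = replaceAround(listBefore, 6, 7, 9, 11, ["</ol>"], []);

console.log(lifted.join(""));
// <ol><li><p>A</p></li></ol><p></p>
```

Seen this way, the only "inserted" token is the </ol> sitting where the <li> used to be, which is why an inserted span one position wide lands on the list boundary rather than on the paragraph.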


#7

My assumptions were that the change set’s inserted spans would point to the new <p></p> after the <ol> in the newState parameter of Plugin.appendTransaction, and that its deleted spans would point to the old <li><p></p></li> in the oldState parameter.

However I am getting these results (FYI in my example there is more content before the list making the positions start in the 500s):

var changeSet = pm.changeset.ChangeSet.create(oldState.doc).addSteps(newState.doc, transaction.mapping.maps);

changeSet.inserted[0]
// { from: 545, to: 546 }
newState.doc.slice(changeSet.inserted[0].from, changeSet.inserted[0].to)
// <ol></ol>

// changeSet.deleted has two items in it

changeSet.deleted[0]
// { from: 545, to: 546, pos: 545 }
changeSet.deleted[0].slice
// <ol></ol>
oldState.doc.slice(changeSet.deleted[0].from, changeSet.deleted[0].to) 
// <ol></ol>

changeSet.deleted[1]
// { from: 548, to: 550, pos: 548 }
changeSet.deleted[1].slice
// <ol><li></li></ol>
oldState.doc.slice(changeSet.deleted[1].from, changeSet.deleted[1].to)
// <ol><li></li></ol>

#8

I’ve run into another unexpected change set involving lists. It happens when deleting an empty list item with backspace.

Before:

<ol>
  <li><p>Item 1</p></li>
  <li><p>Item 2</p></li>
  <li><p>Item 3</p></li>
  <li><p></p></li>
</ol>

After:

<ol>
  <li><p>Item 1</p></li>
  <li><p>Item 2</p></li>
  <li>
    <p>Item 3</p>
    <p></p>
  </li>
</ol>

The changeset contains one deleted span and no inserted spans:

// DeletedSpan
{
    data : undefined,
    from: 1499,
    pos: 1499,
    slice: {content: Fragment, openStart: 1, openEnd: 1},
    to: 1501
}

where the slice is two list_item nodes with no content.

I would expect to get a change set containing one deleted list_item and one inserted paragraph.

Looking more closely at prosemirror-changeset, I believe it cannot capture the nuance of a backwards join (which is what I think is happening in this case). The transaction here consists of a single replace step:

// ReplaceStep 
{
    from: 1499,
    slice: {content: Fragment, openStart: 0, openEnd: 0}, // An empty slice
    structure: true,
    to: 1501
}

And transaction.mapping.maps (which I pass into ChangeSet.addSteps) contains only one StepMap:

// StepMap
{
    inverted: false,
    ranges: [1499, 2, 0]
}

I do not think that ChangeSet can do anything with that one StepMap other than find one deleted span.
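
For illustration, the ranges array above can be read as [start, oldSize, newSize] triples. The helper below is a hypothetical plain-JavaScript reading of that layout for non-inverted maps only; it is not part of ProseMirror, whose StepMap reports comparable old and new ranges through its forEach method.

```javascript
// Read [start, oldSize, newSize] triples from a non-inverted step map.
// `changedRanges` is an illustrative helper, not a library function.
function changedRanges(ranges) {
  const result = [];
  let diff = 0; // how far positions after earlier ranges have shifted
  for (let i = 0; i < ranges.length; i += 3) {
    const start = ranges[i];
    const oldSize = ranges[i + 1];
    const newSize = ranges[i + 2];
    result.push({
      oldFrom: start,
      oldTo: start + oldSize,
      newFrom: start + diff,
      newTo: start + diff + newSize,
    });
    diff += newSize - oldSize;
  }
  return result;
}

const joinRanges = changedRanges([1499, 2, 0]);
console.log(joinRanges);
```

For [1499, 2, 0] this yields a single entry: positions 1499 to 1501 of the old document are gone, and the matching range in the new document is empty (1499 to 1499). Nothing in the map says what kind of tokens were removed, only where and how many.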

@marijn is prosemirror-changeset returning the correct results? It certainly seems to be returning the only results it can given the input. But it does not really seem like the correct results to me. If prosemirror-changeset is working correctly then it looks like I will need to approach the detection of changes differently and not solely depend on prosemirror-changeset.


#9

It deletes the </li><li> tokens, which can be done with a single replace step.

It doesn’t do nuances—it represents everything as only deletions and insertions, which is correct (it fully expresses what happened), but may not always be easy to interpret/display.
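
As a toy illustration in plain JavaScript (the token array and the deleteRange helper are assumptions for this example, not ProseMirror code), removing the two adjacent </li><li> tokens is enough to merge the empty item into the previous one:

```javascript
// Toy model: one array entry per document position (token).
const listTokens = [
  "<ol>", "<li>", "<p>", "A", "</p>", "</li>",
  "<li>", "<p>", "</p>", "</li>", "</ol>",
];

// Delete tokens [from, to) and insert nothing in their place.
function deleteRange(tokens, from, to) {
  return [...tokens.slice(0, from), ...tokens.slice(to)];
}

// Tokens 5 and 6 are the </li><li> pair between the two items.
const joined = deleteRange(listTokens, 5, 7);

console.log(joined.join(""));
// <ol><li><p>A</p><p></p></li></ol>
```

The empty paragraph never moves; it simply ends up inside the previous list item because the boundary between the two items is gone.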


#10

Thanks for bearing with me @marijn. I think I understand what prosemirror-changeset really does now. And I’ve written some code that seems to do a good job of detecting a backwards join using the change set data and some lookups into the old and new editor state.
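
The code mentioned above is not shown in the thread. Purely as a hypothetical sketch of what such a check could look like in a simplified token model (plain JavaScript, no ProseMirror; joinedNodeTypes is an invented name), a deleted range can be treated as a join when it consists of closing tokens followed by the mirror-image opening tokens:

```javascript
// Hypothetical check: does a deleted token run look like a join?
// Returns the joined node names (outermost first), or null if it does not.
function joinedNodeTypes(deleted) {
  if (deleted.length === 0 || deleted.length % 2 !== 0) return null;
  const half = deleted.length / 2;
  const names = [];
  for (let i = 0; i < half; i++) {
    // Walk outward from the middle: closing tags to the left,
    // opening tags to the right, pairwise with matching names.
    const close = /^<\/(\w+)>$/.exec(deleted[half - 1 - i]);
    const open = /^<(\w+)>$/.exec(deleted[half + i]);
    if (!close || !open || close[1] !== open[1]) return null;
    names.push(close[1]);
  }
  return names;
}

console.log(joinedNodeTypes(["</li>", "<li>"]));                // [ 'li' ]
console.log(joinedNodeTypes(["</p>", "</li>", "<li>", "<p>"])); // [ 'li', 'p' ]
console.log(joinedNodeTypes(["</li>", "</ol>"]));               // null
```

With real documents the same question would have to be answered from the deleted span’s slice and the surrounding nodes in the old and new state rather than from tag strings.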