Background: In PubPub, we permit users to annotate selections of text within a document with their comments. These are stored (approximately) as Selection objects associated with a key pointing to a position in a document’s history. When further steps are applied to the document, we must use them to map these selections to the right point in the document, roughly like:
let mapping = new Mapping(steps.map(step => step.getMap()));
let newSelections = oldSelections.map(selection => selection.map(newDoc, mapping));
This occasionally needs to be applied retroactively for a Selection applied at step n on a document now at step m >> n. This requires us to retrieve steps n+1...m from storage, which becomes a bottleneck on very large documents and makes some desired features (like the ability to comment on a past revision) less feasible.
The question: rather than retrieving a long list of steps to map through, might it be possible to serialize and store the Mapping induced by a list of steps? We store checkpoints every 100 steps to make long documents easier to reconstruct, and it would be natural to store the Mapping associated with those 100 steps alongside the checkpoint. Then, fast-forwarding an outdated Selection to step m would reduce to retrieving the intervening Mappings and then a much shorter list of steps not yet associated with a checkpoint.
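To make the retrieval savings concrete, here is a rough sketch of how the fetch plan might be computed. It assumes (this is my assumption, not something our storage layer guarantees) that the checkpoint at step b carries the Mapping for steps b-99...b. The function name planFastForward and the shape of its return value are hypothetical. Note that a Selection made between checkpoints would also need the few individual steps up to the next checkpoint boundary before any stored Mapping applies.

```javascript
// Hypothetical helper: decide what to fetch to move a selection from step n to step m.
// Assumes the checkpoint at boundary b stores the mapping for steps (b - interval + 1)..b.

// Inclusive step range, or null when the range is empty.
function range(from, to) {
  return from <= to ? { from, to } : null;
}

function planFastForward(n, m, interval = 100) {
  if (!Number.isInteger(n) || !Number.isInteger(m) || n < 0 || n > m) {
    throw new RangeError("expected integers with 0 <= n <= m");
  }
  // First checkpoint boundary at or after n, last boundary at or before m.
  const first = Math.ceil(n / interval) * interval;
  const last = Math.floor(m / interval) * interval;

  // No boundary lies between n and m: fall back to individual steps.
  if (first > last) {
    return { head: range(n + 1, m), checkpoints: [], tail: null };
  }

  // Checkpoints whose whole window lies inside (first, last].
  const checkpoints = [];
  for (let b = first + interval; b <= last; b += interval) checkpoints.push(b);

  return {
    head: range(n + 1, first), // individual steps up to the first boundary
    checkpoints,               // boundaries whose stored mappings we fetch
    tail: range(last + 1, m),  // individual steps after the last boundary
  };
}

const plan = planFastForward(250, 1234);
console.log(plan.head, plan.checkpoints.length, plan.tail);
```

For n = 250 and m = 1234 this plan fetches 50 + 34 = 84 individual steps plus 9 stored mappings, instead of 984 steps.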
At a glance, the Mapping objects themselves appear easily JSON-serializable, but the library does not provide a method to do this, which gives me pause. So I am curious to get a second opinion on whether this feels like a good idea or not.
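For illustration, here is a toy model of what I have in mind. It does not use the library at all: each position map is a flat array of [start, oldSize, newSize] triples, which is the shape the StepMap constructor is documented to accept, and mapPos is my own simplified re-implementation of forward position mapping, not the library's code. The names mapPos, mapThrough, serializeMaps and deserializeMaps are hypothetical. Reading the ranges back out of a real StepMap may depend on fields that are not part of the public API, and a real Mapping also tracks mirroring information used for rebasing, which this sketch ignores.

```javascript
// Toy stand-in for a position map: flat [start, oldSize, newSize] triples,
// sorted by start, with starts expressed in pre-change coordinates.
// This is a simplified illustration, NOT ProseMirror's implementation.

// Map a single position through one set of ranges.
// assoc decides which side an ambiguous position sticks to (-1 left, 1 right).
function mapPos(ranges, pos, assoc = 1) {
  let diff = 0;
  for (let i = 0; i < ranges.length; i += 3) {
    const start = ranges[i];
    if (start > pos) break;
    const oldSize = ranges[i + 1];
    const newSize = ranges[i + 2];
    const end = start + oldSize;
    if (pos <= end) {
      // Position touches the replaced range: snap to its start or its new end.
      const side = oldSize === 0 ? assoc : pos === start ? -1 : pos === end ? 1 : assoc;
      return start + diff + (side < 0 ? 0 : newSize);
    }
    diff += newSize - oldSize;
  }
  return pos + diff;
}

// Map a position through a sequence of maps, in order.
function mapThrough(maps, pos, assoc = 1) {
  return maps.reduce((p, ranges) => mapPos(ranges, p, assoc), pos);
}

// Plain arrays of numbers survive a JSON round trip unchanged.
function serializeMaps(maps) {
  return JSON.stringify(maps);
}
function deserializeMaps(json) {
  return JSON.parse(json);
}

// Insert 3 characters at position 5, then delete positions 2..6.
const maps = [[5, 0, 3], [2, 4, 0]];
const restored = deserializeMaps(serializeMaps(maps));
console.log(mapThrough(maps, 10), mapThrough(restored, 10)); // both 9
```

If something like this holds for the real objects, the stored payload per checkpoint would just be a small array of number arrays. The part I am least sure about is whether dropping the mirroring data is safe for simple forward mapping of selections.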
An aside: a natural suggestion to solve the underlying problem here would be to use Marks to store annotation ranges, since those will automatically be mapped through new steps, and are more robust to cut/copy/paste changes to the document. We are considering this, but it doesn’t feel like a great fit for public annotations which might number in the many thousands and appear in channels accessible to disjoint sets of people (like classrooms or blinded reviewers). These seem better modeled as layers on top of the document rather than part of the document itself.