Serializing and storing Mapping objects

Background: In PubPub, we permit users to annotate selections of text within a document with their comments. These are stored (approximately) as Selection objects associated with a key pointing to a position in a document’s history. When further steps are applied to the document, we must use them to map these selections to the right point in the document, roughly like:

let mapping = new Mapping(steps.map(step => step.getMap()));
let newSelections = oldSelections.map(selection => selection.map(newDoc, mapping));

This occasionally needs to be done retroactively, for a Selection created at step n on a document now at step m for m >> n. This requires us to retrieve steps n+1...m from storage, which becomes a bottleneck on very large documents and makes some desired features (like the ability to comment on a past revision) less feasible.

The question: rather than retrieving a long list of steps to map through, might it be possible to serialize and store the Mapping induced by a list of steps? We store checkpoints every 100 steps to make long documents easier to reconstruct, and it would be natural to store the Mapping associated with those 100 steps alongside the checkpoint. Then, fast-forwarding an outdated Selection from n to m would reduce to retrieving the (m-n)/100 checkpoint Mappings and then a much shorter list of steps not yet associated with a checkpoint.
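To make the arithmetic concrete, here is a rough sketch of which pieces would need to be fetched. It assumes (my assumption, not something our storage currently guarantees) that checkpoint k holds the Mapping for steps 100(k-1)+1 through 100k. Since n rarely falls exactly on a checkpoint boundary, there is also a short "head" of raw steps before the first usable checkpoint Mapping. The names planFastForward, headSteps, checkpoints and tailSteps are hypothetical.

```javascript
// Inclusive range of step numbers, or null when the range is empty.
const range = (from, to) => (from <= to ? { from, to } : null);

// Plan what to fetch in order to map a selection from step n to step m.
// Assumes checkpoint k stores the mapping for steps (k-1)*interval+1 .. k*interval.
const planFastForward = (n, m, interval = 100) => {
  if (!Number.isInteger(n) || !Number.isInteger(m) || n < 0 || m < n) {
    throw new RangeError('expected integers with 0 <= n <= m');
  }
  const firstBoundary = Math.ceil(n / interval) * interval; // first boundary at or after n
  const lastBoundary = Math.floor(m / interval) * interval; // last boundary at or before m
  if (firstBoundary >= lastBoundary) {
    // No whole checkpoint fits between n and m: fall back to raw steps only.
    return { headSteps: range(n + 1, m), checkpoints: [], tailSteps: null };
  }
  const checkpoints = [];
  for (let end = firstBoundary + interval; end <= lastBoundary; end += interval) {
    checkpoints.push(end / interval);
  }
  return {
    headSteps: range(n + 1, firstBoundary), // raw steps up to the first boundary
    checkpoints, // indices of the stored checkpoint mappings to apply, in order
    tailSteps: range(lastBoundary + 1, m), // raw steps after the last boundary
  };
};

console.log(planFastForward(250, 1234));
```

For n = 250 and m = 1234 this asks for steps 251 to 300, checkpoint mappings 4 through 12, and steps 1201 to 1234: 84 raw steps plus 9 stored mappings, instead of 984 raw steps.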

At a glance the Mapping objects themselves appear easily JSON-serializable, but the library does not provide a method to do this, which gives me pause. So I am curious to get a second opinion on whether this feels like a good idea or not.

An aside: a natural suggestion to solve the underlying problem here would be to use Marks to store annotation ranges, since those will automatically be mapped through new steps, and are more robust to cut/copy/paste changes to the document. We are considering this, but it doesn’t feel like a great fit for public annotations which might number in the many thousands and appear in channels accessible to disjoint sets of people (like classrooms or blinded reviewers). These seem better modeled as layers on top of the document rather than part of the document itself.

In most use cases so far, people had the serialized steps at hand when they needed to map through them. A series of mappings is likely to be a bit smaller than the full steps, but its size is still linear in the step count, since there is currently no way to merge mappings (though I guess one could write such a function relatively easily, if you just want something that maps reasonably, not something that maps in precisely the same way as the series of mappings).

Ah, I had thought they might be more compact, e.g. by representing n sequential one-character additions the same way as one n-character addition. But that does seem like an over-optimization for an object never intended to leave the client.

If we did pursue this, I wonder how far we’d get by using step.merge to merge mergeable steps together and then taking the Mapping over those merged steps, e.g.

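import { Mapping } from 'prosemirror-transform';

// Greedily merge runs of adjacent steps, then build a Mapping from the merged steps.
const getMergedStepMapping = (steps) => {
	const mergeResult = steps.reduce(
		({ mergedSteps, mergingStep }, nextStep) => {
			// step.merge returns null when the two steps cannot be merged
			const maybeNextMergedStep = mergingStep ? mergingStep.merge(nextStep) : nextStep;
			if (maybeNextMergedStep) {
				// Merge was successful (or this is the first step); keep accumulating
				return { mergedSteps, mergingStep: maybeNextMergedStep };
			}
			// Merge failed; flush the accumulated step and start a new run
			return { mergedSteps: [...mergedSteps, mergingStep], mergingStep: nextStep };
		},
		{ mergedSteps: [], mergingStep: null },
	);

	// Append the last in-progress step; the filter drops the null seed when `steps` is empty
	return new Mapping(
		[...mergeResult.mergedSteps, mergeResult.mergingStep]
			.filter((x) => x)
			.map((step) => step.getMap()),
	);
};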

I wonder if this would produce a Mapping with the same black-box behavior but smaller internal structure than the one created from the original steps. Food for thought, anyway.

Thanks for the quick reply!

"I wonder if this would produce a Mapping with the same black-box behavior but smaller internal structure than the one created from the original steps."

It doesn’t, unfortunately, but that is probably not a problem. Mapping is really fragile, in a way that seems impossible to avoid; see also the last section of this blog post.
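For anyone following along, here is one way the two can diverge, shown in a toy model rather than with the library itself. The mapPos function below is a simplified stand-in for how I understand a single step map to treat positions (flat [start, oldSize, newSize] triples); it is an assumption for illustration, not the prosemirror-transform implementation, and mapThrough is likewise a hypothetical helper.

```javascript
// Toy position map: `ranges` is a flat list of [start, oldSize, newSize] triples.
// Simplified stand-in for illustration only, not the library's code.
const mapPos = (ranges, pos, assoc = 1) => {
  let diff = 0;
  for (let i = 0; i < ranges.length; i += 3) {
    const start = ranges[i];
    if (start > pos) break;
    const oldSize = ranges[i + 1];
    const newSize = ranges[i + 2];
    const end = start + oldSize;
    if (pos <= end) {
      // Pure insertions follow `assoc`. For a replaced range, the start sticks
      // left, the end sticks right, and interior positions follow `assoc`.
      const side = oldSize === 0 ? assoc : pos === start ? -1 : pos === end ? 1 : assoc;
      return start + diff + (side < 0 ? 0 : newSize);
    }
    diff += newSize - oldSize;
  }
  return pos + diff;
};

// Map a position through a sequence of toy maps, in order.
const mapThrough = (maps, pos, assoc = 1) =>
  maps.reduce((p, ranges) => mapPos(ranges, p, assoc), pos);

// Two separate steps: replace 5..7 with two characters, then insert one character at 7.
const separate = [[5, 2, 2], [7, 0, 1]];
// The same edits as one merged step: replace 5..7 with three characters.
const merged = [[5, 2, 3]];

const viaSeparate = mapThrough(separate, 7, -1);
const viaMerged = mapThrough(merged, 7, -1);
console.log(viaSeparate, viaMerged); // 7 8
```

In this model a left-associated position at 7 stays before the inserted character when the steps are mapped one by one, but lands after it once the steps are merged, because the merged map only sees "the end of a replaced range". With assoc = 1 both routes give 8, so the difference only shows up for some positions and biases.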

Thank you!