Serializing steps and documents without unmodified attributes

We are considering options for optimizing the storage of a large number of ProseMirror documents along with their version history (a list of steps). One possibility that came up is to avoid storing node attributes that have their default value, and to elide them from steps as well.

This seems to work fine: at any point where ProseMirror instantiates a node, it sets any missing attributes that have default values. The main caveat with this approach is that any change to default attribute values in the schema would apply retroactively. For our use case, this is an acceptable tradeoff: we rarely change schemas in backwards-incompatible ways, let alone change default attribute values. (Another minor drawback is a slight performance hit in serialization, from comparing each attribute value to its default.) The upside, though, is that we have nodes with a lot of optional attributes that no longer need to be stored.
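To make the tradeoff concrete, here is a minimal sketch of the idea on plain attribute maps. It does not use ProseMirror itself: `elideDefaults`, `fillDefaults`, and the two defaults tables are hypothetical names made up for illustration, and `fillDefaults` only imitates the "missing attributes get their default" behavior described above.

```typescript
// Hypothetical type: a flat map of attribute names to values.
type Attrs = Record<string, unknown>;

// Drop every attribute whose value equals its default.
// Assumption: values are JSON-serializable, so comparing their JSON
// encodings is an adequate (if slightly costly) equality check.
function elideDefaults(attrs: Attrs, defaults: Attrs): Attrs {
  const out: Attrs = {};
  for (const name of Object.keys(attrs)) {
    const value = attrs[name];
    const isDefault =
      name in defaults &&
      JSON.stringify(value) === JSON.stringify(defaults[name]);
    if (!isDefault) out[name] = value;
  }
  return out;
}

// Imitates loading: any attribute missing from storage takes the default.
function fillDefaults(stored: Attrs, defaults: Attrs): Attrs {
  return { ...defaults, ...stored };
}

// Defaults at the time the document was written (made-up values).
const defaultsV1: Attrs = { level: 1, align: "left" };
const stored = elideDefaults({ level: 1, align: "center" }, defaultsV1);

// Loading with the same defaults restores the original attributes.
const loadedV1 = fillDefaults(stored, defaultsV1);

// Loading after the schema default changed: `level` follows the new default.
const defaultsV2: Attrs = { level: 2, align: "left" };
const loadedV2 = fillDefaults(stored, defaultsV2);
```

Only `align` ends up in `stored`, which is where the space saving comes from. The last two lines show the retroactive effect: the stored data never recorded `level`, so it takes whatever the schema's default is at load time.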

Here is the extent of the changes needed: the ProseMirror:master...fellowapp:no-default-attrs comparison in ProseMirror/prosemirror-model on GitHub. My question is: would this change be considered if we were to upstream it? Are there other drawbacks we haven’t considered that would prevent this from being the default behavior?

In a way, the JSON format is part of the public interface, and consumed by code other than ProseMirror’s own fromJSON, so changing it is no longer really an option at this point. You could, though, use your own custom JSON serialization/deserialization logic easily enough if you need to store ProseMirror data structures in some alternative way.
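One way such custom serialization might look is a post-processing pass over the plain objects that `toJSON` produces, leaving ProseMirror's own format untouched. This is only a sketch under assumptions: `stripDefaultAttrs` and the `defaults` table are hypothetical (in a real setup the table could probably be built from each node type's `spec.attrs`), marks are left as they are, and the example document is invented.

```typescript
// Shape of the plain objects produced by a node's toJSON().
interface NodeJSON {
  type: string;
  attrs?: Record<string, unknown>;
  content?: NodeJSON[];
  marks?: unknown[];
  text?: string;
}

// Hypothetical table: node type name -> attribute name -> default value.
type DefaultsTable = Record<string, Record<string, unknown>>;

// Return a copy of the tree with default-valued node attributes removed.
// Values are compared by their JSON encoding (assumes serializable attrs).
function stripDefaultAttrs(node: NodeJSON, defaults: DefaultsTable): NodeJSON {
  const out: NodeJSON = { ...node };
  const nodeDefaults = defaults[node.type] ?? {};
  if (node.attrs) {
    const kept: Record<string, unknown> = {};
    for (const name of Object.keys(node.attrs)) {
      const value = node.attrs[name];
      const isDefault =
        name in nodeDefaults &&
        JSON.stringify(value) === JSON.stringify(nodeDefaults[name]);
      if (!isDefault) kept[name] = value;
    }
    // Omit the attrs object entirely when nothing non-default remains.
    if (Object.keys(kept).length > 0) out.attrs = kept;
    else delete out.attrs;
  }
  if (node.content) {
    out.content = node.content.map((child) => stripDefaultAttrs(child, defaults));
  }
  return out;
}

// Made-up defaults and document, for illustration only.
const defaults: DefaultsTable = {
  heading: { level: 1 },
  paragraph: { align: "left" },
};
const doc: NodeJSON = {
  type: "doc",
  content: [
    { type: "heading", attrs: { level: 1 }, content: [{ type: "text", text: "Title" }] },
    { type: "paragraph", attrs: { align: "center" }, content: [{ type: "text", text: "Body" }] },
  ],
};
const compact = stripDefaultAttrs(doc, defaults);
```

Because missing attributes with defaults are filled in when a node is instantiated, the compact form should not need a matching "restore" pass before loading. Steps would need the same treatment wherever they embed node JSON.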

Thanks @marijn, that makes sense. It’s too bad there’s no good extensible way to do JSON encoding in JS, though, because otherwise this would be much simpler to achieve.