Schema versioning and migrations

kiejo · May 23, 2016, 7:30am

I just saw @lessless mentioning this topic in another thread and thought I’d open a new one specifically dedicated to schema versioning and migrations.

Are there currently any plans or ideas for evolving a schema in a non-backwards-compatible way? Examples that come to mind would be:

Removing a NodeType or MarkType
Introducing more restrictive constraints
Changing the implementation of an existing NodeType
Changes in a new ProseMirror version

A versioning/migration mechanism would probably be necessary when parsing existing document content and steps.

Is this something that’s on the roadmap? I think it’s an important topic. Would be great to think through some scenarios of how this could be handled.

marijn · May 23, 2016, 7:45am

This was discussed at some length during the first summit in Berlin, and there the conclusion was that this is best left to the user. I.e. when you update your schema, write your own upgrade function, and if you’re able to, run it on all existing documents right away. If you’re not able to, store schema versions with your documents, and automatically upgrade them as they are read.

If you think part of this would benefit from library functionality, you can write it as a separate package. (Diffing schemas and then autogenerating upgrade code when possible, maybe.)
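
A rough sketch of the "store schema versions with your documents, and automatically upgrade them as they are read" idea, working on plain document JSON. Everything named here (CURRENT_VERSION, upgrades, mapNodes, upgradeOnRead, the {schemaVersion, doc} record shape, and the two example upgrades) is invented for illustration; none of it is ProseMirror API.

```javascript
// Latest schema version the application understands (hypothetical numbering).
const CURRENT_VERSION = 3;

// Apply fn to a node and then to all of its descendants in a
// ProseMirror-style JSON tree ({type, attrs?, content?, ...}).
function mapNodes(node, fn) {
  const mapped = fn(node);
  return mapped.content
    ? { ...mapped, content: mapped.content.map((child) => mapNodes(child, fn)) }
    : mapped;
}

// upgrades[n] converts document JSON from version n to version n + 1.
// Both upgrades are made-up examples.
const upgrades = {
  // v1 -> v2: the "quote" node type was renamed to "blockquote".
  1: (doc) =>
    mapNodes(doc, (node) =>
      node.type === "quote" ? { ...node, type: "blockquote" } : node
    ),
  // v2 -> v3: headings gained a "level" attribute, defaulting to 1.
  2: (doc) =>
    mapNodes(doc, (node) =>
      node.type === "heading"
        ? { ...node, attrs: { level: 1, ...node.attrs } }
        : node
    ),
};

// Upgrade a stored record one version at a time as it is read.
// Records without a version are treated as version 1 (an assumption).
function upgradeOnRead(record) {
  let { schemaVersion = 1, doc } = record;
  while (schemaVersion < CURRENT_VERSION) {
    const upgrade = upgrades[schemaVersion];
    if (!upgrade) throw new Error(`No upgrade from version ${schemaVersion}`);
    doc = upgrade(doc);
    schemaVersion += 1;
  }
  return { schemaVersion, doc };
}
```

Chaining small per-version upgrades means a very old document goes through the same code path as a recent one, and each upgrade only has to know about two adjacent schema versions.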

kiejo · May 23, 2016, 9:19am

Ok, thanks for the information! I’ll see if some functionality can be moved to a library as soon as we start writing migrations ourselves.

lessless · May 25, 2016, 10:05am

Yeah, that makes sense, taking into account that it’s possible to have custom elements.

alidcastano · December 6, 2017, 8:00pm

Hey @kiejo, I’m currently looking into this. Did you encounter any problems handling schema migrations / see aspects of the migration that could be abstracted into a library? Thanks

kiejo · December 12, 2017, 10:33am

So far I have taken two different approaches to schema migrations depending on the kind of migration:

Directly modify the content as JSON. This approach is very straightforward as you simply use standard JavaScript functions to modify the content. I used this once when the MarkType serialization format was changed and it worked well for this use case.
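
To give an idea of what such a JSON-level migration might look like, here is a minimal sketch. It assumes, purely for illustration, an old mark format that stored the mark name under a `_` key with attributes inline, and a new format of {type, attrs}; the exact old layout and the helper names migrateMark and migrateNode are assumptions, not something taken from the migration described above.

```javascript
// Convert one mark from the assumed old shape {_: "link", href: "..."}
// to the new shape {type: "link", attrs: {href: "..."}}.
function migrateMark(mark) {
  if (!("_" in mark)) return mark; // already in the new shape
  const { _: type, ...attrs } = mark;
  return Object.keys(attrs).length > 0 ? { type, attrs } : { type };
}

// Recursively rewrite the marks on every node of the document JSON,
// returning a new tree and leaving the input untouched.
function migrateNode(node) {
  const next = { ...node };
  if (node.marks) next.marks = node.marks.map(migrateMark);
  if (node.content) next.content = node.content.map(migrateNode);
  return next;
}
```

Because this never touches a schema, it also works for content that the current schema could no longer parse.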

Use ProseMirror transforms to modify the document. This one can be a little bit more involved, but I found it easier to work with when, for example, changing existing constraints (e.g. modifying content constraints, removing certain types of nodes, turning an inline node into a block node, converting one node into another). The process I used looks like this:

1. Create a schema that is used solely for the migration and that supports the constraints of both the source and the target schema.
2. Use schema.nodeFromJSON to turn your JSON content into a Node using the “migration schema”.
3. Use node.descendants to iterate over all nodes and store the ops that you want to perform in an array (e.g. [{op: 'delete_node', position: 4}]).
4. Sort the ops array by position in descending order and perform the actual document transforms based on the information stored in your ops array (e.g. tr.delete(position, position + 1) for a leaf node; in general the end of the range is position + node.nodeSize).
5. Use node.toJSON() to store your migrated content, which you can then load with your actual target schema with the new constraints in place.
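
The collect-then-apply part of this process (gathering ops while iterating, then applying them in descending position order) can be illustrated without ProseMirror at all. The sketch below uses a flat array as a toy stand-in for the document, where every entry occupies exactly one position; the array walk plays the role of node.descendants and splice plays the role of tr.delete. The names nodes, ops and applyOps are hypothetical.

```javascript
// Toy document: every entry occupies exactly one position.
const nodes = ["text", "image", "text", "image", "image", "text"];

// Collect the ops while walking the document, without modifying it yet.
const ops = [];
nodes.forEach((type, position) => {
  if (type === "image") ops.push({ op: "delete_node", position });
});

// Apply the ops from the highest position to the lowest, so that a
// deletion never shifts a position that still has to be processed.
function applyOps(nodes, ops) {
  const result = nodes.slice();
  const sorted = ops.slice().sort((a, b) => b.position - a.position);
  for (const { position } of sorted) {
    result.splice(position, 1);
  }
  return result;
}

const migrated = applyOps(nodes, ops); // → ["text", "text", "text"]
```

Applying the same ops in ascending order without adjusting positions would leave one of the images in place here, because each deletion shifts everything after it by one.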

Things to look out for:

Of course you need to make sure that your transforms result in a document that is supported by your target schema.
Depending on the kind of transforms you use, sorting your ops by position might not be enough and you might need to additionally use tr.mapping.map to update the positions that you gathered in step 3.
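
To show why mapping positions can be needed, here is a toy model of the idea. ToyMapping is a deliberately simplified stand-in that only records (position, old size, new size) triples; it is not ProseMirror's Mapping class, and applyWithMapping and the op shape {position, replacement} are likewise made up for this sketch. In a real migration, tr.mapping.map would take the place of mapping.map.

```javascript
// Toy position mapping: remembers every change made so far.
class ToyMapping {
  constructor() {
    this.changes = [];
  }

  // Record that oldSize positions starting at pos (in the document as it
  // was just before this change) were replaced by newSize positions.
  add(pos, oldSize, newSize) {
    this.changes.push({ pos, oldSize, newSize });
  }

  // Translate a position in the original document into the current one.
  // Positions inside a replaced range are left as-is (a simplification).
  map(pos) {
    let mapped = pos;
    for (const { pos: start, oldSize, newSize } of this.changes) {
      if (mapped >= start + oldSize) mapped += newSize - oldSize;
    }
    return mapped;
  }
}

// Replace single entries by zero or more entries, in whatever order the
// ops arrive, mapping each original position through the earlier changes.
function applyWithMapping(nodes, ops) {
  const result = nodes.slice();
  const mapping = new ToyMapping();
  for (const op of ops) {
    const pos = mapping.map(op.position);
    result.splice(pos, 1, ...op.replacement);
    mapping.add(pos, 1, op.replacement.length);
  }
  return result;
}
```

For example, with nodes ["a", "image", "b", "hr", "c"], replacing position 1 with ["figure", "caption"] moves the "hr" from position 3 to position 4; mapping the second op's position through the first change is what keeps the deletion aimed at the "hr".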

Overall this process has worked well for the use cases I had, but I don’t think that I have gone through enough migrations to put this into a reusable library yet. Hopefully this helps as a starting point. It would also be interesting to know how others have approached this topic and if there are better ways to do this.