Schema API design

Hi all. I’ve been doing some work on delivering the promised feature of custom document schemas, along with a nice API for defining them, that I want to share with you. What I’m going to outline in this post exists in the master branch right now. It isn’t to be considered stable (further work might make sweeping changes), but it is solid enough to start a discussion on.

A document schema is a description of the shape a document can have. I am aiming to define a schema system that is powerful enough to describe most common types of documents, but not to overengineer it or complicate it so much that it becomes hard to think about. That means that, while I’m definitely interested in examples of useful documents that it can’t express, I am not committing to trying to support everything you come up with.

In the current code, each editor has a document schema associated with it. This is an object that contains a map from node names to node type objects, and another map from inline style names to inline style type objects.

A node type object describes something like paragraph nodes or ordered list nodes. It may describe a set of attributes, and if it does, each instance of such a node will have those attributes associated with it. An example of an attribute is the order attribute on an ordered list (describing the number at which the list starts) or the src attribute on an image element. Node types also describe the ‘categories’ of the node, which is an array of strings, and the category it may contain. For example, the top-level document node can contain any block type. Paragraphs, ordered lists, and so on have category ‘block’. Ordered lists themselves can contain only the list_item category, which only list item nodes have. Inside of paragraphs, only inline nodes may appear, and inside of a horizontal rule or an image node, nothing may appear.
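To make the category rules above concrete, here is a small self-contained sketch. It is not the library's code: the `types` table and the `canContain` helper are hypothetical names, and node types are modelled as plain objects rather than classes.

```javascript
// Toy model of the category rules described above. Not ProseMirror code:
// `types` and `canContain` are hypothetical names used for illustration.
const types = {
  doc:             {categories: [],            contains: "block"},
  paragraph:       {categories: ["block"],     contains: "inline"},
  ordered_list:    {categories: ["block"],     contains: "list_item"},
  list_item:       {categories: ["list_item"], contains: "block"},
  horizontal_rule: {categories: ["block"],     contains: null},
  text:            {categories: ["inline"],    contains: null}
}

// A parent may hold a child if the child has the category the parent contains.
function canContain(parentName, childName) {
  const parent = types[parentName], child = types[childName]
  if (!parent || !child) throw new Error("Unknown node type")
  return parent.contains !== null && child.categories.includes(parent.contains)
}
```

With this table, `canContain("ordered_list", "list_item")` holds while `canContain("ordered_list", "paragraph")` does not, which matches the list example in the text.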

This is the way the structure of the document is constrained, and all the document transformations make sure to respect these constraints. I am working out an additional feature, where a node type can be set to be ‘fixed’, meaning normal operations can’t change its content. This has some hairy repercussions (for example, what happens when you copy part of a fixed node), but would allow modeling something like an image node that contains a caption (but not two captions, etc).

Style types may also have attributes, such as href on the link style. Other than that, they don’t do much.

Both node types and style types are instances of a given class. Defining a schema is done by mapping node names to such type classes. The classes come with default categories, contains fields, and attributes, but a specific schema can override these to reconfigure existing node types into a somewhat different structure, or to add new attributes.

There are three ‘basic’ types of nodes that all concrete node types inherit from: Block, for nesting blocks, Textblock, for things like paragraphs that are blocks, but contain flat inline content, and Inline for inline content such as images or hard breaks. (Text is a subclass of Inline with some special behavior.)

The node/style type classes are also the place where serialization and parsing logic for those nodes and styles lives. How exactly this works differs per serializer/parser. For example, the DOM/HTML serializer expects a .serializeDOM method on the node type (each node has its type object stored in a property), which it can simply call to serialize a node of that type. The DOM parser, on the other hand, gathers all the parseDOM properties of the types in the schema, which contain objects like {tag: "p", parse: ...}, telling it that whenever it encounters a <p> tag, it should call the parser that it found on that node type (in this case, the Paragraph type). Node and style types can specify handlers for multiple tag names (for example <b> and <strong>), and can specify a precedence, to do something like have a high-precedence parser that checks whether a given attribute is present, and declines to handle the node if it isn’t, leaving it to a lower-precedence parser.
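The precedence idea can be sketched roughly as follows. This is a toy dispatcher, not the actual DOM parser: elements are plain `{tag, attrs}` objects so it runs without a DOM, and the `rank` field, the `parseElement` function, and the "return null to decline" convention are all assumptions made for the illustration.

```javascript
// Toy dispatcher for tag-based parse rules. `rank`, `parseElement`, and the
// "return null to decline" convention are hypothetical, not the library's API.
const rules = [
  // Two tag names handled by the same style.
  {tag: "b", rank: 50, parse: () => ({style: "strong"})},
  {tag: "strong", rank: 50, parse: () => ({style: "strong"})},
  // High-precedence rule (lower rank runs first): declines <a> without href.
  {tag: "a", rank: 25,
   parse: el => el.attrs.href ? {style: "link", href: el.attrs.href} : null},
  // Lower-precedence fallback for the <a> elements declined above.
  {tag: "a", rank: 75, parse: () => ({skip: true})}
]

function parseElement(el) {
  // filter() returns a new array, so sorting does not reorder `rules`.
  const candidates = rules.filter(r => r.tag === el.tag)
                          .sort((a, b) => a.rank - b.rank)
  for (const rule of candidates) {
    const result = rule.parse(el)
    if (result !== null) return result
  }
  return null // no rule handled this element
}
```

An `<a>` with an href is picked up by the first matching rule, while one without falls through to the fallback.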

Other things that should end up on these type objects (but aren’t yet) are node-type-specific commands, key bindings, menu items, etc.

A schema is derived from a ‘schema spec’, which is currently a pair of objects: the map from node names to node types and the map from style names to inline style types. For example, the definition for the default schema looks like this:

const defaultSpec = new SchemaSpec({
  doc: Doc,
  blockquote: BlockQuote,
  ordered_list: OrderedList,
  bullet_list: BulletList,
  list_item: ListItem,
  horizontal_rule: HorizontalRule,

  paragraph: Paragraph,
  heading: Heading,
  code_block: CodeBlock,

  text: Text,
  image: Image,
  hard_break: HardBreak
}, {
  em: EmStyle,
  strong: StrongStyle,
  link: LinkStyle,
  code: CodeStyle
})

All of these types are exported (from the model module), for reuse and extension. The objects actually end up looking like {text: {type: Text}} – i.e. the constructors are wrapped in objects with a type property – and you can update a schema spec by calling its updateNodes (or updateStyles) method:

const flatSchema = defaultSpec.updateNodes({
  list_item: {contains: "flat_block"},
  blockquote: {contains: "flat_block"},
  paragraph: {category: "flat_block block"},
  code_block: {category: "flat_block block"}
})

This doesn’t replace these nodes, but adds the category flat_block to the paragraph and code block nodes, and makes the nesting block nodes (list item and blockquote) only allow that category, so that the resulting document can’t nest arbitrarily anymore.

Inside updateNodes, you can set a property to null to delete a node type, or provide a new value with a type property to replace one. There’s a similar method updateStyles for messing with the styles in a schema. These are intended to make it possible to easily derive slightly changed schemas from existing schemas.
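The three cases (override fields, delete with null, replace via a type property) could be modelled like this. It is only a sketch on plain objects: `updateSpec` is a hypothetical helper, and strings stand in for the type constructors.

```javascript
// Toy version of the update semantics described above. `updateSpec` is a
// hypothetical helper, not the library's implementation; the `type` values
// are strings standing in for the real constructors.
function updateSpec(nodes, changes) {
  const result = Object.assign({}, nodes)
  for (const name of Object.keys(changes)) {
    const change = changes[name]
    if (change === null) {
      delete result[name]                 // null deletes the node type
    } else if (change.type) {
      result[name] = change               // a `type` property replaces it
    } else {
      // Anything else overrides individual fields of the existing entry.
      result[name] = Object.assign({}, nodes[name], change)
    }
  }
  return result
}

const base = {
  paragraph: {type: "Paragraph", category: "block"},
  blockquote: {type: "BlockQuote", category: "block", contains: "block"},
  horizontal_rule: {type: "HorizontalRule", category: "block"}
}
const updated = updateSpec(base, {
  paragraph: {category: "flat_block block"},
  blockquote: {contains: "flat_block"},
  horizontal_rule: null
})
```

Returning a fresh object instead of mutating `base` mirrors the idea of deriving a changed spec while leaving the original usable.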

The Schema constructor takes a schema spec and produces an actual schema object (the thing you give to an editor, or use to directly create nodes). It has methods like node and text to create nodes or text nodes, and nodeFromJSON to deserialize a node in this schema that’s represented as JSON. Each node or style type instance has a link back to the schema it belongs to, so if you only have a node, you can do .type.schema to get at its schema.
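As a rough picture of how the back-link from a node to its schema might hang together, here is a toy model. `ToySchema` is a hypothetical stand-in, not the real Schema class, and the JSON shape (`type`, `attrs`, `content`, `text`) is an assumption for the example.

```javascript
// Toy schema object. `ToySchema` and the JSON shape used below are
// assumptions for illustration, not the library's actual class or format.
class ToySchema {
  constructor(nodeNames) {
    this.nodes = {}
    // Each type object keeps a link back to the schema it belongs to.
    for (const name of nodeNames) this.nodes[name] = {name, schema: this}
  }
  node(typeName, attrs, content) {
    const type = this.nodes[typeName]
    if (!type) throw new Error("Unknown node type: " + typeName)
    return {type, attrs: attrs || {}, content: content || []}
  }
  text(str) {
    return {type: this.nodes.text, text: str}
  }
  nodeFromJSON(json) {
    if (json.type === "text") return this.text(json.text)
    const content = (json.content || []).map(child => this.nodeFromJSON(child))
    return this.node(json.type, json.attrs, content)
  }
}

const toySchema = new ToySchema(["doc", "paragraph", "text"])
const para = toySchema.nodeFromJSON({
  type: "paragraph",
  content: [{type: "text", text: "hi"}]
})
```

Given only `para`, the schema is reachable as `para.type.schema`, which is the pattern described above.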

I’m now thinking about integrating UI stuff into the schema system, but before I work on that, I want to address a bunch of problems with the current UI (mostly around manipulating nested content and opaque block elements). This involves rethinking a lot of stuff so it might be a while before that moves forward. Do use that time to give me feedback on this schema model.


Sounds good. If I want to add a new custom node type, how and where would I define its behaviour?

Good question. There’s a basic demo in the dino example that might help. The built-in node types have their serializers/parsers defined in the files that define the parsers and serializers themselves, but for your own types, you’re expected to just put them in the place that defines your node type class.

Update: I’ve pushed another series of patches that make it so that

  • Commands (the things you bind keys to or run via ProseMirror.execCommand) are now associated with node/style types if they only make sense in schemas with that node/style in them. Only the commands that are applicable to the editor’s schema are available in an editor.

  • Enough metadata was added to commands to make the same objects double as menu items (when appropriate), so those are now also automatically targeted to the schema.

  • Rather than having a hard-coded default keymap, commands contain default key binding information, and if you don’t specify a keymap, a default one is derived from your current set of commands (i.e. key bindings are now also schema-sensitive)
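The way a keymap can fall out of the command set might look roughly like this. All names are hypothetical (`deriveKeymap`, the `requires` field, the command names and key strings); it only illustrates the filtering idea from the list above.

```javascript
// Toy sketch: each command carries its own default key and names the
// node/style type it needs, if any. Every name here is hypothetical.
const commands = [
  {name: "undo", key: "Mod-Z", requires: null},
  {name: "toggleStrong", key: "Mod-B", requires: "strong"},
  {name: "insertHardBreak", key: "Mod-Enter", requires: "hard_break"}
]

// Keep only commands whose required type exists in the schema, then build
// a key -> command name map from their default bindings.
function deriveKeymap(commandList, schemaNames) {
  const keymap = {}
  for (const cmd of commandList) {
    const applicable = cmd.requires === null || schemaNames.includes(cmd.requires)
    if (applicable && cmd.key) keymap[cmd.key] = cmd.name
  }
  return keymap
}

// A schema without hard_break gets no binding for Mod-Enter.
const smallKeymap = deriveKeymap(commands, ["paragraph", "text", "strong"])
```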

Next challenge: Making complicated commands like backspace and enter behave in the various node types (think lists, code blocks) without hard-coding their behavior for those specific node types.

As an aside, I’m pretty sure I’m going to have to introduce something like node selections, at least for ‘leaf’ nodes (such as images and horizontal rules), to make the interface powerful enough to do things like modify an image using the keyboard, or insert a new paragraph after the last horizontal rule in the document, if there’s no other nodes after that rule (which currently means you can’t put your cursor after it).

Still debating whether to also have node selection for non-leaf nodes (to resolve ambiguity that currently exists when executing block-related commands in nested blocks – which block is meant?) This’ll probably be a power user feature – you can use the editor just fine without knowing about it, but if you are manipulating complicated nested markup, your life will be easier if you use it.

Nice work on the schema API design! I’m very interested in “Making complicated commands like backspace and enter behave in the various node types” and would like to know what the current status of this challenge is.

I’m currently looking at how to make lists behave more intuitively when hitting enter/backspace in a list item and it sounds like this is exactly the kind of use case you want to support. Would be great to get an update on this.

See this thread

Hi Marijn,

Just discovered this topic about the Schema API and the ideas behind it! Amazing work!

Having a schema for the doc structure will definitely help a lot when having custom components or custom meta DOM structures.

My question is - you are using the schema now just for transformations and commands, but how about for changing the custom elements attributes or nested structure?

So in the schema definition I would like to add not only how each tag is nested but also what attributes it accepts, what their types are, and their possible values. That way we can actually build a generic property inspector for the specific tag described in the schema.

Any thoughts about this?

The schema already determines which attributes are valid for each tag, and whether they have a default value or not. It does not constrain the type of such values; that’s something you’ll have to check in your own code.
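Such a check in your own code could be as small as the sketch below. The `attrChecks` table and `invalidAttrs` function are hypothetical names that live entirely outside the schema; the attribute names come from the examples earlier in the thread.

```javascript
// Hypothetical validators kept alongside (not inside) the schema, keyed by
// node type name and then attribute name.
const attrChecks = {
  ordered_list: {order: v => Number.isInteger(v)},
  image: {src: v => typeof v === "string" && v.length > 0}
}

// Returns the names of attributes whose values fail their check.
function invalidAttrs(typeName, attrs) {
  const checks = attrChecks[typeName] || {}
  return Object.keys(checks).filter(name => !checks[name](attrs[name]))
}
```

For example, `invalidAttrs("ordered_list", {order: "3"})` reports `order`, since the value is a string rather than an integer.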

But can we have a generic property inspector, fed by the schema - just like the toolbar but in a separate panel that is automatically filled with properties valid only for the current context?

Also how about inserting sub nodes that are allowed only under a specific context by the schema? How can we insert those - maybe also a generic toolbar with such elements - fully context and schema aware should be available.

And lastly, you want to drag the custom elements around, but the schema defines both their contents and the places (parents) they may be dragged into. So such dragging functionality should also be fully schema aware.

Having all this available in the core - will help us concentrate on schema design only and make the ProseMirror editor be much more strict but still flexible with different doc schemes.

As use cases, think of web sites using for example Bootstrap or other frameworks that require a very specific nested DOM structure and custom attributes. We should be able to define a schema for it so that the user can edit those elements on a much higher level, as components, without messing with their inner structure, as they will be partially locked and behave only as the schema allows.

Content-manipulation commands and dragging are already schema-aware. Automatically creating dialogs and menus from the schema isn’t something the core library does, but you can do that in 3rd party code.

Yes, of course creating UI and dialogs is up to 3rd party code, but are there enough hooks available for getting the selection for a specific element of the given schema, with its available properties, and also an API to mutate it? As well as hooks for dragging & dropping the custom element as a whole?

Also can the display of the custom element/widget be locked in the editor, maybe only specific content parts unlocked for editing?