Building a Canvas-Based Editor on Top of ProseMirror’s State and Plugin System

Hi everyone,

I’d like to share a project we’ve been building on top of ProseMirror.

The core of the project still relies on ProseMirror’s document model, transactions, state management, and plugin architecture. We did not build a new editor core from scratch. Instead, we kept ProseMirror as the foundation and replaced the standard DOM-based view layer with a custom Canvas-based rendering system.

The main motivation was to explore a page-oriented editing experience with tighter control over layout and rendering. In our case, that includes custom pagination, canvas rendering, selection geometry, hit testing, and view/runtime behaviors that are harder to express in a traditional DOM editor setup. So the architecture is roughly:

  • ProseMirror for schema, state, transactions, commands, and plugins
  • A custom Canvas view layer for rendering and interaction
  • A custom layout/pagination pipeline on top of the ProseMirror document
  • ProseMirror-compatible extension points retained as much as possible at the state/plugin level
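The core of this split can be sketched as a small dispatch loop: the state side keeps ProseMirror's `state.apply(tr)` shape, while the view side is just a repaint callback. The interfaces and the `CanvasEditorView` name below are illustrative stand-ins, not the project's actual API.

```typescript
// Minimal sketch of the state/view split, assuming the usual ProseMirror
// contract (state.apply(tr) returns a new immutable state). The render
// callback stands in for the custom canvas view layer.
interface TransactionLike { docChanged: boolean }
interface EditorStateLike<T extends TransactionLike> {
  apply(tr: T): EditorStateLike<T>
}

class CanvasEditorView<T extends TransactionLike> {
  constructor(
    private state: EditorStateLike<T>,
    private render: (state: EditorStateLike<T>) => void,
  ) {
    this.render(this.state) // initial paint
  }

  // Plays the role EditorView.dispatch plays in a DOM setup: apply the
  // transaction, then repaint. A real implementation would coalesce
  // repaints (e.g. via requestAnimationFrame) instead of drawing eagerly.
  dispatch(tr: T): void {
    this.state = this.state.apply(tr)
    if (tr.docChanged) this.render(this.state)
  }

  get currentState(): EditorStateLike<T> { return this.state }
}
```

The point of keeping `apply` untouched is that everything upstream of `dispatch` — commands, plugins, transaction filtering — stays ordinary ProseMirror code.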

This means the project is not “ProseMirror with minor patches”, but also not a completely separate editor engine. It is closer to a ProseMirror-based editor runtime with a non-DOM view implementation.

A few areas have been especially interesting and challenging:

  • Mapping document positions to canvas coordinates and back
  • Rebuilding selection, cursor, and hit-testing behavior without relying on the DOM
  • Preserving plugin-driven extensibility while the view layer is no longer DOM-first
  • Handling clipboard, paste rules, drag/drop, and input behavior in a canvas-based environment
  • Supporting paginated document editing with predictable rendering and layout behavior
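For the first two points, the shape of the problem is replacing the DOM's `coordsAtPos`/`posAtCoords` with lookups against our own layout output. A much-simplified sketch, assuming the layout pass produces flat line boxes with fixed-width glyphs (a real implementation stores per-glyph advances and handles bidi, wrapping, and inline nodes):

```typescript
// Each line box covers a contiguous range of document positions and a
// rectangle on the canvas. All names here are illustrative.
interface LineBox {
  from: number; to: number           // document positions covered
  x: number; y: number               // top-left corner on the canvas
  height: number; charWidth: number  // monospace simplification
}

function coordsAtPos(lines: LineBox[], pos: number): { x: number; y: number } | null {
  for (const line of lines) {
    if (pos >= line.from && pos <= line.to)
      return { x: line.x + (pos - line.from) * line.charWidth, y: line.y }
  }
  return null
}

function posAtCoords(lines: LineBox[], x: number, y: number): number | null {
  for (const line of lines) {
    if (y >= line.y && y < line.y + line.height) {
      const col = Math.round((x - line.x) / line.charWidth)
      // Clamp so clicks past the end of a line land on its last position.
      return Math.max(line.from, Math.min(line.to, line.from + col))
    }
  }
  return null // outside any line box
}
```

Selection rendering then falls out of the same index: a selection is the union of per-line rectangles between `coordsAtPos(from)` and `coordsAtPos(to)`.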

I’d be very interested in hearing from others who have explored similar directions, or who have opinions on where this approach fits relative to ProseMirror’s design.

Hi. Interesting project! Seems to work quite well already.

I see you’re putting focus in a hidden textarea to capture text input. I think it should be possible, during composition, to read out the composed text and display it in your canvas view, so the user can see what they are doing. CodeMirror 5 used a similar hidden-element approach and displayed composition that way. For Chrome, you could also look into EditContext, which might make this easier (but it is unfortunately not supported beyond Blink yet, and even their implementation isn’t super mature).
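The idea of reading out the in-progress composition and drawing it in the canvas can be sketched as a transient overlay: the composed text is not committed to the document yet, but the painter splices it in at the cursor and marks it (e.g. underlined, as native IMEs do). The event wiring (`compositionstart`/`compositionupdate`/`compositionend` on the hidden textarea) is assumed; this hypothetical pure helper only computes what to draw.

```typescript
// What the canvas should display while an IME composition is in progress.
interface CompositionPreview {
  text: string          // document text with the composition spliced in
  underlineFrom: number // range to mark as uncommitted composition text
  underlineTo: number
}

function composeForDisplay(
  docText: string,
  cursor: number,
  composing: string, // event.data from the latest compositionupdate
): CompositionPreview {
  return {
    text: docText.slice(0, cursor) + composing + docText.slice(cursor),
    underlineFrom: cursor,
    underlineTo: cursor + composing.length,
  }
}
```

On `compositionend` the overlay is dropped and the committed text goes through the normal transaction path instead.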

You seem to have implemented an entirely custom copy/paste system. Since the clipboard contains HTML, and the effects of copying and pasting happen on the state level, wouldn’t it have worked to use the DOM-based parsing and serialization in prosemirror-model?

Are the copies of the ProseMirror modules under packages/lp entirely unchanged? Any reason you’re not directly loading the original modules? When doing that, I could imagine some kinds of ProseMirror plugins—those that only touch the state/document parts, not the view—just working with a system like this.

One thing that is going to be very challenging in this approach is screen reader accessibility. If the browser doesn’t really know what kind of editor the user is looking at (text content, cursor position, etc.), it cannot tell the accessibility software about it. Another tricky thing will be touch screen interaction, but that can probably be mostly done with JS, though some things like the native context menu will be hard to pull off.

Thank you for the thoughtful feedback. This is very helpful, and it maps closely to the tradeoffs we have been dealing with.

You are right about the hidden textarea. Our current input pipeline uses a focused hidden textarea as the bridge for beforeinput, composition, paste, copy, and cut, and then projects the result into the canvas view. At the moment, composition handling is functional, but the visual feedback during composition is still not where we want it to be. Showing the composed text directly in the canvas layer is exactly the direction we want to push next. We will also take a closer look at EditContext, although for now we are treating it as an optional enhancement rather than a foundation because of the current browser support situation.

On copy/paste, the event handling is custom because the view layer is no longer DOM-driven, so we cannot rely on the normal ProseMirror EditorView event path. But at the model boundary we are still trying to reuse schema-based HTML parsing and serialization where possible, rather than inventing a completely separate content model. So the custom part is mainly the event bridge and view integration, not a deliberate attempt to replace ProseMirror’s document parsing/serialization model entirely.
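That boundary can be sketched as an event bridge with schema-based serialization injected behind it. In this hedged sketch, `serializeHTML`/`parseHTML` stand in for prosemirror-model’s `DOMSerializer`/`DOMParser` (so clipboard content stays interoperable with DOM-based ProseMirror editors), and `Slice` is an opaque placeholder for the copied content; only the bridge logic is shown.

```typescript
// Hypothetical codec interface; in practice these calls would be backed
// by prosemirror-model's schema-based DOM serialization and parsing.
interface ClipboardCodec<Slice> {
  serializeHTML(slice: Slice): string
  serializeText(slice: Slice): string
  parseHTML(html: string): Slice
  parseText(text: string): Slice
}

function writeClipboard<Slice>(codec: ClipboardCodec<Slice>, slice: Slice,
    setData: (type: string, value: string) => void): void {
  // Write both flavors so rich and plain-text paste targets both work.
  setData("text/html", codec.serializeHTML(slice))
  setData("text/plain", codec.serializeText(slice))
}

function readClipboard<Slice>(codec: ClipboardCodec<Slice>,
    getData: (type: string) => string): Slice {
  // Prefer HTML so schema-aware structure survives the round trip.
  const html = getData("text/html")
  return html ? codec.parseHTML(html) : codec.parseText(getData("text/plain"))
}
```

The `setData`/`getData` callbacks are where the custom part lives (the hidden-textarea copy/cut/paste events), while everything inside the codec remains ordinary schema-driven ProseMirror behavior.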

Regarding the copies under packages/lp, that is a fair question. They are there as a compatibility layer / internal namespace while we built the canvas runtime around the ProseMirror model, state, transform, command, and plugin concepts. The long-term direction should be to minimize divergence as much as possible. I agree with your point that the closer we stay to the original modules, the better our chances are of preserving compatibility with plugins that only depend on state/document behavior and not on the DOM view.
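The class of plugins that should keep working — those that only touch state — depends on a fairly small contract, which can be modeled in isolation. This is a simplified model of ProseMirror’s plugin-state shape (an init value plus an `apply(tr, value)` reducer), with an illustrative transaction type, not the real API:

```typescript
// Stand-in for a real Transaction; a state-only plugin never needs more
// than the transaction and its previous value.
interface Tr { insertedChars: number }
interface StatePlugin<V> {
  init(): V
  apply(tr: Tr, value: V): V
}

// Example: a character-count plugin with no reference to any view.
const charCount: StatePlugin<number> = {
  init: () => 0,
  apply: (tr, count) => count + tr.insertedChars,
}

// As long as the runtime folds transactions through plugin state like
// this, such plugins are agnostic to whether the view is DOM or canvas.
function runTransactions<V>(plugin: StatePlugin<V>, trs: Tr[]): V {
  return trs.reduce((value, tr) => plugin.apply(tr, value), plugin.init())
}
```

Plugins that provide `props` like decorations or `nodeViews` are the ones that need an explicit translation layer, since those concepts are DOM-first.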

I also completely agree on accessibility. That is probably the hardest unresolved problem in this approach. Replacing the DOM view means we lose a lot of browser-native accessibility semantics around document structure, cursor position, and selection, and rebuilding that story is much harder than rebuilding rendering or editing mechanics. Touch interaction feels more tractable in comparison, although, as you said, native behaviors like context menus and some platform conventions are still difficult to reproduce cleanly.

Our current goal is not to replace ProseMirror’s core model, but to explore how far we can go by keeping its state and plugin system intact while swapping out the DOM view for a paginated canvas runtime. Your comments are especially useful because they point directly at the areas that will determine whether this approach can remain practical over the long term.

A major motivation for this project is large-document editing, especially page-oriented documents where rendering control matters as much as the document model itself. Our main goal is not just to replace the DOM view, but to gain much tighter control over rendering, pagination, layout, hit testing, and update behavior for long documents.

In particular, we care a lot about predictable pagination and avoiding broken page composition, including issues like awkward page breaks, unstable reflow, and orphan/widow-style layout problems. A canvas-based view gives us a much more controlled rendering pipeline for that kind of document workflow than a normal DOM editor.
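The baseline of such a pipeline can be sketched as a greedy pagination pass over measured blocks. This is a deliberately simplified sketch with invented names: real page composition would also split paragraphs across pages and apply widow/orphan rules, which is exactly where the quality problems mentioned above live.

```typescript
interface Block { height: number } // a measured top-level block, in px

// Greedy pagination: fill each page until the next block no longer fits.
// Returns pages as arrays of block indices. Oversized blocks get a page
// of their own; a real layout engine would split them instead.
function paginate(blocks: Block[], pageHeight: number): number[][] {
  const pages: number[][] = []
  let page: number[] = []
  let used = 0
  blocks.forEach((block, i) => {
    if (page.length > 0 && used + block.height > pageHeight) {
      pages.push(page)
      page = []
      used = 0
    }
    page.push(i)
    used += block.height
  })
  if (page.length > 0) pages.push(page)
  return pages
}
```

Because the pass is a pure function of measured heights, re-running it after an edit gives deterministic page breaks, which is the "predictable pagination" property we are after.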

We are also trying to keep interaction latency as stable as possible as document size grows, rather than having rendering cost scale too directly with the visible DOM complexity. For this type of editor, controllable rendering and controllable pagination are the main reasons we are exploring this direction.

At the same time, we still want the programming model to remain familiar to ProseMirror and Tiptap developers. The goal is to keep the schema, transaction, command, and plugin mental model as close as possible to what those developers already know, while using a different view/runtime layer underneath.

So from our perspective, this is mainly an attempt to combine three things:

  • controllable rendering for very large documents
  • controllable pagination with better handling of page layout quality
  • a familiar extension model with a lower learning cost for ProseMirror and Tiptap developers

That is the main reason we are exploring a canvas-based runtime on top of a ProseMirror-style state and plugin architecture.