The Unreasonable Effectiveness of ProseMirror Model in Rich Text Transformation

SMores · May 19, 2026, 11:58am

Hopefully this is interesting to some folks! I found myself reaching for ProseMirror while working on a rare non-text-editor project of mine, which nonetheless needed to transform rich text. Wrote a bit about how I ported over a minimal ProseMirror model and transform implementation for that use case, and how much it simplified the transformation!

marijn · May 19, 2026, 12:55pm

Hah, okay, that works I guess. Thanks for sharing.

SMores · May 19, 2026, 1:14pm

It almost sounds like you didn’t have marking up EPUBs in mind when you designed ProseMirror? Crazy, could have fooled me!

mustafa0x · May 19, 2026, 1:20pm

“That means that we can only synchronize audio clips at the level of HTML elements (and only if those elements have unique IDs)! This is a pretty significant limitation, since nearly all EPUBs only have textblock-level markup, and rarely with IDs on every element. What do we do, if we want to provide a sentence-level synchronization? What about word-level?”

This is indeed a difficult problem. I had a similar challenge: I had word-level timings and wanted to highlight each word during playback. But the text was post-processed, and the post-processing rules were arbitrary, so I had to “project” the word-timing indexes onto the post-processed text. It was messy.
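This kind of “projection” is close to what ProseMirror's position mapping does: record each post-processing edit as a replaced range, then map the old offsets through the sequence of edits. Here is a minimal, dependency-free TypeScript sketch of that idea, loosely modeled on ProseMirror's `StepMap`/`Mapping` (including its rule for which side a position sticks to). The `Edit`, `applyEdit`, `mapPos`, and `projectTimings` names and the sample text are hypothetical, and the sketch assumes every post-processing rule can report the range it replaced.

```typescript
// One recorded post-processing edit: [from, to) in the text as it was *at the
// time of the edit* was replaced by `insertedLength` characters.
interface Edit {
  from: number;
  to: number;
  insertedLength: number;
}

// Apply a replacement and record it, so offsets can be mapped through it later.
function applyEdit(text: string, edits: Edit[], from: number, to: number, insert: string): string {
  edits.push({ from, to, insertedLength: insert.length });
  return text.slice(0, from) + insert + text.slice(to);
}

// Map one offset through one edit. As in ProseMirror's StepMap, a position at the
// start/end of a replaced range stays at the start/end of the replacement; `assoc`
// only picks the side for positions inside the range or at a pure insertion
// (-1 = stick left, 1 = stick right).
function mapThroughEdit(pos: number, edit: Edit, assoc: -1 | 1): number {
  const oldSize = edit.to - edit.from;
  if (pos < edit.from) return pos; // before the edit: unchanged
  if (pos > edit.to) return pos + edit.insertedLength - oldSize; // after: shifted
  const side = oldSize === 0 ? assoc : pos === edit.from ? -1 : pos === edit.to ? 1 : assoc;
  return side < 0 ? edit.from : edit.from + edit.insertedLength;
}

// Map an offset from the original text through every recorded edit, in order.
function mapPos(pos: number, edits: readonly Edit[], assoc: -1 | 1): number {
  return edits.reduce((p, e) => mapThroughEdit(p, e, assoc), pos);
}

interface WordTiming {
  start: number; // character offset in the original text
  end: number;
  timeMs: number; // hypothetical: when the word begins in the audio
}

// Starts stick right and ends stick left, so text inserted exactly at a
// word's edge is not absorbed into that word's highlight.
function projectTimings(timings: readonly WordTiming[], edits: readonly Edit[]): WordTiming[] {
  return timings.map((t) => ({
    start: mapPos(t.start, edits, 1),
    end: mapPos(t.end, edits, -1),
    timeMs: t.timeMs,
  }));
}

// Usage with two made-up post-processing rules.
const original = "dont stop now";
const timings: WordTiming[] = [
  { start: 0, end: 4, timeMs: 0 },
  { start: 5, end: 9, timeMs: 400 },
  { start: 10, end: 13, timeMs: 800 },
];
const edits: Edit[] = [];
let processed = applyEdit(original, edits, 0, 4, "don't"); // fix the contraction
processed = applyEdit(processed, edits, 0, 0, "please "); // prepend a word
for (const t of projectTimings(timings, edits)) {
  console.log(t.timeMs, processed.slice(t.start, t.end)); // don't / stop / now
}
```

Recording edits as ranges, rather than diffing the before and after strings, is what keeps this tractable; it only works if each rule can say what it replaced.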

SMores · May 19, 2026, 3:32pm

Yeeeeup! I don’t know if you saw, but in the post I linked to this proposal to encourage better support for URI text fragments in media overlays. A text fragment would let you target a span of text in an HTML document without having to rely on pre-existing markup. But text fragments are similarly challenging to generate on the production side, and they’re much, much harder to implement on the reading-system side! We’re working on adding support for them to Thorium/Readium, but it’s tough.
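For readers unfamiliar with the syntax, here is a minimal TypeScript sketch that builds such a reference, assuming the proposal means the WICG Text Fragments syntax that browsers already support (`#:~:text=[prefix-,]textStart[,textEnd][,-suffix]`). The `textFragmentHref` helper, the file names, and the clip times are hypothetical, and whether a media overlay `<text src>` may carry a text directive at all is exactly what the proposal asks for; today, media overlays point at element IDs.

```typescript
// Percent-encode one term of a text directive. encodeURIComponent already
// escapes "," and "&"; "-" must be escaped too, because the syntax uses it to
// mark prefix and suffix terms.
function encodeTerm(term: string): string {
  return encodeURIComponent(term).replace(/-/g, "%2D");
}

interface TextDirective {
  textStart: string;
  textEnd?: string; // when present, the match runs from textStart through textEnd
  prefix?: string; // context expected just before the match
  suffix?: string; // context expected just after the match
}

// Build "doc#:~:text=[prefix-,]textStart[,textEnd][,-suffix]".
function textFragmentHref(doc: string, d: TextDirective): string {
  let directive = "";
  if (d.prefix) directive += `${encodeTerm(d.prefix)}-,`;
  directive += encodeTerm(d.textStart);
  if (d.textEnd) directive += `,${encodeTerm(d.textEnd)}`;
  if (d.suffix) directive += `,-${encodeTerm(d.suffix)}`;
  return `${doc}#:~:text=${directive}`;
}

// A hypothetical sentence-level media overlay entry that needs no element ID.
const href = textFragmentHref("chapter1.xhtml", {
  textStart: "Call me Ishmael",
  suffix: "Some years ago",
});
const par =
  `<par><text src="${href}"/>` +
  `<audio src="chapter1.mp3" clipBegin="0:00:01.200" clipEnd="0:00:03.450"/></par>`;
console.log(href); // chapter1.xhtml#:~:text=Call%20me%20Ishmael,-Some%20years%20ago
console.log(par);
```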