Pandoc export

massi · May 24, 2024, 8:55am

Hello @johanneswilm, I’m writing a ProseMirror-based editor for Pandoc’s internal format. Here’s the link, but you won’t find anything there yet because I haven’t published it, sorry.

It reads and writes Pandoc’s JSON format, so I had to write the code that converts Pandoc’s Blocks and Inlines to ProseMirror’s Nodes and Marks, and vice versa.

Pandoc vs ProseMirror

The conversion is pretty straightforward for blocks, but it’s rather complex at the inline level, because you have to reconcile the tree-like nature of Pandoc Inlines with the flat model of ProseMirror Marks.
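To make the tree-to-flat direction concrete, here is a minimal sketch in TypeScript. It is not the editor’s actual code: it covers only four Pandoc constructors, and `MarkedText`, `MARK_NAMES` and `flattenInlines` are names invented for this illustration. It walks the Inline tree and emits flat runs of text, each with the set of marks active on it.

```typescript
// Minimal subset of Pandoc's JSON inlines (real Pandoc has many more constructors).
type PandocInline =
  | { t: "Str"; c: string }
  | { t: "Space" }
  | { t: "Emph" | "Strong"; c: PandocInline[] };

// A flat run of text plus the names of the marks active on it,
// similar in spirit to a ProseMirror text node with its mark set.
interface MarkedText {
  text: string;
  marks: string[];
}

// Hypothetical mapping from Pandoc constructors to mark names.
const MARK_NAMES = { Emph: "emph", Strong: "strong" } as const;

function flattenInlines(inlines: PandocInline[]): MarkedText[] {
  const out: MarkedText[] = [];
  const walk = (nodes: PandocInline[], active: string[]): void => {
    for (const node of nodes) {
      if (node.t === "Str" || node.t === "Space") {
        const text = node.t === "Str" ? node.c : " ";
        const last = out[out.length - 1];
        // Merge adjacent runs that carry exactly the same marks.
        if (last && last.marks.join(" ") === active.join(" ")) last.text += text;
        else out.push({ text, marks: [...active] });
      } else {
        const name = MARK_NAMES[node.t];
        // A mark is either set or not set: a nested Emph inside an Emph
        // adds nothing new, so the nesting is lost in this naive version.
        walk(node.c, active.includes(name) ? active : [...active, name]);
      }
    }
  };
  walk(inlines, []);
  return out;
}
```

Note how an Emph nested inside another Emph simply disappears here; that loss is the problem discussed next.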

It is perfectly fine for Pandoc to nest an Emph inside another Emph, but that is hard to model with a Mark in ProseMirror unless you distinguish the two Emphs with some attribute and set the excludes property to an empty string in the MarkSpec.

That’s because in ProseMirror a Mark is either set or not set on a span of text; you can’t set the same Mark twice.
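A sketch of what such a mark could look like, assuming a hypothetical `level` attribute to tell nested Emphs apart. To keep the snippet self-contained, `MarkSpecLike` is a local stand-in for the few MarkSpec fields used; in a real schema the type would be imported from prosemirror-model. `emphMarks` is likewise an invented helper.

```typescript
// Local stand-in for the few MarkSpec fields used here; in a real project
// this type would come from `import { MarkSpec } from "prosemirror-model"`.
interface MarkSpecLike {
  attrs?: { [name: string]: { default?: unknown } };
  excludes?: string;
  toDOM?: (mark: { attrs: { [name: string]: unknown } }) => unknown;
}

// Hypothetical "emph" mark: `level` is an invented attribute that
// distinguishes nested Emphs (1 = outermost).
const emph: MarkSpecLike = {
  attrs: { level: { default: 1 } },
  // An empty string means this mark excludes nothing, not even marks of its
  // own type, so two emph marks with different `level` values can coexist.
  excludes: "",
  toDOM: (mark) => ["em", { "data-level": String(mark.attrs.level) }, 0],
};

// Marks, in ProseMirror's JSON form, for text nested `depth` Emphs deep.
function emphMarks(depth: number): { type: string; attrs: { level: number } }[] {
  return Array.from({ length: depth }, (_, i) => ({
    type: "emph",
    attrs: { level: i + 1 },
  }));
}
```

With this, text inside two nested Emphs would carry `emphMarks(2)`, that is, one emph mark with level 1 and one with level 2.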

Even for a given document model – I’m focusing on the Pandoc AST now – you can imagine a bunch of slightly different ProseMirror schemas.

For example, how do you model a Pandoc RawInline? Since it’s an Inline, I first thought of a Mark in ProseMirror. I eventually opted for an atomic inline Node instead, with a textual sub-editor in the GUI.
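As an illustration of the atomic-node option, here is a minimal sketch. The node name `rawInline` and the choice of storing format and text in attributes are assumptions made for this example, not necessarily what the editor does. The `inline`, `group`, `atom` and `attrs` fields are regular NodeSpec fields, and the Pandoc side follows the JSON shape of RawInline, whose content is a `[format, text]` pair.

```typescript
// Pandoc JSON shape of a RawInline: c is [format, text].
type RawInline = { t: "RawInline"; c: [string, string] };

// ProseMirror JSON shape of the hypothetical "rawInline" node.
interface RawInlineNodeJSON {
  type: "rawInline";
  attrs: { format: string; text: string };
}

// Shaped like a ProseMirror NodeSpec: an inline leaf node that is treated
// as a single unit (atom) and keeps its payload in attributes.
const rawInlineSpec = {
  inline: true,
  group: "inline",
  atom: true,
  attrs: { format: { default: "html" }, text: { default: "" } },
};

// Pandoc -> ProseMirror
function rawInlineToNode(raw: RawInline): RawInlineNodeJSON {
  const [format, text] = raw.c;
  return { type: "rawInline", attrs: { format, text } };
}

// ProseMirror -> Pandoc
function nodeToRawInline(node: RawInlineNodeJSON): RawInline {
  return { t: "RawInline", c: [node.attrs.format, node.attrs.text] };
}
```

Because the raw text lives in attributes rather than in editable content, the GUI has to offer its own sub-editor to change it.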

Back to your question

I think it’s nevertheless possible to abstract some functions that help with the conversion between ProseMirror and Pandoc, or any other format that is tree-like at the inline level. The trickiest part of that job is the translation between the flat and the tree-like model.
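One possible shape for such a helper, going from flat marked runs back to a tree, is sketched below. It is a simplified greedy strategy, not a definitive algorithm: it relies on the order of marks in each run to decide which mark becomes the outer wrapper, and it emits each run as a single Str, whereas real Pandoc output would split words and Spaces. `Run` and `nest` are names invented for this example.

```typescript
type MarkName = "Emph" | "Strong";

// Simplified Pandoc-like inline tree.
type Inline =
  | { t: "Str"; c: string }
  | { t: MarkName; c: Inline[] };

// A flat run of text with its marks, outermost first.
interface Run {
  text: string;
  marks: MarkName[];
}

function nest(runs: Run[]): Inline[] {
  const out: Inline[] = [];
  let i = 0;
  while (i < runs.length) {
    const run = runs[i];
    if (run.marks.length === 0) {
      out.push({ t: "Str", c: run.text });
      i++;
      continue;
    }
    const outer = run.marks[0];
    // Extend the group over the following runs that also carry `outer`,
    // stripping that mark before recursing on the group's content.
    const inner: Run[] = [];
    let j = i;
    while (j < runs.length && runs[j].marks.includes(outer)) {
      inner.push({
        text: runs[j].text,
        marks: runs[j].marks.filter((m) => m !== outer),
      });
      j++;
    }
    out.push({ t: outer, c: nest(inner) });
    i = j;
  }
  return out;
}
```

For example, the runs "a" (Emph), "b" (Emph, Strong), "c" (no marks) become an Emph containing Str "a" and a Strong around Str "b", followed by Str "c".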

Here I’m describing the path I followed, because I think it relates to your question:

- I started by thinking of different ProseMirror-based editors for different models;
- for each one I wanted to provide an export function to Pandoc JSON, and through it an export to any format supported by Pandoc;
- that meant maintaining a bunch of editors sharing parts of their code and the ability to export to Pandoc JSON;
- eventually I opted for a single editor based on Pandoc JSON that can be configured to adapt to different models and workflows, that way becoming “multiple editors”;
- choosing Pandoc’s internal model is clearly a strong requirement, but you can use all of its input and output formats, and you can even support further ones through custom readers and writers;
- the challenge I face is making a single editor become “multiple editors” without changing the editor’s code, only through configuration files or custom readers, writers and filters.