Can I convert arbitrary HTML into a ProseMirror JSON/doc?

I am curious whether we can take any HTML, pass it through our ProseMirror schema, and convert it into JSON. (Some of our RTE users want to consume the data as HTML and others want to consume it as JSON.)

I tried

const dummyHtml = '<p><strong>hello</strong>! <em>How</em> are you</p>';
const testDoc = DOMParser.fromSchema(schema).parse(dummyHtml);

but testDoc only registers the p tag; its content object is empty.

I also tried first converting the HTML into a document (using JSDOM):

const dummyHtml = '<p><strong>hello</strong>! <em>How</em> are you</p>';
const { document } = new JSDOM(dummyHtml).window;
const testDoc = DOMParser.fromSchema(schema).parse(document);

But that didn’t seem to help either.

DOMParser.parse expects the DOM element that contains the content you want to parse, so neither a string nor a window. I’m not that familiar with JSDOM, but maybe .document.body works there.

We are using JSDOM for some unit tests here. Does first creating a DOM element and then setting innerHTML work for you? (see link)

Have a look at the generateJSON utility we wrote for tiptap:

It’s using hostic-dom, which is based on some Vue 3 internals.

It takes a string and outputs JSON; the opposite direction (generateHTML) can be found in the same folder. Both work in the browser and in Node.js.

Hope that helps!