How to parse HTML raw text with ProseMirror?

I want to open and edit an HTML file with ProseMirror, but for security reasons I don’t want to load the HTML into the DOM. Ideally, ProseMirror would parse the HTML text directly into its JSON representation and thus only render safe content.

Is this possible?

From what I can see, ProseMirror’s DOMParser only parses DOM nodes, not HTML strings.

It looks like prosemirror-markdown uses markdown-it to parse directly into the JSON format… I should mention that I’m just using the prosemirror-example-setup. Perhaps an equivalent for HTML doesn’t exist yet?

Thanks!

Browsers can parse HTML securely. Look into document.implementation.createHTMLDocument for creating a detached DOM document.
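For illustration, here is a minimal sketch of how a detached document could be combined with ProseMirror's parser. The function name htmlToProseMirrorDoc and its parameters are hypothetical, not part of ProseMirror; pmParser is assumed to be a DOMParser instance from prosemirror-model (built with DOMParser.fromSchema), and domImpl defaults to the browser's document.implementation.

```javascript
// Sketch: HTML string -> detached DOM -> ProseMirror document.
// `pmParser` is assumed to be a prosemirror-model DOMParser instance, e.g.
//   import {DOMParser} from "prosemirror-model"
//   import {schema} from "prosemirror-schema-basic"
//   const pmParser = DOMParser.fromSchema(schema)
// `htmlToProseMirrorDoc` and `domImpl` are hypothetical names for this example.
function htmlToProseMirrorDoc(html, pmParser, domImpl = globalThis.document.implementation) {
  // A document created this way is not attached to the page.
  const detached = domImpl.createHTMLDocument("")
  // Let the browser parse the markup inside the detached document.
  detached.body.innerHTML = html
  // Hand the parsed DOM subtree to ProseMirror's parser.
  return pmParser.parse(detached.body)
}
```

In a browser this would be called as htmlToProseMirrorDoc(rawHtml, pmParser), and calling toJSON() on the result should give the JSON representation. Since the parser builds the document from the schema's parse rules, markup the schema does not describe should not end up in the editor document.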


Hmm. I don’t think this prevents a <script> from being evaluated within that DOM though…

It would have taken you about 10 seconds to experimentally find out that, yes, it does prevent that.

Touché.

I didn’t see any mention of security in that documentation though.

const doc = document.implementation.createHTMLDocument()
const div = doc.createElement("div")
div.innerHTML = `<div>
<p>Hello world</p>
<script>
console.log("hello")
</script>
</div>`


The script doesn’t appear to be evaluated, but the script tag is there.

It looks like the browser’s built-in DOMParser (not ProseMirror’s class of the same name) is also an option.

new DOMParser().parseFromString(`<div>
<p>Hello world</p>
<script>
alert("hello")
</script>
</div>`, "text/html")
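To check for yourself that the script element stays in the parsed tree even though it is never run, something like the following sketch could be used. The helper name findScripts is hypothetical; the domParser parameter is assumed to default to the browser's DOMParser.

```javascript
// Sketch: parse an HTML string with the browser's DOMParser and list the
// <script> elements present in the parsed document.
// `findScripts` is a hypothetical helper name for this example.
function findScripts(html, domParser = new globalThis.DOMParser()) {
  // parseFromString with "text/html" returns a separate Document object.
  const parsed = domParser.parseFromString(html, "text/html")
  // Collect any script elements that survived parsing.
  return Array.from(parsed.querySelectorAll("script"))
}
```

Running findScripts on the snippet above in a browser console and inspecting the returned array shows which script elements are still in the tree, which matters if the nodes are later moved into the live page rather than passed through ProseMirror.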

Thanks for the help