Let’s say I am pasting HTML that looks like this:
<p>Before Image<img src="someUrl" />After Image</p>
In the schema, both paragraphs and images belong to the block group, and blocks can only contain inline content. After parsing, I expect a structure that looks like this:
<doc> <p>Before Image</p> <img src="someUrl" /> <p>After Image</p> </doc>
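For reference, a schema along those lines could look roughly like the sketch below. This is only my approximation of the setup described above: the node names, the `parseDOM` rules, and the `src` attribute handling are assumptions, and the object is meant to be passed to `new Schema(...)` from prosemirror-model.

```typescript
// Sketch of a schema spec where paragraphs and images are both blocks.
// Node names and parse rules are assumptions made for illustration.
const schemaSpec = {
  nodes: {
    // The document may only contain block nodes.
    doc: { content: "block+" },
    // Paragraphs are blocks that hold inline content only.
    paragraph: {
      group: "block",
      content: "inline*",
      parseDOM: [{ tag: "p" }],
    },
    // Images are block-level leaves, so they are not valid inside a paragraph.
    image: {
      group: "block",
      attrs: { src: {} },
      parseDOM: [
        {
          tag: "img[src]",
          // Copy the src attribute from the matched element.
          getAttrs: (dom: { getAttribute(name: string): string | null }) => ({
            src: dom.getAttribute("src"),
          }),
        },
      ],
    },
    text: { group: "inline" },
  },
};
```

With a schema shaped like this, the img cannot stay inside the paragraph, which is why I would expect it to be lifted out to the doc level as shown above.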
However, the img element is dropped during parsing. I dug into the internals of prosemirror-model, and it looks like when “findPlace” is called, the img node can’t be wrapped in anything that would make it valid inside a paragraph, so it is dropped. If you force “solid” to be false, it seems to work, since the node is then placed in the topLevel node (the doc in this case). Ideally, parsing HTML should never cause unexpected data loss.
The only ways we can work around this right now when parsing are to:
- a) Normalize the HTML/DOM node before parsing
- b) Prevent the parsing of the paragraph and let the text nodes wrap themselves in their own paragraph when “findWrapping” is called.
- c) Overwrite PM internals
“a” is not ideal because there could be many corner cases that we could miss. “b” is not ideal because we could lose attributes that were part of the paragraph tag. “c” is not ideal because it’s easy to break stuff, especially when we upgrade.
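To make option “a” concrete, here is a minimal sketch of the kind of normalization I mean. It works on a hypothetical plain tree type rather than real DOM nodes so that it is self-contained. The names `TreeNode`, `BLOCK_TAGS`, `splitParagraph`, and `normalize` are made up for illustration, and it only handles block children that sit directly inside a paragraph, which is exactly the kind of corner-case gap that worries me.

```typescript
// Hypothetical minimal tree standing in for DOM nodes (not a ProseMirror type).
type TextNode = { kind: "text"; text: string };
type ElementNode = {
  kind: "element";
  tag: string;
  attrs: Record<string, string>;
  children: TreeNode[];
};
type TreeNode = TextNode | ElementNode;

// Tags that the schema treats as block-level (an assumption for this sketch).
const BLOCK_TAGS = ["img"];

// Split one paragraph into [p, block, p, ...] around its direct block children.
function splitParagraph(p: ElementNode): TreeNode[] {
  const result: TreeNode[] = [];
  let run: TreeNode[] = [];
  // Emit the inline content collected so far as a paragraph with the same attrs.
  const flush = (): void => {
    if (run.length > 0) {
      result.push({ kind: "element", tag: p.tag, attrs: { ...p.attrs }, children: run });
      run = [];
    }
  };
  for (const child of p.children) {
    if (child.kind === "element" && BLOCK_TAGS.indexOf(child.tag) !== -1) {
      flush();
      result.push(child); // hoist the block child to the paragraph's level
    } else {
      run.push(child);
    }
  }
  flush();
  return result;
}

// Normalize a list of top-level nodes, leaving anything but paragraphs untouched.
function normalize(nodes: TreeNode[]): TreeNode[] {
  const out: TreeNode[] = [];
  for (const node of nodes) {
    if (node.kind === "element" && node.tag === "p") {
      out.push(...splitParagraph(node));
    } else {
      out.push(node);
    }
  }
  return out;
}
```

This keeps the paragraph’s attributes on every piece it is split into, which avoids the attribute loss of option “b”, but it still has to anticipate every nesting the pasted HTML might contain.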
If we could somehow overwrite the “solid” attribute in the NodeContext when parsing the slice, that could solve our problems, but I’m not sure if that would cause any other issues.
Does anyone have any thoughts?