Is the DOM/HTML format lossless?

Hey, we have been using the toDOM exporter to export the doc when saving, and fromDOM when importing into a collaborative editor. This worked well in 99% of cases, but there were always some cases that didn’t work quite right. Yesterday I found at least one cause of the errors: the trimming of whitespace on import. Calling fromDOM with the preserveWhitespace option set made that cause go away.

But now I am wondering whether exporting with toDOM and importing with fromDOM is meant to be lossless, or whether it may strip things other than whitespace as well.

Good question. It should be, but the whitespace issue surprised me too when it caused a problem in the way input events are handled, so this isn’t something I had considered deeply. If you run into similar issues, let me know. Right now, my thinking is that there are two modes. One parses HTML from an outside source, in which case we should do things like normalize extra whitespace, or the resulting doc will contain superfluous space. The other parses HTML that we ourselves produced, in which case the whitespace should be preserved.
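
To make the two modes concrete, here is a rough sketch of the idea in TypeScript. It is a toy model, not the library's parser: plain strings stand in for DOM content, and `ImportMode`, `importText`, `exportText`, and `isLossless` are hypothetical names made up for illustration. The exact normalization rule (collapse whitespace runs, trim the ends) is also an assumption.

```typescript
// Toy model only: plain strings stand in for DOM nodes, and none of these
// names are part of the library's API.
type ImportMode = "external" | "own";

// Hypothetical importer. "external" collapses runs of whitespace into one
// space and trims both ends; "own" keeps the text exactly as exported.
function importText(text: string, mode: ImportMode): string {
  if (mode === "own") return text;
  return text.replace(/\s+/g, " ").trim();
}

// Hypothetical exporter: writes the text out unchanged.
function exportText(text: string): string {
  return text;
}

// A round trip is lossless when export followed by import returns the input.
function isLossless(text: string, mode: ImportMode): boolean {
  return importText(exportText(text), mode) === text;
}

const saved = "two  spaces and a trailing space ";
console.log(isLossless(saved, "own"));      // true
console.log(isLossless(saved, "external")); // false
```

In this sketch, only the "own" mode round-trips the saved text unchanged, which matches the behaviour seen when preserveWhitespace is set on import.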
