Non-breaking spaces being added to pasted HTML

I’ve noticed recently that whenever I copy/paste HTML into my editor, text nodes with marks render with non-breaking spaces on either side of the element. So if I’m pasting in a paragraph with italic text, it renders as follows:

<p>Lorem ipsum dolor sit amet,&nbsp;<em>consectetur adipiscing</em>&nbsp;elit.</p>

Initially I assumed it might have been something in my code causing this but then I looked at one of the examples on the ProseMirror site and it seems to occur there as well.

It’s not a major issue for me, and I only ever discovered it when I was viewing my editor in a mobile window and noticed some funny line breaks. In any case, I was wondering if there was a way to prevent this from happening. I’m also willing to accept that there may be some technical reason they should be added that I haven’t yet considered.

This is not intentional behavior, and it is probably caused by the browser’s handling of clipboard data. Which browser is this happening in? (I’m not seeing it in Firefox.)

So I first noticed it in Chrome. I tried Firefox and Safari, and at first I didn’t see it there either when I inspected the element. However, I then viewed some HTML documents that I knew contained non-breaking spaces, and I didn’t see them in Firefox’s inspector either. But the non-breaking behavior was still present in those browsers (words were still sticking together in the places where the non-breaking spaces were). I’m probably missing something here, but I suspect that Firefox and Safari may not be showing the actual &nbsp; entities in their inspectors. When I copy the same HTML out of those browsers and into a text editor I see the &nbsp; entities again.

So it looks like it is indeed caused by the browser’s handling of clipboard data. I was initially confused because I tried to use the transformPastedHTML method to see if I could clean up the HTML before it got parsed, but I couldn’t spot the non-breaking spaces in the raw HTML, so I assumed they were being added afterward somehow. It turns out I was just failing to match on them correctly. I was able to find and replace them with the following:

    transformPastedHTML: function(html) {
      // Replace every non-breaking space character (U+00A0) with a regular space
      return html.replace(/\u00A0/g, " ");
    }

Anyways, sorry for the false alarm!

You can reliably locate non-breaking spaces with view.state.doc.textContent.replace(/\u00a0/g, "!!").
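
The same regex can be wrapped in a small helper so that any string (pasted HTML, text content, and so on) can be checked. This is only a sketch: `markNbsp` and `countNbsp` are hypothetical names made up for illustration, not part of ProseMirror.

```javascript
// Hypothetical debugging helpers (not a ProseMirror API).

// Make every non-breaking space visible by swapping it for a marker string.
function markNbsp(text, marker = "!!") {
  // \u00a0 matches the non-breaking space character itself,
  // not the literal six-character entity text "&nbsp;".
  return text.replace(/\u00a0/g, marker);
}

// Count how many non-breaking space characters a string contains.
function countNbsp(text) {
  return (text.match(/\u00a0/g) || []).length;
}
```

Assuming `view` is your editor view, `markNbsp(view.state.doc.textContent)` gives the same result as the one-liner above.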

The issue appears to be that Chrome and Safari mangle content they put on the clipboard by replacing some spaces in the actual copied content with non-breaking spaces. By the time ProseMirror sees the HTML, it can’t reliably see anymore whether a given non-breaking space is intentional or inserted by the browser.

It seems CKEditor has a hack that replaces every span containing only a single non-breaking space, optionally with an Apple-converted-space class, with a regular space (Safari seems to label these spaces, Chrome unfortunately doesn’t). This patch copies that approach.
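
For anyone who wants to try the idea on the HTML string itself (for example inside transformPastedHTML), here is a rough sketch. It is not the code from the patch or from CKEditor: the function name `replaceConvertedSpaces` and the regex are assumptions made for illustration, and a string-based match like this is cruder than working on the parsed DOM.

```javascript
// Hypothetical helper, NOT the actual ProseMirror or CKEditor implementation.
// Matches a <span> (with no attributes, or with only class="Apple-converted-space")
// whose entire content is one non-breaking space, written either as the
// U+00A0 character or as the &nbsp; entity.
const CONVERTED_SPACE =
  /<span(?:\s+class="Apple-converted-space")?\s*>(?:\u00a0|&nbsp;)<\/span>/g;

function replaceConvertedSpaces(html) {
  // Swap each wrapper span for a regular space; non-breaking spaces
  // outside such spans are left untouched.
  return html.replace(CONVERTED_SPACE, " ");
}
```

Unlike replacing every \u00A0, this leaves bare non-breaking spaces alone, so ones the author typed on purpose have a better chance of surviving.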

Yup, I realize now that my not being able to properly spot the non-breaking space was the issue.

Also, thanks for creating this incredible library!