An extra <br> is added in certain pasted content

Hello :slight_smile:

I have noticed that, when pasting text from certain sources that contain a hard break, an extra <br> is added.

For example, when pasting text from Google Docs with one hard break between each paragraph, the hard break is parsed into a <p> with two <br> elements inside, one with the “ProseMirror-trailingBreak” class and one without.

[Screenshots not preserved: the text in Google Docs, the same text after pasting into ProseMirror, and the resulting DOM in the inspector]

Another thing I’ve noticed: I’m pretty sure it happens when the <br> in the pasted HTML is not wrapped in a paragraph. ProseMirror then wraps it in a <p> and adds another <br>. If the <br> in the pasted HTML is already wrapped in a <p>, it doesn’t happen.

I’d love to know if there is a way to prevent the extra <br>. Thank you!


Interesting.

I can reproduce that on the prosemirror.net example implementation (both Firefox and Chrome).

I can’t reproduce that on my own implementation.

My best guess is this: My schema does not have this rule (from schema basic):


  hard_break: {
    inline: true,
    group: "inline",
    selectable: false,
    parseDOM: [{tag: "br"}],
    toDOM() { return brDOM }
  } as NodeSpec

It’s parsed as having only one hard break. The second is only added in the view to make sure it renders properly and allows a cursor to appear after the break, it’s not part of the document.


Thank you, I’ll try to make this change as well!

But it does have an effect: the 2 hard breaks create 2 empty lines instead of one, and I can put my cursor on each one separately, which is not the behavior I want.

(view.state.doc does not actually include the extra hard break, but it is rendered in the DOM)

This is what I mean by the 2 lines: [screenshot not preserved]

For now, I call Element.remove() in handlePaste() to remove every hard break without the ProseMirror class directly from view.dom. It gives me the desired behavior, but I feel it is not best practice (the documentation says it’s probably not a good idea to make changes directly on view.dom).


I think the issue comes up when the source HTML in the clipboard has <br> elements that are siblings of <p> elements. For the Google Docs content in question (screenshot not preserved), the clipboard contains <p>Line1</p><p>Line2</p><br><p>AfterBlank</p>, and the <br> gets parsed as <p><br/></p>. So it looks like 2 lines instead of the intended single line (which would perhaps be just <p></p>).


When pasting, if you come across a node with a single hard break in it, e.g. <p><br></p>, remove that br like this:

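import { Fragment, Slice } from "prosemirror-model";

// Returns a transformPasted handler that empties any block whose only
// child is a hard break, so <p><br></p> becomes <p></p>.
function transformPasted(schema) {
  function recurse(item) {
    if (item instanceof Fragment) {
      // Rebuild the fragment from its transformed children.
      const nodes = [];
      item.forEach((child) => nodes.push(recurse(child)));
      return Fragment.from(nodes);
    }

    // Otherwise item is a Node.
    if (
      item.type.isBlock &&
      item.childCount === 1 &&
      item.firstChild.type === schema.nodes.hard_break
    ) {
      // Same markup, no content: this drops the lone hard break.
      return item.copy();
    }

    // Copy the node with its recursively cleaned content.
    return item.copy(recurse(item.content));
  }

  return function (slice) {
    const fragment = recurse(slice.content);
    return new Slice(fragment, slice.openStart, slice.openEnd);
  };
}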

…you will end up with

<p><br class="ProseMirror-trailingBreak"></p>

instead of

<p><br><br class="ProseMirror-trailingBreak"></p>

No matter how hard I try, I do not understand this explanation:

The second is only added in the view to make sure it renders properly

…afaics we don’t need another one just to see the first one - because we can already see the first one.


A <p><br></p> should have 2 lines: the paragraph implies one line and the br represents a hard break, so visibly that would/should look like 2 lines. The ProseMirror-trailingBreak is added to an empty paragraph to ensure it does not collapse. The issue here is that Google Docs probably should have included an empty paragraph (without the br) or a br within the paragraphs it included. Instead it put a <br/> as a sibling of the <p>. A br is an inline node in ProseMirror, so it ends up generating a containing block here, the <p>. So if you wanted to work around this correctly, you would have to handle only that scenario, and therefore inspect the HTML rather than the resulting fragment/nodes. Otherwise, if you instead pasted from Word and had a break within a paragraph, you’d lose the explicit break. It would be best if ProseMirror could handle this, but since technically it doesn’t know about the node types (you supply the schema), I doubt it can.
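
To make the HTML-level idea concrete, here is a minimal sketch, not a tested fix. It assumes the clipboard HTML looks like the Google Docs example earlier in the thread, with a <br> sitting between block elements. The name normalizeStrayBreaks and the list of block tags are made up for illustration, and the regex is crude; a DOM-based version would likely be more robust. A <br> inside a paragraph is left untouched, so the Word case keeps its explicit break.

```javascript
// Hypothetical helper, not part of ProseMirror.
// Assumption: these are the block tags a stray <br> can sit between.
const BLOCK = "(?:p|div|h[1-6]|ul|ol|blockquote|pre)";

// One or more <br> tags that directly follow a closing block tag and
// are directly followed by an opening block tag.
const STRAY_BREAKS = new RegExp(
  `(</${BLOCK}>\\s*)((?:<br\\b[^>]*>\\s*)+)(?=<${BLOCK}\\b)`,
  "gi"
);

// Rewrites each stray <br> into an empty paragraph so the parser does
// not have to wrap it in a <p> of its own.
function normalizeStrayBreaks(html) {
  return html.replace(STRAY_BREAKS, (match, close, breaks) =>
    close + breaks.replace(/<br\b[^>]*>\s*/gi, "<p></p>")
  );
}
```

If it behaves as hoped in your setup, it could be passed as the transformPastedHTML prop, which receives the pasted HTML string before it is parsed. Whether an empty <p></p> ends up as the single blank line you want depends on your schema, so treat that part as an assumption too.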


<p><br></p> actually gives me the desired result of only 1 line, but an unwrapped <br> gets wrapped in a <p></p> with another <br> added to it.

Thank you! But transformPasted actually receives a Slice as an argument. Can you show me how to use it the way you did?

@Almoglem Sorry, it was closed over… just remove the nested return function so it accepts a slice. (It’s written that way because I have clipboard handling as a plugin.)

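import { Plugin } from "prosemirror-state";

// clipboardTextParser and clipboardSerializer are this poster's own
// factories, defined elsewhere; each takes the schema like transformPasted.
export default function clipboard(schema) {
  return new Plugin({
    props: {
      clipboardTextParser: clipboardTextParser(schema),
      clipboardSerializer: clipboardSerializer(schema),
      transformPasted: transformPasted(schema)
    }
  });
}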

This doesn’t appear to be the case. I’m seeing this when transforming plaintext into links upon paste.


It’s difficult to comment without specific steps to be able to see the issue. Perhaps try stepping through the code to see where the issue you are having arises?


I ran into the same problem. I can recreate it by doing the following:

  • create a new paragraph
  • add a few <br> with shift + enter
  • go back up one line with the up-arrow
  • type something
  • go back to the bottom of the line
  • hit enter (creates a new paragraph)
  • type something

The result will look something like this: [screenshot not preserved]

When serialized with DOMSerializer, the ProseMirror-trailingBreak gets removed, meaning the <br> before it also doesn’t get shown.

Arguably the ProseMirror-trailingBreak shouldn’t get removed in this specific situation.

No, this is working as expected. We need the dummy <br> during editing, to make it possible to type after the <br> before it. But it’s not part of the document, so it shouldn’t be serialized.

Understandable. However, the dummy <br> does create a discrepancy between how the text looks in the editor and how the serialised content looks. Do you have any suggestions on how to solve this?
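
One possible workaround on the display side, offered as a rough sketch rather than a recommendation: since browsers do not render an extra line for a <br> that is the last thing in a block, the serialized HTML could be padded with a second <br> in that position, mirroring what the editor does with its dummy break. The name padTrailingBreaks is hypothetical, and the sketch assumes paragraphs are serialized as <p> elements.

```javascript
// Hypothetical helper: duplicate a <br> that is the last child of a <p>,
// so the final hard break stays visible outside the editor.
function padTrailingBreaks(html) {
  return html.replace(/(<br\b[^>]*>)(\s*<\/p>)/gi, "$1<br>$2");
}
```

This is meant for read-only rendering. Feeding the padded HTML back into the editor may introduce an extra hard break, so the unpadded output is probably the safer thing to store.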

Is there any workaround here? I don’t understand how people are able to copy/paste text with multiple new lines into the editor. There is always 1 extra <br> tag added, so for n new lines in a row, I get n+1 new lines. I tried hiding the extra <br> with “display: none”, but that creates other issues which take more work to fix. It would be great if someone has a solution.