An extra <br> is added in certain pasted content

Almoglem · July 6, 2022, 1:44pm

Hello

I have noticed that when I paste text from certain sources that contain a hard break, an extra <br> is added.

For example, when I paste text from Google Docs with one hard break between each paragraph, the hard break is parsed into a <p> with two <br>s inside, one with the “ProseMirror-trailingBreak” class and one without.

(Screenshots: the text in Google Docs, the same text pasted into ProseMirror, and the inspected DOM.)

Another thing I’ve noticed: I’m pretty sure it happens when the <br> in the pasted HTML is not wrapped in a paragraph. ProseMirror then wraps it in a <p> and adds another <br>. If the <br> in the pasted HTML is already wrapped in a <p>, it doesn’t happen.

I’d love to know if there is a way to prevent the extra <br>. Thank you!

MarMun · July 8, 2022, 8:14am

Interesting.

I can reproduce that on the prosemirror.net example implementation (both Firefox and Chrome).

I can’t reproduce that on my own implementation.

My best guess is this: my schema does not have this rule (from prosemirror-schema-basic):


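import {DOMOutputSpec, NodeSpec} from "prosemirror-model"

// brDOM as defined in prosemirror-schema-basic
const brDOM: DOMOutputSpec = ["br"]
const nodes = {
  // Inline hard break, parsed from and rendered as <br>
  hard_break: {
    inline: true,
    group: "inline",
    selectable: false,
    parseDOM: [{tag: "br"}],
    toDOM() { return brDOM }
  } as NodeSpec
}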

marijn · July 8, 2022, 10:17am

It’s parsed as having only one hard break. The second is only added in the view to make sure it renders properly and allows a cursor to appear after the break, it’s not part of the document.
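To make the document/view distinction concrete, here is a small simplified model. It is a toy, not ProseMirror’s actual code: InlineNode, serialize and renderInView are hypothetical names, and the rule it encodes (the view appends a dummy <br class="ProseMirror-trailingBreak"> when a textblock is empty or its last child is not text, such as a hard break) is an assumption based on the behavior described in this thread.

```typescript
// Simplified model of a paragraph's inline content (hypothetical types).
type InlineNode = { type: "text"; text: string } | { type: "hard_break" };

// What the document holds, written out as HTML (no escaping: toy only).
function serialize(content: InlineNode[]): string {
  const inner = content
    .map((node) => (node.type === "text" ? node.text : "<br>"))
    .join("");
  return "<p>" + inner + "</p>";
}

// What the editor view draws: the same paragraph, plus a dummy break when
// it is empty or ends in a non-text node, so the last line keeps its
// height and the cursor can be placed on it.
function renderInView(content: InlineNode[]): string {
  const last: InlineNode | undefined = content[content.length - 1];
  const needsTrailingBreak = last === undefined || last.type !== "text";
  const html = serialize(content);
  if (!needsTrailingBreak) return html;
  // Insert the dummy break just before the closing </p>
  return html.slice(0, -"</p>".length) + '<br class="ProseMirror-trailingBreak"></p>';
}

console.log(serialize([{ type: "hard_break" }]));
// prints <p><br></p>
console.log(renderInView([{ type: "hard_break" }]));
// prints <p><br><br class="ProseMirror-trailingBreak"></p>
```

So a paragraph holding a single hard break is one <br> in the document, but it is drawn with two, which is the DOM seen when inspecting the pasted content.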

Almoglem · July 8, 2022, 1:49pm

Thank you, I’ll try to make this change as well!

Almoglem · July 8, 2022, 2:00pm

But it does have an effect: the two hard breaks create two empty lines instead of one, and I can put my cursor on each one separately, which is not the behavior I want.

(view.state.doc actually does not include the extra hard break, but it is rendered in the DOM.)

this is what I mean by the 2 lines:

For now, in handlePaste() I use the Element.remove() method directly on view.dom to remove every hard break that lacks the ProseMirror-trailingBreak class. It gives me the desired behavior, but I feel it is not best practice (the documentation says it’s probably not a good idea to make changes directly to view.dom).

andrews · July 8, 2022, 4:39pm

I think the issue comes up when the source HTML in the clipboard has <br>s as siblings of <p>s. So the Google Docs content has this:

When copied, the clipboard contains Line1, Line2 and AfterBlank, with the blank line encoded as a bare <br> next to the paragraphs rather than inside one, and that br gets parsed as <p><br></p>. So it looks like two lines instead of the intended single line (which should perhaps just be <p></p>).

amk221 · July 10, 2022, 7:06pm

When pasting, if you come across a node with a single hard break in it, e.g. <p><br></p>, remove that br like this:

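import { Fragment, Node, Slice } from "prosemirror-model";

// Returns a transformPasted handler that drops the hard break from any
// block whose only child is a hard_break, e.g. <p><br></p> -> <p></p>.
function transformPasted(schema) {
  function recurse(item) {
    if (item instanceof Fragment) {
      // Rebuild the fragment from the recursively transformed children
      const nodes = item.content.map(recurse);
      return Fragment.from(nodes);
    } else if (item instanceof Node) {
      const fragment = recurse(item.content);
      let node;

      if (
        item.type.isBlock &&
        item.childCount === 1 &&
        item.firstChild.type === schema.nodes.hard_break
      ) {
        // Keep the block, but with empty content
        node = item.copy();
      } else {
        // Keep the node with its transformed content
        node = item.copy(fragment);
      }

      return node;
    }
  }

  return function (slice) {
    const fragment = recurse(slice.content);
    return new Slice(fragment, slice.openStart, slice.openEnd);
  };
}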

…you will end up with

<p></p>

instead of

<p><br></p>

No matter how hard I try, I do not understand this explanation:

The second is only added in the view to make sure it renders properly

…afaics we don’t need another one just to see the first one, because we can already see the first one.

andrews · July 11, 2022, 2:11am

A <p><br></p> should have two lines: the paragraph implies one line and the br represents a hard break, so visibly it would/should look like two lines. The ProseMirror-trailingBreak is added for an empty paragraph to ensure it does not collapse. The issue here is that Google Docs probably should have included an empty paragraph (without the br), or a br within the paragraphs it included. Instead they put a <br> as a sibling of the <p>s. A br is an inline node in ProseMirror, so it ends up generating a containing block here: the <p>. So if you wanted to work around this correctly, you would have to do it for that specific scenario, and therefore inspect the HTML, not the resulting fragment/nodes. Otherwise, if you instead pasted from Word with a break within a paragraph, you’d lose the explicit break. It would be best if ProseMirror could handle this, but since technically it doesn’t know about the node types (you supply the schema), I doubt it can.
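Following that suggestion to inspect the HTML rather than the parsed nodes, a transformPastedHTML prop could rewrite the clipboard HTML before ProseMirror parses it. Below is a minimal regex-based sketch. replaceBareBreaks is a hypothetical name, and it assumes Google Docs puts each bare <br> directly between a </p> and the next <p ...>, which has not been checked against every export. Breaks inside a paragraph (the Word case) are not matched.

```typescript
// Hypothetical helper for the transformPastedHTML prop: turns each bare
// <br> sitting directly between </p> and the next <p ...> into an empty
// paragraph. Regex-based, so it only covers this narrow shape of HTML.
function replaceBareBreaks(html: string): string {
  // "</p>", then one or more <br> tags, only if a <p> tag comes next
  const bareBreaks = /<\/p>((?:\s*<br\s*\/?>)+)\s*(?=<p[\s>])/gi;
  return html.replace(bareBreaks, (_match: string, breaks: string) => {
    const count = (breaks.match(/<br/gi) || []).length;
    let out = "</p>";
    for (let i = 0; i < count; i++) {
      out += "<p></p>"; // one empty paragraph per bare <br>
    }
    return out;
  });
}

console.log(replaceBareBreaks("<p>Line1</p><br><p>AfterBlank</p>"));
// prints <p>Line1</p><p></p><p>AfterBlank</p>
```

To wire it up, pass something like transformPastedHTML: (html) => replaceBareBreaks(html) in the editor props; that prop receives the pasted HTML string and returns the string to parse. Emitting <p></p> follows the empty-paragraph suggestion; Almoglem reports in the next reply that <p><br></p> also pastes as a single line. Regexes are brittle for HTML in general, so a DOMParser-based version would be more robust.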

Almoglem · July 11, 2022, 6:40am

<p><br></p> actually gives me the desired result of only one line, but an unwrapped <br> gets wrapped in a <p> with another <br> added to it.

Almoglem · July 11, 2022, 6:49am

Thank you! But transformPasted actually receives a Slice as its argument. Can you show me how to use it the way you did?

amk221 · July 11, 2022, 8:02am

@Almoglem Sorry, the function closes over the schema… just remove the nested return function so it accepts a slice. (It’s written that way because I do clipboard handling in a plugin.)

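import { Plugin } from "prosemirror-state";

// Each helper takes the schema and returns the value for the matching
// prop. clipboardTextParser and clipboardSerializer are not shown here.
export default function clipboard(schema) {
  return new Plugin({
    props: {
      clipboardTextParser: clipboardTextParser(schema),
      clipboardSerializer: clipboardSerializer(schema),
      transformPasted: transformPasted(schema)
    }
  });
}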

prosed · July 29, 2023, 8:05pm

This doesn’t appear to be the case. I’m seeing this when transforming plaintext into links upon paste.

andrews · July 31, 2023, 12:59pm

It’s difficult to comment without specific steps to be able to see the issue. Perhaps try stepping through the code to see where the issue you are having arises?

stijn · October 19, 2023, 7:46am

I ran into the same problem. I can recreate it by doing the following:

1. Create a new paragraph.
2. Add a few <br>s with Shift+Enter.
3. Go back up one line with the up arrow.
4. Type something.
5. Go back down to the bottom line.
6. Hit Enter (creates a new paragraph).
7. Type something.

the result will look something like this:

When serialized with DOMSerializer, the ProseMirror-trailingBreak gets removed, which means the <br> before it also doesn’t get shown, resulting in something like this:

Arguably the ProseMirror-trailingBreak shouldn’t get removed in this specific situation.

marijn · October 19, 2023, 9:15am

No, this is working as expected. We need the dummy <br> during editing to make it possible to type after the <br> before it. But it’s not part of the document, so it shouldn’t be serialized.

stijn · October 19, 2023, 9:33am

Understandable. However, the dummy does create a discrepancy between how the text looks in the editor and how the serialised content looks. Do you have any suggestions on how to solve this?
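One possible workaround on the export side, not something suggested in this thread: post-process the HTML string produced from DOMSerializer so it mimics the editor’s dummy breaks. In the sketch below, mirrorTrailingBreaks is a hypothetical helper that adds a second <br> to a paragraph ending in a hard break and a <br> to an empty paragraph. It assumes the <br>s sit directly inside <p>, not inside marks. Use the result only for read-only display; loaded back into the editor, the extra <br> would likely be parsed as a real hard break.

```typescript
// Hypothetical post-processing step for HTML exported via DOMSerializer:
// mimic the editor's dummy trailing breaks so the last line of a
// paragraph ending in a hard break, and an empty paragraph, keep their
// height. Assumes <br>s sit directly inside <p>, not inside marks.
function mirrorTrailingBreaks(html: string): string {
  return (
    html
      // Paragraph ending in a hard break: add a second <br>
      .replace(/<br\s*\/?>(\s*)<\/p>/gi, "<br><br>$1</p>")
      // Empty paragraph (with or without attributes): give it a <br>.
      // Runs second, so it never touches paragraphs fixed above.
      .replace(/<p(\s[^>]*)?><\/p>/gi, "<p$1><br></p>")
  );
}

console.log(mirrorTrailingBreaks("<p>Line1<br></p><p></p>"));
// prints <p>Line1<br><br></p><p><br></p>
```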

patman · January 31, 2024, 1:51pm

Is there any workaround here? I don’t understand how people are able to copy/paste text with multiple new lines into the editor. There is always one extra <br> tag added, so for n new lines in a row I get n+1. I tried hiding the extra <br> with “display: none”, but that creates other issues which take more work to fix. It would be great if someone has a solution.

erjonbit · May 5, 2025, 1:25pm

Add this to your CSS: p:empty::after { content: ''; display: block; height: 1em; /* or more */ }
