Paste behavior research

wes-r · November 3, 2016, 7:03pm

I think what I miss is being able to handle the HTML directly. One thing I want to implement is inserting blank paragraph nodes when the pasted HTML doesn’t have them and uses margins or padding instead.

kiejo · November 4, 2016, 7:15am

I’m in a similar position and need a way to transform HTML before passing it to the schema’s DOMParser. My use case is cleaning up HTML clipboard data produced by Word. In this case it is a lot easier for me to work with HTML, as I’m reusing code from another library (uq-eresearch/unstyler on GitHub, a CoffeeScript library for removing style bloat from pasted MS Word HTML).

johanneswilm · November 4, 2016, 8:10am

So there is no equivalent of pm.on.transformPastedHTML in 0.11/0.12? That doesn’t sound good.

wes-r · November 4, 2016, 12:55pm

We instead have clipboardParser and transformPasted: the first is a DOMParser, and the second takes a slice and returns a slice. I think the only thing missing that I’d like is something to replicate transformPastedHTML (which takes HTML and returns HTML).
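To make the ordering concrete, here is a minimal TypeScript sketch of the stages wes-r describes. The types are simplified stand-ins (`Slice` here is a plain list of strings, not ProseMirror’s `Slice` class) and `runPastePipeline` and `toyParse` are hypothetical helpers, not ProseMirror functions. The point is only that an HTML-level hook runs before parsing, while `transformPasted` sees just what the parser already produced.

```typescript
// Hypothetical, simplified model of the paste stages discussed in this thread.
// Slice is a stand-in (a list of block texts), not ProseMirror's Slice class.
type Slice = { content: string[] };

interface PasteHooks {
  // HTML in, HTML out: the role transformPastedHTML plays.
  transformPastedHTML?: (html: string) => string;
  // Plays the role of clipboardParser (a DOMParser built from parse rules).
  parse: (html: string) => Slice;
  // Slice in, slice out: the role transformPasted plays.
  transformPasted?: (slice: Slice) => Slice;
}

function runPastePipeline(html: string, hooks: PasteHooks): Slice {
  // 1. Raw HTML stage: markup can still be added or removed freely here.
  const prepared = hooks.transformPastedHTML ? hooks.transformPastedHTML(html) : html;
  // 2. Parsing stage: parse rules map DOM structure onto schema nodes.
  const slice = hooks.parse(prepared);
  // 3. Slice stage: anything the parser dropped is no longer visible.
  return hooks.transformPasted ? hooks.transformPasted(slice) : slice;
}

// Usage: a toy parser that keeps only the text of <p> elements.
const toyParse = (h: string): Slice => ({
  content: (h.match(/<p>(.*?)<\/p>/g) ?? []).map((p) => p.slice(3, -4)),
});
const result = runPastePipeline("<p>one</p><div>margin</div><p>two</p>", { parse: toyParse });
// result.content is ["one", "two"]: the <div> never reaches transformPasted.
```

This is why margin-based spacing is hard to recover in `transformPasted`: by stage 3 the information has already been discarded by the parser.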

marijn · November 8, 2016, 4:02pm

What kind of things are you doing with this? There’s really quite a lot that can be expressed with DOM parser rules (and you don’t have to base your parser on your schema’s node specs).

wes-r · November 8, 2016, 11:35pm

I think DOMParser takes care of the one-to-one cases, but what about adding or removing nodes? transformPasted can do this but only based on the already converted nodes.

My paragraphs don’t have bottom margins and instead empty paragraphs can be used as a separator. If a user copies html from a site that uses margins instead, I’d like to be able to detect this and introduce empty paragraphs. That way the look is consistent.
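A minimal sketch of this idea, written as a transform over block descriptors rather than raw HTML. It assumes something earlier has already read each pasted block’s bottom margin (for example from inline styles in the clipboard HTML); `PastedBlock`, `marginsToEmptyParagraphs`, and the 1px threshold are hypothetical names and choices for illustration.

```typescript
// Hypothetical description of one pasted block, after its styles were inspected.
interface PastedBlock {
  text: string;           // "" stands for an already-empty paragraph
  marginBottomPx: number; // bottom margin (or padding) the source used for spacing
}

// Express margin-based spacing as empty separator paragraphs instead.
function marginsToEmptyParagraphs(blocks: PastedBlock[], thresholdPx = 1): string[] {
  const out: string[] = [];
  blocks.forEach((block, i) => {
    out.push(block.text);
    const next = blocks[i + 1];
    const spaced = block.marginBottomPx >= thresholdPx;
    // Only add a separator between two non-empty blocks, so existing
    // empty paragraphs are not doubled and no separator trails the paste.
    if (spaced && next !== undefined && block.text !== "" && next.text !== "") {
      out.push("");
    }
  });
  return out;
}
```

For example, blocks `a` (16px margin), `b` (0px) and `c` (16px) become `["a", "", "b", "c"]`; the margin on the last block is ignored because nothing follows it.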

marijn · November 10, 2016, 4:19pm

I’ve pushed a bunch of patches today that I think move ProseMirror’s paste behavior closer to something that’ll be acceptable to users.

As I hinted at before, the problem really is much harder when your content is not flat. Some questions, such as ‘which markup applies to the copied content’ and ‘what parent nodes can be dropped during paste’ are pretty much trivial in a flat model (exactly the parent block), but have no obvious answer in a tree.

Since ProseMirror’s core is entirely schema-agnostic, and doesn’t know anything about the document except that it is some kind of tree, I ended up introducing a new flag on node types which helps with this decision. I called it defining, for lack of a better name (suggestions welcome). Its meaning is something like “this node should be taken into consideration when deciding what its content means”.

When you copy text from a parent node marked as defining, and paste it somewhere where that node fits, the text is wrapped in that parent. By marking headings, list items, and blockquotes as defining, copying from them will carry the list or quote parent along, if circumstances allow.
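A simplified TypeScript sketch of the wrapping decision as described here. `NodeTypeInfo`, `wrappersForPaste`, and the `fitsAtTarget` callback are hypothetical; the real logic lives in ProseMirror’s replace code and handles more cases. This version only considers the innermost defining ancestor of the copied text and returns the chain of parents to wrap the content in, or an empty array when the bare content should be inserted.

```typescript
interface NodeTypeInfo {
  name: string;
  defining: boolean;
}

// ancestors: parents of the copied content, outermost first, direct parent last.
// fitsAtTarget: whether a node of this type may be placed at the paste position.
function wrappersForPaste(
  ancestors: NodeTypeInfo[],
  fitsAtTarget: (type: NodeTypeInfo) => boolean,
): NodeTypeInfo[] {
  // Find the innermost defining parent of the copied content.
  let definingIndex = -1;
  for (let i = ancestors.length - 1; i >= 0; i--) {
    if (ancestors[i].defining) {
      definingIndex = i;
      break;
    }
  }
  if (definingIndex < 0) return []; // no defining context: insert bare content

  // Use the defining node, or the nearest of its ancestors, that fits here,
  // plus every node between it and the content.
  for (let i = definingIndex; i >= 0; i--) {
    if (fitsAtTarget(ancestors[i])) return ancestors.slice(i);
  }
  return []; // nothing fits (e.g. pasting into a non-empty textblock)
}
```

With content copied from bullet_list > list_item (defining) > paragraph, pasting where `bullet_list` fits yields the whole chain, pasting inside a list where only `list_item` fits yields `list_item > paragraph`, and pasting mid-paragraph yields no wrappers.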

If you paste text from a list item into a non-empty textblock, we just want to insert the text, not the whole list item. So whether the parent context is used depends on the target position as well. It is only used when it, or one of its ancestors, fits at the insertion point. But in order to determine the insertion point, the defining flag is used again. If you paste into a node that is entirely covered by the selection (for example into an empty textblock), and that node is not defining, the replace algorithm will prefer to replace that whole node. If it is defining, it will prefer to insert into it. (So if you paste into a heading, it stays a heading, just like in Google Docs.)
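The insertion-point rule can be sketched the same way. `SelectionAncestor` and `preferredReplaceDepth` are hypothetical names; the function walks outward from the innermost node around the selection and counts how many fully covered, non-defining nodes the paste should prefer to replace outright.

```typescript
interface SelectionAncestor {
  name: string;
  defining: boolean;
  fullyCovered: boolean; // is the whole node inside the selection (e.g. an empty textblock)?
}

// path: nodes around the selection, innermost first.
// Returns how many of them the paste prefers to replace as a whole.
function preferredReplaceDepth(path: SelectionAncestor[]): number {
  let depth = 0;
  for (const node of path) {
    // A partially covered node must be kept; a defining one is kept and pasted into.
    if (!node.fullyCovered || node.defining) break;
    depth++; // fully covered and not defining: prefer replacing the whole node
  }
  return depth;
}
```

An empty paragraph gives depth 1 (replace it), while an empty heading gives depth 0 (paste into it, so it stays a heading).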

Since the replace algorithm needs a little leeway in being able to find a fitting position for a given inserted piece of content, the above logic only determines the preferred place, which is tried first. When that doesn’t fit, it will continue trying to include extra parent nodes of the inserted content, and then (in an outer loop) it will continue trying to overwrite more completely covered parent nodes of the insertion position (so it will replace an empty heading when the content can’t be inserted into the heading itself, for example if you’re pasting a horizontal rule node).
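The fallback order can be pictured as two nested loops. In this sketch (all names hypothetical), `targetDepth` counts how many covered parents of the insertion position are overwritten and `wrapDepth` counts how many extra parents of the copied content are carried along; both start at the preferred values, and `fits` stands in for the schema check.

```typescript
interface Placement {
  targetDepth: number; // covered parents of the insertion point to overwrite
  wrapDepth: number;   // extra parents of the copied content to include
}

function findPlacement(
  preferred: Placement,
  maxTargetDepth: number,
  maxWrapDepth: number,
  fits: (p: Placement) => boolean,
): Placement | null {
  // Outer loop: overwrite progressively more completely covered parents.
  for (let targetDepth = preferred.targetDepth; targetDepth <= maxTargetDepth; targetDepth++) {
    // Inner loop: include progressively more parent nodes of the inserted content.
    for (let wrapDepth = preferred.wrapDepth; wrapDepth <= maxWrapDepth; wrapDepth++) {
      const candidate = { targetDepth, wrapDepth };
      if (fits(candidate)) return candidate;
    }
  }
  return null; // no placement found
}
```

For a horizontal rule pasted into an empty heading, nothing fits at `targetDepth` 0, so the first hit is `{ targetDepth: 1, wrapDepth: 0 }`: the heading is replaced. Because wrapping is the inner loop, extra content parents are always tried before more of the target is overwritten.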

This works pretty well. I added a bunch of test cases to cover most of the examples discussed in this thread, and I factored the replace logic into a number of new Transform methods so that other code that needs a “do what I mean” style replace can also call it.

For pasting plain text, I changed to a model where every line gets its own paragraph (again following Google Docs) rather than only splitting on double newlines, and I added a special case for when you paste into a block that’s marked as code (in which case the text gets inserted verbatim). I’m not sure this is perfect, but it’s a lot better than what we had before.
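A sketch of the two plain-text cases described above. `planPlainTextPaste` and `PastePlan` are hypothetical; the sketch only decides what to insert, and its normalization of Windows line endings is an added assumption rather than something stated in this post.

```typescript
type PastePlan =
  | { kind: "verbatim"; text: string }       // target is code: insert the text as-is
  | { kind: "paragraphs"; lines: string[] }; // elsewhere: one paragraph per line

function planPlainTextPaste(text: string, targetIsCode: boolean): PastePlan {
  // Assumption: treat \r\n and lone \r as line breaks too.
  const normalized = text.replace(/\r\n?/g, "\n");
  if (targetIsCode) return { kind: "verbatim", text: normalized };
  // Every line, including blank ones, becomes its own paragraph.
  return { kind: "paragraphs", lines: normalized.split("\n") };
}
```

So `"a\nb\n\nc"` pasted outside code yields four paragraphs (the third empty), whereas the old double-newline model would have produced two.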

I invite everyone to try the new code. I’m planning to cut a 0.13 release tomorrow, so if you want to get the code from NPM you can just wait for that. (But note that setting up a checkout of the code with a demo is very easy if you use the scripts in the prosemirror repo.)

(I also restored transformPastedText and transformPastedHTML.)

kiejo · November 11, 2016, 9:54am

Awesome!! This works so much better than what we had before. Thanks for the hard work, and I’m happy you restored transformPastedHTML.

I checked out the repo and tested it and these are the things I noticed so far:

Pasting text into text with strong/em/code marks does not keep the marks. Pasting text into text with a link mark does keep the mark. I would prefer making the other marks behave like the link mark. (How) is this customizable? Is this current difference in behavior by design?
Pasting text into text with a link mark and then pressing Ctrl+Z results in an incorrect cursor position.
I think that pasting into a defining code block should always use the text from the clipboard and completely ignore the html (even in empty code blocks). It should for example be possible to copy and paste code from GitHub into a code block (currently this creates a table as this seems to be the html that is generated by GitHub).
I would suggest marking table cells as defining in the default schema-table as this is more in line with what users would expect when for example pasting a bullet list into a table cell (currently this splits the table).
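The code block suggestion amounts to choosing which clipboard flavor to read based on the paste target. A minimal sketch: `ClipboardLike` mirrors the `getData(format)` method of the browser’s `DataTransfer`, which returns an empty string for missing formats; `pickClipboardFlavor` and `targetIsCode` are hypothetical.

```typescript
// Mirrors DataTransfer.getData: returns "" when a format is absent.
interface ClipboardLike {
  getData(format: string): string;
}

type Flavor = { kind: "html" | "text"; value: string };

function pickClipboardFlavor(data: ClipboardLike, targetIsCode: boolean): Flavor {
  const html = data.getData("text/html");
  // Outside code, prefer the richer HTML when it is present.
  if (!targetIsCode && html !== "") return { kind: "html", value: html };
  // In code blocks (or when no HTML was copied), ignore markup entirely,
  // so code copied from a page that renders it as a table arrives as plain text.
  return { kind: "text", value: data.getData("text/plain") };
}
```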

I found a reproducible uncaught error: Cannot read property 'isInline' of null. In the demo, do the following:

Select and copy the text “Ordered lists (such as this one)” in a way which makes the selection look like there is a space at the end of the text. You can do this by moving the mouse a little further down while selecting the text or by putting the cursor at the start of the list item and pressing Ctrl+Shift+Down (Linux + Chromium).
Trying to paste this anywhere in the document results in the uncaught error Cannot read property 'isInline' of null.

marijn · November 11, 2016, 3:23pm

This was a bug. Handling of events within links was broken. It should always keep the original markup, I think.

Good idea. Implemented.

That doesn’t really help. I don’t have time to get to the bottom of this right now but I’ll investigate why this behaves like that next week.

Should be fixed now.

marijn · November 14, 2016, 10:00am

This patch should address this behavior (the list now ends up inside the table cell, which makes more sense)

kiejo · November 14, 2016, 3:45pm

Great, thanks for fixing these so fast! I tried the latest version from master and everything seems to be working nicely now

matthieubellon · November 14, 2016, 3:46pm

One thing ProseMirror used to shine at was handling paste from external sources (websites, Word), and it did a great job at it (especially with Word).

It lost that when we migrated to 0.13. I checked the ProseMirror demos and reproduced it: http://recordit.co/d0sCLb5BZj

Console log: `bundle_basic.js:9719 Uncaught TypeError: Cannot read property 'push' of null(…) (anonymous function) @ bundle_basic.js:9719 forEach @ bundle_basic.js:3655 loop @ bundle_basic.js:9714 normalizeSiblings @ bundle_basic.js:9726 fromClipboard @ bundle_basic.js:9689 handlers.paste @ bundle_basic.js:12007 (anonymous function) @ bundle_basic.js:11634`

marijn · November 14, 2016, 9:11pm

Not lost, just broken a little. Patch 4985040a9c63 seems to enable me to paste from Le Monde again. Do you want to test it out before I tag a bugfix release?

matthieubellon · November 15, 2016, 7:46am

Great news. I thought you had changed the paste strategy. Please go ahead; I won’t be able to test it today and don’t want to be a bottleneck. Thanks again, Matthieu

marijn · November 15, 2016, 9:13am

Published prosemirror-view 0.13.1 with the fixes.

thomasWajs · November 29, 2016, 3:49pm

I don’t know if this is linked to the paste rework discussed here, but I noticed some weird behaviour with copy-paste. It seems to happen on Windows only (we could not reproduce it on Linux/Mac).

On the basic demo page (http://prosemirror.net/demo/basic.html), I noticed that:

I can copy/cut a word from the editor, but I cannot paste it back into the same editor. An action is fired, because the history plugin displays the undo button in the menu, but the pasted text does not appear.
If I copy a single word from the introduction paragraph of the demo, then paste it into the middle of a paragraph in the editor, the target paragraph is split and a new paragraph containing the single copied word is inserted.

Hope you have a Windows VM somewhere if you want to take a look.

marijn · November 29, 2016, 4:32pm

@thomasWajs Which browser+version is this happening in?

thomasWajs · November 29, 2016, 9:32pm

Firefox 50.0.1 and Chrome 54.0.2840.99 m

Both on Windows 7, 64-bit

Sorry for the double post with https://github.com/ProseMirror/prosemirror/issues/497, our reports crossed.

amk221 · August 3, 2021, 6:02pm

It’s great that there are so many hooks to handle pasting text and html, but I agree that there is a gap.

There is no opportunity to add a node. @wes-r is right that by the time you are in the transformPasted hook, it’s too late.

My schema doesn’t have paragraphs, and I’d like the opportunity to add a node after each paragraph in the pasted content.

One could solve this by adding a paragraph to their schema, so that in transformPasted you could then detect a paragraph and do-a-thing. But that muddies the schema, which itself should not have paragraphs.

Are there any alternative approaches?