Paste behavior research

ProseMirror’s behavior when pasting content, despite already being scarily complicated, doesn’t seem to correspond to common user expectations (#397). So I’m trying to rethink it from first principles, in the hope of coming up with a somewhat coherent, yet user-acceptable model.

In this context, I’ve been playing with Google Docs, LibreOffice, and MS Word Online (didn’t have a copy of real MS Word around) to see how they behave. I was not able to find any kind of coherent behavior in the latter two (which doesn’t mean it isn’t there, just that I wasn’t able to figure out the model they use), and I often strongly disliked how they behaved, so for now I’m not using them as examples. Google Docs, on the other hand, does seem to follow a relatively clean and predictable model. As a starting point for a discussion on improving our copy/paste behavior, I’m going to describe what I believe that model is.

A Google Docs document appears to basically (with some exceptions, such as tables) be a flat list of blocks. Most blocks hold inline content, and each block has a number of properties, such as whether it is a list item, how deeply indented it is, how it is aligned, and so on. There are no such things as multiple paragraphs per list item, or even nested lists, but you can represent something like them by indenting paragraphs and lists below a list item.

The content of the clipboard is treated as a sequence of such blocks. The clipboard data saves the properties of the blocks in it, even if the selection was just a single word inside of a block.

When pasting, this appears to happen:

  • The content of the first clipboard block is inserted into the destination block. If that destination block is an empty plain paragraph without any special properties, it gets the properties of the pasted block.

  • Remaining pasted blocks, if any, are inserted after the destination block. If that block was a plain paragraph, they retain their properties; otherwise they take the properties of the destination block. If there was content after the selection, that content is appended to the last pasted block. [on closer look, this only appears to happen when they are all of the same type; if not, see below]

  • [added on edit] When pasting a range of blocks that don’t all have the same properties, the blocks’ properties seem to be preserved, even when pasting into a non-default block.

So pasting multiple bullet list blocks into a numbered list creates numbered list items, but pasting them into a plain paragraph creates bullet list items.
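The paste rules above can be sketched as a small function over a flat list of blocks. This is a toy model, not the actual implementation of Google Docs or ProseMirror: the block shape `{ kind, text }` and the name `pasteFlat` are made up for illustration, `kind` stands in for all block properties at once, and the handling of a mixed clipboard only approximates the observed behavior.

```javascript
// Toy model of the flat-block paste rules described above.
// A block is { kind, text }; kind ("plain", "bullet", "numbered", ...)
// stands in for every block property. All names here are hypothetical.

function sameKind(blocks) {
  return blocks.every(b => b.kind === blocks[0].kind);
}

function pasteFlat(doc, index, offset, clipboard) {
  if (clipboard.length === 0) return doc.slice();
  const dest = doc[index];
  const before = dest.text.slice(0, offset);
  const after = dest.text.slice(offset);
  const [first, ...rest] = clipboard;

  // Rule 1: the first block's content goes into the destination block.
  // An empty plain destination takes over the first block's properties.
  const destEmptyPlain = dest.kind === "plain" && dest.text === "";
  const head = {
    kind: destEmptyPlain ? first.kind : dest.kind,
    text: before + first.text,
  };

  // Rules 2 and 3: later blocks keep their own properties when the
  // destination is plain or the clipboard is mixed; otherwise they adopt
  // the destination's properties.
  const keepOwn = dest.kind === "plain" || !sameKind(clipboard);
  const tail = rest.map(b => ({ kind: keepOwn ? b.kind : dest.kind, text: b.text }));

  // Content after the cursor is appended to the last pasted block.
  const pasted = [head, ...tail];
  pasted[pasted.length - 1].text += after;

  return [...doc.slice(0, index), ...pasted, ...doc.slice(index + 1)];
}

const bullets = [{ kind: "bullet", text: "a" }, { kind: "bullet", text: "b" }];
// Into a numbered item: both resulting blocks are "numbered".
const intoNumbered = pasteFlat([{ kind: "numbered", text: "one" }], 0, 3, bullets);
// Into an empty plain paragraph: both resulting blocks are "bullet".
const intoPlain = pasteFlat([{ kind: "plain", text: "" }], 0, 0, bullets);
```

The two calls at the end reproduce the bullet-list example from the text: the same clipboard yields numbered items in one destination and bullet items in the other.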

There are probably some special cases and exceptions that I didn’t describe, but this simple model seems to predict most of the editor’s behavior. So that’s a nice starting point! But the question is whether a model like that can be ported from a flat content model to a tree-shaped one.

So that’s what I’m thinking about now. Will add more posts in this thread when I come up with something. In the meantime, feedback and ideas are appreciated.

I’d also like to take this opportunity to invite everyone who feels they have a mental model of how pasting should work to do their best to describe their model.

Specifically, does your mental model include a concept of whether the nodes at the start and end of your selection are ‘open’? When you select all text in a heading and copy, should that produce the same clipboard data as when you have selected the whole heading node? And when one side of the selection is in a node that’s nested more deeply than the other side, should this be noticeable in some way when you paste that content, or isn’t that something that you consider significant?

Tentative answer so far: not at all.

Most of the hard questions in a tree model, such as at which depth to start copying or pasting, simply don’t come up in a flat model. Also, ProseMirror should really be context-sensitive in how it pastes. For example, in Google Docs, if you copy a thrice-indented (nested) list item and then paste it somewhere where there are no lists nearby, it is still thrice-indented. That would be unacceptable (not to mention in violation of the schema) in ProseMirror, where you’re moving pieces of a semantic tree, as opposed to styled blocks of text.

So though we can use this model as a useful way to think about user expectations, and try to have ours behave similarly in similar cases whenever practical, we don’t get much from its underlying principles.

Personally, I’m in love with the idea of a non-WYSIWYG editor that displays markup explicitly, like stars in Markdown for bold text. Copy/pasting list items is an example of where such an editor would shine. You would display ul/ol tags, and li tags for each list item. The user would then be able to copy either just the list items, or the whole list with ul/ol tags. When pasting, the only complication would be that you’d have to automatically create the enclosing ul element when the user pastes plain li items. I hope I’m not duplicating some prior discussion, but I’d be very interested to hear your thoughts on such an editor, and maybe you can point to any existing discussions/implementations.

PS I can see that it would not be straightforward to implement — for example you have to prevent the user from selecting an opening tag without selecting the whole markup region.

That is simply not the kind of editor ProseMirror is, in that it aims to be usable by people who don’t do markup syntax, and generic when it comes to schemas (i.e. not a Markdown-specific editor).

Thanks for the quick reply. Well you see, I don’t mean an editor with a syntax, I mean just displaying the boundaries of markup regions. You can have some graphical elements or tags saying “bold”, that doesn’t matter. You’d add the markup using the same buttons or keyboard shortcuts.

I’ll give you another interesting example — the heading levels. I think that’s a big pain point for users because you have to choose right off the bat whether to use h1, h2 etc., and then when copy-pasting inside another section, you have to manually adjust heading levels.

The natural solution to this would be automatic heading levels along the lines of the HTML <section> tag:

<h1>Heading</h1>
<p>Paragraph</p>
<section>
  <h1>Displayed as h2 in the browser</h1>
  <p>Paragraph</p>
</section>

This wouldn’t be possible in a WYSIWYG editor (and in Markdown which is showing its WYSIWYG side here), because there is no way for the user to express where the section ends so that we could determine the nesting level.
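A minimal sketch of how automatic heading levels could be derived from section nesting, assuming the displayed level is simply one plus the number of enclosing sections (capped at 6). The node shape and the name `displayedHeadingLevels` are hypothetical and only serve to illustrate the idea.

```javascript
// Hypothetical tree: { type: "section", children: [...] },
// { type: "heading", text } or { type: "paragraph", text }.
// Every heading is stored without a level; the level is computed from
// how many sections enclose it.

function displayedHeadingLevels(nodes, depth = 0, out = []) {
  for (const node of nodes) {
    if (node.type === "heading") {
      out.push({ text: node.text, level: Math.min(depth + 1, 6) });
    } else if (node.type === "section") {
      displayedHeadingLevels(node.children, depth + 1, out);
    }
  }
  return out;
}

const subsection = {
  type: "section",
  children: [
    { type: "heading", text: "Sub" },
    { type: "paragraph", text: "Paragraph" },
  ],
};

const outline = displayedHeadingLevels([
  { type: "heading", text: "Heading" },
  { type: "paragraph", text: "Paragraph" },
  subsection,
]);
// outline: "Heading" at level 1, "Sub" at level 2

// Pasting the same section one level deeper needs no manual adjustment:
const nested = displayedHeadingLevels([{ type: "section", children: [subsection] }]);
// nested: "Sub" at level 3
```

Because the level is computed rather than stored, moving a section to a different depth changes the displayed heading level without touching the pasted content.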

I think these markers would also be good for extensibility — as the sections example shows, sometimes you want the user to be able to specify markup that cannot be easily represented visually — think about a GUI template editor where you want a “forEach” tag.

I like the idea of taking Google Docs as a reference for user expectations. During my research with different editors I also felt that it uses one of the most coherent and predictable models. The behavior you describe is similar to what I came up with in the issue you linked to, but there is one thing I would like to add/clarify:

I would say that the empty paragraph not only gets the properties of the pasted block, but that it gets replaced with or that it becomes the pasted block. This is possibly the same as what you mean depending on how you define what a property is (When I first read it I thought of a property more like a node attribute and not its type).

I think that it should produce the same clipboard data. When working with standard cursor motions (e.g. moving the cursor, selecting text) the user does not really have a way to differentiate between selecting a whole text node or selecting the content of a text node. So when a user wants to select the whole header, the most straightforward way to do this is to simply select all of its content.

I have a hard time thinking about what the implications are with this one. Do you have any example of how two different approaches would behave? Intuitively I would say that it is important to keep the relative depths of a parent-child relationship if the selection ends in the same parent that it starts in.

For whole-node manipulation, ProseMirror does provide node selections, and I can think of cases where both variants (copied text ranges and copied nodes) should probably behave differently. If you select all text in a textblock and copy, and then paste that in the middle of some other textblock, I’d expect only the text to be inserted. If you select a whole node and copy-paste that into the middle of some textblock, I’d expect the target textblock to be split and the whole node inserted between the halves.
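The two expectations in the previous paragraph can be sketched as follows. This is a toy model with a made-up clipboard shape (`{ kind: "text", text }` versus `{ kind: "node", node }`) and a hypothetical function `pasteIntoTextblock`; it does not use ProseMirror’s API.

```javascript
// target is a textblock { type, text }; offset is the cursor position in it.
// clip is either { kind: "text", text } (copied text range)
// or { kind: "node", node } (copied whole node).

function pasteIntoTextblock(target, offset, clip) {
  const before = target.text.slice(0, offset);
  const after = target.text.slice(offset);

  if (clip.kind === "text") {
    // Copied text range: only the text is inserted, the block stays whole.
    return [{ type: target.type, text: before + clip.text + after }];
  }

  // Copied node: split the target and put the node between the halves.
  return [
    { type: target.type, text: before },
    clip.node,
    { type: target.type, text: after },
  ];
}
```

Pasting `{ kind: "text", text: "Title" }` into the middle of a paragraph yields one paragraph, while pasting `{ kind: "node", node: { type: "heading", text: "Title" } }` at the same position yields three blocks.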

I’m having trouble coming up with a relevant example. Say, you have a document p("he>llo"), blockquote(p("qu<ote")), with the >< indicating the selection. If you copy that, you get a clipboard fragment like p("llo"), blockquote(p("qu")). And if you paste that, I guess you’d expect both the starting paragraph and the end paragraph to be joined to the content before/after the cursor. With our selection model, it’s not really possible to have a selection that ends after the second paragraph (in which case you could argue that that paragraph is ‘closed’ and should not be joined). This might simply be moot.
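As a rough sketch of that expectation, the following joins both open sides of the copied fragment to the text around the cursor. The node shapes and the name `pasteOpenSlice` are hypothetical. It assumes both sides are open all the way down to a textblock and that containers are never empty, and it ignores type mismatches between the target and the joined textblocks.

```javascript
// Toy node shapes (not ProseMirror's API):
//   textblock: { type, text }
//   container: { type, content: [...] }

function leftmostTextblock(node) {
  return node.content ? leftmostTextblock(node.content[0]) : node;
}

function rightmostTextblock(node) {
  return node.content
    ? rightmostTextblock(node.content[node.content.length - 1])
    : node;
}

function pasteOpenSlice(target, offset, slice) {
  // Deep-copy so the clipboard data itself is left untouched.
  const nodes = JSON.parse(JSON.stringify(slice));
  if (nodes.length === 0) return [{ type: target.type, text: target.text }];

  const first = leftmostTextblock(nodes[0]);
  const last = rightmostTextblock(nodes[nodes.length - 1]);
  // Join the open start to the text before the cursor...
  first.text = target.text.slice(0, offset) + first.text;
  // ...and the open end to the text after it.
  last.text = last.text + target.text.slice(offset);
  return nodes;
}

// The fragment from the example: p("llo"), blockquote(p("qu"))
const clipboardSlice = [
  { type: "paragraph", text: "llo" },
  { type: "blockquote", content: [{ type: "paragraph", text: "qu" }] },
];
const joined = pasteOpenSlice({ type: "paragraph", text: "abcd" }, 2, clipboardSlice);
// joined: paragraph "abllo", then a blockquote holding paragraph "qucd"
```

A ‘closed’ end would simply skip the second join and leave the text after the cursor in a block of its own.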

That makes sense! But copying and pasting all text in a heading into an empty paragraph should not only insert the text, but the complete heading. So for this case the generated clipboard data should contain the information about the heading and not only contain the text content. From this perspective the same data ends up in the clipboard: h("all text"). But depending on the context of where it gets pasted, the same clipboard data is treated differently. Does this make sense?

It’s good that ProseMirror has this, but overall this doesn’t exist much on the web, so it may be difficult to explain this to users. Take for example this:

<h1 style="text-transform:uppercase;">The title</h1>

If one just selects the text, a browser like Firefox will likely copy just the text “The title” and if the user pastes it somewhere, it will be inline, without text transformation. But if the user selects a little bit more, for example by selecting into some spaces at the end of the previous paragraph, it may end up copying (and consequently pasting later on) the entire original h1-element, and the paste output will be entirely different.

Personally I would like to have that level of control, but if it’s only ProseMirror that allows such fine-grained selections, then it may be difficult to make end-users use this effectively.

As for the merging strategies: your proposals generally sound like they make sense. Nevertheless, we maintain special paste handlers for specific known sources (Google Docs, Word, etc.) and we probably will continue doing that, just because there is a lot of information in the original source data that one can only translate adequately if one targets one specific document schema.

Yes, I’ve got that as one of my requirements.

Have you seen that since 0.11 you can pass a custom parser for parsing clipboard content to the editor? I know you’re using 0.10 for now, but I would be interested in how much of your custom behavior could be expressed as parser rules, and whether they suggest any extensions to the functionality of the parser.

No, I did not see this. I’ll have to look more into it to understand how much more control it gives than the pm.on.transformPastedHTML that we have in 0.10.

Here is another interesting use case which this time involves cut&paste and schema constraints: Let’s say that we have a schema with headings, bullet lists, and list items. The schema allows a list item to only contain paragraphs and bullet lists (content: "(paragraph|bullet_list)+").

We have a document h(">Heading<"), ul(li(p(""))), with the >< indicating the selection. If you cut that, you should get a clipboard fragment like h("Heading"). If you paste that into the empty paragraph inside the list item, I would expect the document p(""), ul(li(p("Heading"))) as a result.

Without the schema constraints the empty paragraph inside the list item should have been replaced with the heading. But as the schema prevents the list item from containing a heading, the heading’s text content has been joined with the empty paragraph.
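The decision described here can be sketched with a simple lookup table. The table `allowedChildren` and the function `pasteIntoEmptyTextblock` are assumptions made for this example; a real schema would be expressed through content expressions like the one quoted above rather than a plain list.

```javascript
// Hypothetical schema table: parent type -> child types it may contain.
const allowedChildren = {
  doc: ["heading", "paragraph", "bullet_list"],
  bullet_list: ["list_item"],
  list_item: ["paragraph", "bullet_list"],
};

// Paste a single textblock into an empty textblock inside parentType.
function pasteIntoEmptyTextblock(parentType, emptyBlock, pasted) {
  const allowed = allowedChildren[parentType] || [];
  if (allowed.includes(pasted.type)) {
    // The parent accepts the pasted node: it replaces the empty block.
    return { ...pasted };
  }
  // Otherwise keep the existing block and take over only the text.
  return { ...emptyBlock, text: pasted.text };
}

const heading = { type: "heading", text: "Heading" };
const emptyParagraph = { type: "paragraph", text: "" };

const inListItem = pasteIntoEmptyTextblock("list_item", emptyParagraph, heading);
// inListItem: { type: "paragraph", text: "Heading" }
const atTopLevel = pasteIntoEmptyTextblock("doc", emptyParagraph, heading);
// atTopLevel: { type: "heading", text: "Heading" }
```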

Also note that I would suggest turning the heading that was cut into an empty paragraph. This is handled differently in Google Docs, where the heading is turned into an empty heading. At the same time, cutting a complete bullet list in Google Docs results in an empty paragraph, which is not consistent with its behavior when cutting a heading. To summarize the cut behavior that I would suggest: cutting a node (either by directly selecting the node or by selecting all of its content) should always result in an empty paragraph at the node’s position. I know that there is no obvious right or wrong for how to handle the cut, but this approach looks to me like it would not surprise users and would be consistent at the same time.

I don’t think I agree. If you remove the text from a node, by cutting or by backspacing or in whichever way, there’s no reason for that to change the node type.

As for the schema constraint issue, I’m aware of it (and indeed, schema constraints don’t make this problem easier, but I think I’m making progress).

I don’t have a strong opinion on this one and we haven’t done any user testing on this so I don’t know what the “better” approach is. I think both approaches should work well. And your approach has the advantage of being only “one backspace away” from the alternative behavior in case the default does not do what the user expected/wanted. With my suggestion the alternative result would require the user to create the correct node type again (usually using a menu).

Agreed. If I highlight all text and then type, I’d expect to still be in the same node. It’s not perfect as this can lead to empty headings but I think that oddity is more straightforward and less magical from a user perspective.

Would it help if we came up with a set of example scenarios that we think fairly cover the space of possibilities?

A few thoughts: favor converting the first pasted node to adhere to the schema, as opposed to introducing a new node. So pasting a heading in a list item would convert it to a paragraph instead of popping out of the list and inserting a heading. Empty paragraphs should always get replaced when valid. I believe this to be “right” because if the user wanted the heading they would be on a new top-level empty line.

Pasting multiple nodes within a paragraph: I could see wanting one of two options. A. Convert everything to text nodes/paragraphs (like pasting with the markup removed, but maybe keeping marks). Even though the user could use hotkeys to remove markup, this is probably what they want; otherwise they would have created a new line. B. Convert the first and ?maybe? last nodes to join with the surrounding paragraphs, but split the paragraph so that the inner nodes are inserted as is.
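To make the difference between the two options concrete, here is a rough sketch of both. The node shapes and the names `pasteFlattened` and `pasteKeepingInner` are hypothetical, marks are ignored, and option B assumes the first and last pasted nodes are textblocks.

```javascript
// Toy node shapes: textblock { type, text }, container { type, content: [...] }.

// Text of every textblock in document order.
function textblockTexts(nodes) {
  return nodes.flatMap(n => (n.content ? textblockTexts(n.content) : [n.text]));
}

// Option A: flatten everything into blocks of the target's type and
// join the first and last with the halves of the target.
function pasteFlattened(target, offset, pasted) {
  const texts = textblockTexts(pasted);
  if (texts.length === 0) return [{ type: target.type, text: target.text }];
  texts[0] = target.text.slice(0, offset) + texts[0];
  texts[texts.length - 1] += target.text.slice(offset);
  return texts.map(text => ({ type: target.type, text }));
}

// Option B: join only the first and last pasted nodes with the halves of
// the target and insert the inner nodes unchanged.
function pasteKeepingInner(target, offset, pasted) {
  const before = target.text.slice(0, offset);
  const after = target.text.slice(offset);
  if (pasted.length === 0) return [{ type: target.type, text: target.text }];
  if (pasted.length === 1) {
    return [{ type: target.type, text: before + pasted[0].text + after }];
  }
  return [
    { type: target.type, text: before + pasted[0].text },
    ...pasted.slice(1, -1),
    { type: target.type, text: pasted[pasted.length - 1].text + after },
  ];
}

const pastedNodes = [
  { type: "heading", text: "T" },
  { type: "blockquote", content: [{ type: "paragraph", text: "q" }] },
  { type: "paragraph", text: "z" },
];
const cursorTarget = { type: "paragraph", text: "abcd" };

const optionA = pasteFlattened(cursorTarget, 2, pastedNodes);
// optionA: paragraphs "abT", "q", "zcd"
const optionB = pasteKeepingInner(cursorTarget, 2, pastedNodes);
// optionB: paragraph "abT", the blockquote unchanged, paragraph "zcd"
```

In both options the heading loses its type because it is merged into the paragraph before the cursor; only option B keeps the blockquote in the middle.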

Sounds good to me.

Personally I prefer the idea of trying to retain the copied structure as much as possible. If a user copies for example multiple paragraphs and nested bullet lists, I do not see a good reason to turn the bullet lists into a flat series of paragraphs when pasting into some text node. In this case I actually rather like ProseMirror’s current behavior. I also tested this in Google Docs and LibreOffice, which both try to retain the structure.

I think that something like this could work well and it looks like it would behave similarly to Google Docs (which is a good thing in my opinion). I think this is also similar to what ProseMirror is already doing now.

Yes, that’s what the Transform.replace algorithm already does, and I’ll probably be able to let it handle most of the schema-fitting logic for pastes.

I considered this, but if there is further-nested markup in the copied range (say, you copy three blocks and the middle one is a blockquote), I think users expect that inner markup to survive.

Probably best to keep the markup (less magical). I was thinking about this use case, and pasting multi-node content within a paragraph is definitely an edge case.

I’d say most pastes will be single paragraphs or lines of text, which the user could paste anywhere, expecting the text to be inserted within the current node. The other common case is a multi-node paste at an empty line, where the user will expect the markup to be kept, with the “best” conversion to the schema, replacing the empty line.