So the initial ad-hoc behavior I programmed turned out to be woefully inadequate (in my defense, it wasn’t much worse than what many widely used WYSIWYG tools do). But I wanted to define these behaviors according to some principle more rigorous than ‘I guess this is sort of what other tools do’.
So I searched for literature on the subject. This was actually really hard to find: either I didn’t find the right jargon, or academics are so embarrassed by WYSIWYG-style editing interfaces that they refuse to touch them. But I did find a few useful papers, all of them by the Inria people (most notably Irène Vatton and Vincent Quint) working on a series of semantic editors: Grif, Thot, and Amaya. This section of the Thot manual and section 4.3 of this paper on Amaya were especially inspiring.
Enter
Enter performs various variations on the theme of ‘split’. In the simple case, with the cursor in the middle of a textblock, it splits the textblock in two at the cursor position. If the cursor is at the end of the block, the newly split-off block will be a regular paragraph, rather than inheriting the type of the block it was split from.
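A minimal sketch of the basic split rule, assuming a flat `TextBlock` with a type name and plain text (the class, the function, and the type names `"paragraph"` and `"heading"` are hypothetical, not ProseMirror’s API):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TextBlock:
    type: str  # hypothetical type names, e.g. "paragraph" or "heading"
    text: str

def split_textblock(block: TextBlock, offset: int) -> Tuple[TextBlock, TextBlock]:
    """Split `block` at `offset`. When the cursor sits at the end of the
    block, the new block becomes a plain paragraph instead of inheriting
    the original block's type."""
    before = TextBlock(block.type, block.text[:offset])
    new_type = "paragraph" if offset == len(block.text) else block.type
    after = TextBlock(new_type, block.text[offset:])
    return before, after
```

Splitting `TextBlock("heading", "Hello world")` at offset 5 yields two headings, while splitting it at offset 11 (the end) yields the heading plus an empty paragraph, which is what you usually want after typing a title.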
(Note that all these behaviors occur only as allowed by the schema – if the schema forbids the resulting document shape, the behavior will be suppressed and the next possible behavior will be tried.)
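The ‘try the next behavior’ rule can be sketched as a simple chain. In this toy model (an assumption, as are all the names), a state is just the list of top-level block types, and each behavior returns a new state or `None` when it does not apply:

```python
from typing import Callable, List, Optional

# Hypothetical model: a state is the list of top-level block types.
State = List[str]
Behavior = Callable[[State], Optional[State]]

def run_first_allowed(state: State, behaviors: List[Behavior],
                      schema_allows: Callable[[State], bool]) -> Optional[State]:
    """Try each behavior in order and return the first result the schema
    accepts; return None if no behavior produces an allowed document."""
    for behavior in behaviors:
        result = behavior(state)
        if result is not None and schema_allows(result):
            return result
    return None

# Toy behaviors and schema, for illustration only.
def lift(state: State) -> Optional[State]:
    return state + ["list_item"]  # pretend lifting leaves a bare list item

def split(state: State) -> Optional[State]:
    return state + ["paragraph"]

def no_bare_list_items(state: State) -> bool:
    return "list_item" not in state
```

With these, `run_first_allowed(["paragraph"], [lift, split], no_bare_list_items)` rejects the lift’s result and falls through to the split, returning `["paragraph", "paragraph"]`.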
Enter at the start of a textblock does not split that block in two, but rather splits that block off from its enclosing block. If it has a sibling before it, the enclosing block is split above the cursor’s block. If not, the current block is lifted out of the enclosing block. If it has no enclosing block, I currently fall back on the ‘split paragraph’ behavior (in this case, splitting off an empty paragraph). I’m still debating whether to do something else here. One option would be to do nothing, another would be to create that empty paragraph and immediately move the cursor into it.
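Here is a rough sketch of the three cases, assuming a toy tree where the cursor’s textblock is addressed by a path of child indices from the document root. `Node`, `resolve`, and `enter_at_start` are hypothetical names; schema checks are left out, and dropping an enclosing block that ends up empty after a lift is a simplifying assumption:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    type: str
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None  # only textblocks carry text

def resolve(doc: Node, path: List[int]) -> Node:
    """Follow a path of child indices down from the document root."""
    node = doc
    for i in path:
        node = node.children[i]
    return node

def enter_at_start(doc: Node, path: List[int]) -> str:
    """Enter with the cursor at the start of the textblock at `path`.
    Mutates `doc` in place and returns the name of the behavior applied."""
    *parent_path, index = path
    if not parent_path:
        # No enclosing block: fall back to splitting off an empty paragraph.
        doc.children.insert(index, Node("paragraph", text=""))
        return "split_paragraph"
    *grand_path, parent_index = parent_path
    grandparent = resolve(doc, grand_path)
    parent = grandparent.children[parent_index]
    if index > 0:
        # A sibling precedes the block: split the enclosing block above it.
        tail = Node(parent.type, parent.children[index:])
        del parent.children[index:]
        grandparent.children.insert(parent_index + 1, tail)
        return "split_enclosing"
    # First child: lift the block out of its enclosing block, dropping the
    # enclosing block if it is left empty (simplifying assumption).
    grandparent.children.insert(parent_index, parent.children.pop(0))
    if not parent.children:
        del grandparent.children[parent_index + 1]
    return "lift"
```

Starting from a blockquote holding paragraphs ‘a’ and ‘b’, pressing Enter at the start of ‘b’ three times splits the blockquote, then lifts ‘b’ out of it, then (with no enclosing block left) splits off an empty paragraph above it.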
The typical illustration of this behavior: you are in a list item (using a document schema that allows multiple blocks per list item), and you press enter. First, you get a new paragraph inside your current list item. (This is somewhat non-standard: most tools will immediately create a new list item, and many of them make it quite hard or even impossible to put multiple paragraphs inside a single item.) To get a new list item, you press enter again, which splits the original list item above the new paragraph. Pressing enter once more lifts your new paragraph out of the list. I think this is a rather nicely predictable (once you’ve seen it a few times), linear progression.
There’s a special case when the cursor is inside a block type that’s marked as containing code (such as code_block). Since newlines in such blocks are meaningful, enter will insert a newline character. At the end of a code block, enter does move to the next paragraph (but you can use shift-enter or ctrl-enter to insert a newline instead).
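A small sketch of the code-block rule as described, assuming the block’s content is a plain string; the function name, the `modifier` flag (standing for shift or ctrl being held), and the returned action names are hypothetical:

```python
from typing import Tuple

def enter_in_code_block(text: str, offset: int, modifier: bool = False) -> Tuple[str, str]:
    """Return (new_text, action). Inside a code block Enter inserts a
    newline; at the very end, a plain Enter leaves the block instead,
    unless a modifier key (shift or ctrl) is held."""
    if offset == len(text) and not modifier:
        return text, "exit_to_paragraph"
    return text[:offset] + "\n" + text[offset:], "newline"
```

The design point is that a newline is content in code, so Enter only gets its structural meaning back at the end of the block, and the modifier gives an escape hatch for adding trailing lines.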
Backspace
Backspace, then, is the ‘join’ operation, i.e. the inverse of Enter. Sometimes. At other times, such as when something is selected or the cursor is after text, it obviously deletes the selection or the character before the cursor.
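The non-structural cases can be sketched within a single textblock, assuming the selection is a pair of offsets and treating one Python string index as one character (real editors have to care about grapheme clusters). The function name is hypothetical; it returns `None` when a structural join is needed:

```python
from typing import Optional, Tuple

def backspace_text(text: str, sel_from: int, sel_to: int) -> Optional[Tuple[str, int]]:
    """Return (new_text, new_cursor) for the simple cases of Backspace,
    or None when the cursor is at the start of the textblock."""
    if sel_from != sel_to:
        # Something is selected: delete the selection.
        return text[:sel_from] + text[sel_to:], sel_from
    if sel_from > 0:
        # Cursor after text: delete the character before it.
        return text[:sel_from - 1] + text[sel_from:], sel_from - 1
    return None  # at block start: fall through to the join logic
```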
But when pressed at the start of a textblock, it removes one ‘barrier’ between this textblock and the preceding node, where ‘preceding node’ is defined as either its preceding sibling, or, if it’s a first child, the sibling before its parent, or, failing that, the sibling before its grandparent, and so on.
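If positions are represented as paths of child indices from the document root (an assumption of this sketch; `preceding_node_path` is a hypothetical name), finding the preceding node is a walk up the path looking for the nearest level where the index can be decremented:

```python
from typing import List, Optional

def preceding_node_path(path: List[int]) -> Optional[List[int]]:
    """Given the path of the cursor's textblock, return the path of its
    preceding node: the previous sibling of the nearest ancestor-or-self
    that is not a first child, or None if there is none."""
    for depth in range(len(path) - 1, -1, -1):
        if path[depth] > 0:
            return path[:depth] + [path[depth] - 1]
    return None
```

For the first paragraph of the first item of a list that is the third top-level block (path `[2, 0, 0]`), the preceding node is the second top-level block (path `[1]`).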
If this preceding node is a leaf node, for example a horizontal rule, which can’t have child nodes, we can’t join with it, so we simply delete it. If there is no preceding node, we also can’t join, and we perform a ‘lift’ (moving out of a parent node) if possible.
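These first two branches can be sketched on a toy tree. Which types count as leaves is an assumption here (`LEAF_TYPES`), and `Node`, `resolve`, `backspace_at_block_start`, and the returned step names are all hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    type: str
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None  # only textblocks carry text

# Assumption: node types that cannot have children.
LEAF_TYPES = {"horizontal_rule"}

def resolve(doc: Node, path: List[int]) -> Node:
    node = doc
    for i in path:
        node = node.children[i]
    return node

def preceding_node_path(path: List[int]) -> Optional[List[int]]:
    for depth in range(len(path) - 1, -1, -1):
        if path[depth] > 0:
            return path[:depth] + [path[depth] - 1]
    return None

def backspace_at_block_start(doc: Node, path: List[int]) -> str:
    """First structural step of Backspace at the start of the textblock
    at `path`; returns the name of the step taken or still to try."""
    prev = preceding_node_path(path)
    if prev is None:
        return "lift"  # nothing to join with: lift out of a parent if possible
    if resolve(doc, prev).type in LEAF_TYPES:
        # A leaf can't be joined with, so just delete it.
        del resolve(doc, prev[:-1]).children[prev[-1]]
        return "deleted_leaf"
    return "try_join"  # non-leaf: continue with the join / move-into steps
```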
If there is a non-leaf preceding node, we try to move the node with the cursor closer to it. If that node, or one of its ancestors, can be joined with the preceding node, we do that. Joining paragraphs together is a case of this, but it can also be used to join two adjacent lists or blockquotes together. Failing that, if the node with the cursor is directly after the preceding node (with no wrapping nodes around it), we try to move it into the preceding node. This allows you to backspace a paragraph after a list into that list (without immediately joining it with the last paragraph in the list).
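A simplified sketch of the join and move-into steps. It assumes two nodes are joinable exactly when they have the same type, and uses a toy containment table, `CAN_CONTAIN`, in place of a real schema; both are assumptions, as are all the names. The list case from the paragraph above would additionally require wrapping the paragraph in a list item, which this sketch does not model, so it demonstrates moving into a blockquote instead:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    type: str
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None  # only textblocks carry text

# Toy schema (assumption): which node types may directly contain which.
CAN_CONTAIN = {"blockquote": {"paragraph"}, "list_item": {"paragraph"}}

def resolve(doc: Node, path: List[int]) -> Node:
    node = doc
    for i in path:
        node = node.children[i]
    return node

def join_or_move_into(doc: Node, path: List[int], prev: List[int]) -> Optional[str]:
    """`path` is the cursor's textblock, `prev` its (non-leaf) preceding
    node. Returns the step taken, or None if neither applies."""
    parent = resolve(doc, prev[:-1])
    before = parent.children[prev[-1]]
    after_path = path[:len(prev)]  # the ancestor-or-self directly after `before`
    after = parent.children[after_path[-1]]
    if after.type == before.type:
        # Join: merge the text (textblocks) or the children (containers).
        if before.text is not None:
            before.text += after.text
        else:
            before.children.extend(after.children)
        del parent.children[after_path[-1]]
        return "joined"
    if after_path == path and after.type in CAN_CONTAIN.get(before.type, set()):
        # Unwrapped textblock directly after `before`: move it inside.
        before.children.append(after)
        del parent.children[after_path[-1]]
        return "moved_into"
    return None  # fall through to lifting out of wrappers
```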
Failing that, if the node with the cursor is wrapped relative to the preceding node, we again try to lift it up out of its wrappers. This means that if joining with or moving into a preceding node isn’t possible, backspace has the visually expected effect of rubbing out the nesting before the cursor.
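A sketch of the lift step on a toy tree (all names hypothetical). It relies on the fact that, when the cursor’s block is wrapped relative to the preceding node, it must be the first child of its parent, so lifting it means moving it in front of that parent; dropping a wrapper left empty is a simplifying assumption:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    type: str
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None  # only textblocks carry text

def resolve(doc: Node, path: List[int]) -> Node:
    node = doc
    for i in path:
        node = node.children[i]
    return node

def lift_first_child(doc: Node, path: List[int]) -> List[int]:
    """Move the block at `path` (a first child) out of its parent, placing
    it just before the parent. Returns the block's new path."""
    *parent_path, index = path
    if index != 0 or not parent_path:
        raise ValueError("block must be the first child of a wrapper")
    *grand_path, parent_index = parent_path
    grandparent = resolve(doc, grand_path)
    parent = grandparent.children[parent_index]
    grandparent.children.insert(parent_index, parent.children.pop(0))
    if not parent.children:
        # Assumption: a wrapper left empty is removed.
        del grandparent.children[parent_index + 1]
    return parent_path
```

Repeated presses peel off one wrapper at a time: a paragraph nested in two blockquotes after a top-level paragraph goes from path `[1, 0, 0]` to `[1, 0]` to `[1]`, after which the next Backspace can join it with the paragraph above.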
The main departure relative to the convention used by most tools is that this approach to backspace will join at any block level, not only at the textblock level. I think there is value in having an intermediate step, which allows moving things into blocks without losing them as separate blocks. If you do want to join to the preceding textblock, pressing backspace again will do so.
Delete
Delete acts like a sort of inverted backspace. It also deletes the selection, if any, or the character after the cursor. If those don’t apply, it looks for the block after the cursor, and again deletes that if it is a leaf block. If not, it applies the algorithm used by backspace to try and ‘pull’ the succeeding block closer to the cursor. This might pull a paragraph into a list, or might join a directly adjacent paragraph with the current textblock.
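One way to sketch this on a toy tree: find the succeeding node by mirroring the preceding-node walk, delete it if it is a leaf, and otherwise descend to its first textblock, from whose start the Backspace logic can be run. `LEAF_TYPES`, the function names, and the returned action names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    type: str
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None  # only textblocks carry text

LEAF_TYPES = {"horizontal_rule"}  # assumption

def resolve(doc: Node, path: List[int]) -> Node:
    node = doc
    for i in path:
        node = node.children[i]
    return node

def succeeding_node_path(doc: Node, path: List[int]) -> Optional[List[int]]:
    """Next sibling of the nearest ancestor-or-self that isn't a last child."""
    for depth in range(len(path) - 1, -1, -1):
        parent = resolve(doc, path[:depth])
        if path[depth] + 1 < len(parent.children):
            return path[:depth] + [path[depth] + 1]
    return None

def delete_at_block_end(doc: Node, path: List[int]) -> Tuple[str, Optional[List[int]]]:
    """Structural step of Delete at the end of the textblock at `path`."""
    nxt = succeeding_node_path(doc, path)
    if nxt is None:
        return "nothing", None
    node = resolve(doc, nxt)
    if node.type in LEAF_TYPES:
        del resolve(doc, nxt[:-1]).children[nxt[-1]]
        return "deleted_leaf", None
    # Descend to the first textblock; running Backspace from its start
    # pulls the succeeding content toward the cursor.
    while node.text is None and node.children:
        nxt = nxt + [0]
        node = node.children[0]
    return "backspace_from", nxt
```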