There was this joke on Twitter a while back, which keeps popping into my head as I consider how to announce yet another large breaking change.
USERS: you’re alienating the people who actually use your product
TWITTER: likes are now florps
USERS: what
TWITTER: timeline goes sideways
(The joke being that the Twitter engineers seem very busy making changes, just not changes that anyone is waiting for.)
In this post, I’ll try to convince you that the thing I’m landing on the master branch today is not a sideways timeline, but an actual improvement. What is it?
Positions are now numbers
The gist of the change is that the Pos class is gone, and positions in the document are now described by integers. To interpret such integers, you can mentally flatten the document into a token stream, where a token can be a character, the opening of a node, the closing of a node, or an ‘atomic’ (no content allowed) node. Position 0 points at the very start of the document, and each further position is identified by the number of tokens that come before it.
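To make the token-stream picture concrete, here is a minimal sketch. The tiny document model (text, node, atom) and the flatten helper are made up for illustration and are not the library's classes. It also assumes that the top-level document node contributes no opening or closing token of its own, since position 0 is the very start of the document.

```javascript
// Hypothetical mini document model, for illustration only.
const text = (str) => ({ text: str });
const node = (type, ...content) => ({ type, content });
const atom = (type) => ({ type, atom: true });

// Flatten a list of child nodes into the token stream described above.
function flatten(content) {
  const out = [];
  for (const child of content) {
    if (child.text !== undefined) {
      out.push(...child.text.split("")); // one token per character
    } else if (child.atom) {
      out.push(`<${child.type}/>`); // an atomic node is a single token
    } else {
      // opening token, the node's own content, closing token
      out.push(`<${child.type}>`, ...flatten(child.content), `</${child.type}>`);
    }
  }
  return out;
}

const doc = node(
  "doc",
  node("paragraph", text("ab")),
  atom("horizontal_rule"),
  node("paragraph", text("c"))
);

const tokens = flatten(doc.content);
console.log(tokens.join(" "));
// <paragraph> a b </paragraph> <horizontal_rule/> <paragraph> c </paragraph>
```

In this sketch, position 3 has three tokens before it (the paragraph opening, "a" and "b"), so it points at the end of the first paragraph's content. Position 8 is the end of the document.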
Advantages
- The main motivation for this change was that it makes position mapping much simpler. Whereas it used to be necessary to perform clever path shifting and re-rooting to map positions relative to a change, you can now think of it as simply replacing some stretches of tokens with new tokens, and pulling or pushing all positions after such a stretch forward or backward a little when the size of the stretch changed. Mapping is now also cheaper (integer arithmetic instead of allocating new objects), which makes it more viable to keep a lot of marked ranges in your document.
- Certain types of position-manipulating code were really hard to write, even for me, though I invented them. This was mostly seen in the code that implements transformation steps and commands. If you needed to go from the position of a join to the position inside the newly-joined node, for example, you’d have to write some taxing code. Now you can just subtract one from your position.
- These reductions in complexity have allowed me to finally rewrite the replace transformation step, which I had found myself simply unable to properly do with the old abstractions. This rewrite fixes some issues and paves the way to nice clipboard-related APIs and a more expressive way to specify the allowed structure of documents (#220).
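The mapping idea from the first point above can be sketched in a few lines. The mapPos function is a hypothetical helper, not the library's mapping API. How positions inside the replaced stretch are treated is an assumption made here for simplicity: they collapse to the end of the new content.

```javascript
// Map a position through a single change in which the tokens in
// [from, to) were replaced by newSize new tokens.
// Hypothetical helper for illustration; not the library's API.
function mapPos(pos, from, to, newSize) {
  const oldSize = to - from;
  if (pos <= from) return pos; // before the change: unaffected
  if (pos >= to) return pos + (newSize - oldSize); // after it: shifted
  // Assumption: positions inside the replaced stretch move to the
  // end of the replacement.
  return from + newSize;
}

// Replace tokens 5..8 (three tokens) with a single token.
console.log(mapPos(2, 5, 8, 1)); // 2
console.log(mapPos(10, 5, 8, 1)); // 8
console.log(mapPos(6, 5, 8, 1)); // 6
```

Mapping a marked range then amounts to mapping its two endpoints, which is plain integer arithmetic.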
Disadvantages
- Massive backwards incompatibility. This touches everything that does anything with a position other than simply passing it around. Making the change required me to rewrite about half the codebase (which did give me the opportunity to clean some things up).
- Position values themselves now convey very little information – they are just integers, and in a non-trivial document it requires a lot of counting to figure out what they point to. Which brings me to…
Using linear positions
The idea behind the old path + offset representation for positions is that the representation is close to the meaning, which is often helpful. We lost that, but a meaningful representation is only a method call away. You can do doc.resolve(pos) to get a ResolvedPos object, which bears some resemblance to the old Pos class, but contains more information, in a more easily accessible way – it knows about all the nodes, not just the offsets, on the path from the document root to the position, and has a number of convenience methods to help with common tasks, such as finding positions related to the resolved position.
(Resolved positions are cached, so resolving the same one a lot of times is cheap, and you are encouraged to pass around raw integer positions, not resolved position objects.)
As I mentioned, you’ll need to port any code that treats positions as non-opaque. Good things to grep for are .offset, .path, and .shorten.
The document representation stayed largely the same, except that fragments (node content lists) now store their total size (under .size) and nodes expose their ‘skip size’ (the size they take up in their parent node) as .nodeSize. Descending the document tree to find a position is done by searching each parent for the child that contains the position, and then entering that. (This may be optimized for large nodes in the future.)
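The descent described above can be sketched roughly as follows. This uses a made-up mini document model, not the library's Node and Fragment classes. The nodeSize and findPosition functions are illustrative, and the sizes assume that a character and an atomic node each take one token while any other node takes its content size plus two (opening and closing token).

```javascript
// Hypothetical mini document model, for illustration only.
const text = (str) => ({ text: str });
const node = (type, ...content) => ({ type, content });
const atom = (type) => ({ type, atom: true });

// 'Skip size': how many tokens a node takes up in its parent.
function nodeSize(n) {
  if (n.text !== undefined) return n.text.length;
  if (n.atom) return 1;
  return n.content.reduce((sum, child) => sum + nodeSize(child), 0) + 2;
}

// Walk down from the root, at each level looking for the child that
// strictly contains the position, and entering it.
function findPosition(root, pos) {
  const path = [root.type];
  let parent = root;
  let offset = pos; // position relative to the start of parent's content
  for (;;) {
    let start = 0;
    let next = null;
    for (const child of parent.content) {
      const end = start + nodeSize(child);
      // Only nodes with content can be entered; text and atoms cannot.
      if (child.content && offset > start && offset < end) {
        next = child;
        break;
      }
      start = end;
    }
    if (!next) return { path, parentOffset: offset };
    path.push(next.type);
    offset -= start + 1; // skip earlier siblings and the opening token
    parent = next;
  }
}

const doc = node(
  "doc",
  node("paragraph", text("ab")),
  atom("horizontal_rule"),
  node("paragraph", text("c"))
);

console.log(findPosition(doc, 2)); // path doc > paragraph, parentOffset 1
console.log(findPosition(doc, 4)); // path doc, parentOffset 4
```

The inner loop is a linear scan over the children, which is the part that could be optimized for very large nodes.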
What else changed
The Node and Fragment APIs shrank a bit. Node iterators were dropped, in favor of direct indexing. Some convenience methods that were no longer used by the core (inlineNodesBetween, toArray, splice, replaceDeep) were also dropped.
The meaning of Node.slice was changed – this now returns a Slice object, which is the thing you now pass to Transform.replace, and the thing that is used to represent clipboard content. Nodes have a .replace method that is used to replace a range of the node with a slice, and supersedes much of the existing node-updating interface.
I’ve updated the reference guide to reflect the new interface, and will be rewriting (and extending) the other documentation in the coming weeks.
And a warning: Since this change involved rewriting huge chunks of code, it’s likely that new bugs snuck in. The tests are running beautifully again, but some code they don’t cover is probably still broken.
Enjoy. I know incompatible changes are awful, but I’m quite happy to have pulled this one off, and excited to be able to move forward with the stuff that was blocked by it. Reply with any questions or concerns you have.