Heads-up: Document node API overhaul

marijn · December 9, 2015, 10:15am

The set of patches I pushed yesterday makes some big changes to the API exposed by document nodes and to the structure of the document data itself. They were driven by three considerations:

Allow nesting of content under inline nodes (see also this discussion). I am not yet sure how we are going to use this, and I don’t expect all code to handle it yet, but the restriction that disallowed this was sort of arbitrary, so I figured it would be a good idea to remove it before publishing a stable API.
Wrap the representation of node content in a proper abstraction, rather than exposing it as an array, so that we have the freedom to, for example, represent big nodes using a tree structure in the future, to reduce the cost of modifying/copying big flat nodes (such as a document with a couple thousand paragraphs at its top level).
Reduce the differences in the API of block, textblock, and inline nodes. The way textblock nodes could be indexed both by character offset and by node offset was confusing and produced clunky code.

The gist of the change is that everything is now indexed by document offsets, which means that when dealing with textblock nodes, the code only talks about character offsets. Text nodes still take up more than one offset unit (one per character), which is admittedly awkward. I experimented with APIs that hid this fact and exposed text nodes only as single-character chunks, but this was inefficient and also a pain to work with.
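To make the offset scheme concrete, here is a small sketch that works out which character offsets each child of a textblock covers. It is a toy model for illustration only: the `Inline` type, `widthOf`, and `offsetLayout` are hypothetical names, not part of the ProseMirror API. A paragraph holding "Hello ", an image, and "world" spans 12 offsets, with the image at offset 6.

```typescript
// Hypothetical toy model of inline content; not ProseMirror's actual classes.
type Inline = { kind: "text"; text: string } | { kind: "image" };

// A text node takes one offset unit per character; any other node takes one.
function widthOf(node: Inline): number {
  return node.kind === "text" ? node.text.length : 1;
}

// List the [start, end) character offsets that each child occupies in its parent.
function offsetLayout(children: Inline[]): Array<[number, number]> {
  const spans: Array<[number, number]> = [];
  let pos = 0;
  for (const child of children) {
    const end = pos + widthOf(child);
    spans.push([pos, end]);
    pos = end;
  }
  return spans;
}

const paragraph: Inline[] = [
  { kind: "text", text: "Hello " },
  { kind: "image" },
  { kind: "text", text: "world" },
];

// Spans: [0, 6] for "Hello ", [6, 7] for the image, [7, 12] for "world".
console.log(offsetLayout(paragraph));
```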

The random-access child(n) method on nodes now interprets n as a proper offset, even in nodes that contain text. You should use this only for tasks that really need random access, such as descending the tree based on a path. For iteration, there is a simple forEach internal iterator, and nodes support iter(from, to) and reverseIter(from, to) methods that produce ES6-style external iterators. These yield all the nodes in the range (you can leave off the range arguments to iterate over the whole node), and if the range boundaries fall inside text nodes, only the part of those text nodes inside the range is yielded.
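The range-iteration behaviour can be sketched with ES6 generators. This is a simplified model over a flat list of inline children, assuming text nodes are plain `{ kind: "text", text }` objects; the standalone `iter` and `reverseIter` functions and the `clip` helper are hypothetical stand-ins for the node methods, not their real implementation.

```typescript
// Hypothetical sketch of range iteration; not ProseMirror's real code.
type Inline = { kind: "text"; text: string } | { kind: "image" };

function widthOf(node: Inline): number {
  return node.kind === "text" ? node.text.length : 1;
}

// Cut a text child (starting at offset `pos`) down to the part inside [from, to).
function clip(child: Inline, pos: number, from: number, to: number): Inline {
  if (child.kind !== "text") return child;
  const text = child.text.slice(Math.max(0, from - pos), Math.min(child.text.length, to - pos));
  return { kind: "text", text };
}

// Yield every child overlapping [from, to), front to back.
function* iter(children: Inline[], from = 0, to = Infinity): Generator<Inline> {
  if (from >= to) return;
  let pos = 0;
  for (const child of children) {
    const end = pos + widthOf(child);
    if (end > from && pos < to) yield clip(child, pos, from, to);
    if (end >= to) break; // nothing after this child can be in range
    pos = end;
  }
}

// Same range, back to front, walking from the end of the content.
function* reverseIter(children: Inline[], from = 0, to = Infinity): Generator<Inline> {
  if (from >= to) return;
  let end = children.reduce((sum, c) => sum + widthOf(c), 0);
  for (let i = children.length - 1; i >= 0; i--) {
    const start = end - widthOf(children[i]);
    if (start < to && end > from) yield clip(children[i], start, from, to);
    if (start <= from) break; // nothing before this child can be in range
    end = start;
  }
}

const paragraph: Inline[] = [
  { kind: "text", text: "Hello " },
  { kind: "image" },
  { kind: "text", text: "world" },
];
const describe = (n: Inline) => (n.kind === "text" ? n.text : "[image]");

// Offsets 3..9 cut into both text nodes: "lo ", the image, then "wo".
console.log([...iter(paragraph, 3, 9)].map(describe));
console.log([...reverseIter(paragraph, 3, 9)].map(describe));
```

Because generators are lazy, a caller that stops early (for example with `break` inside a `for...of` loop) never visits the rest of the node.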

A node’s size is no longer accessed with length and maxOffset, but with a size getter. I found that Chrome’s devtools tried to display objects with a length getter as arrays, which was not appropriate for our document nodes. Each node also has a width accessor, which returns 1 for non-text nodes and the length of the text string for text nodes.
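A minimal sketch of the size/width split, assuming a toy class hierarchy (`ToyText`, `ToyElement`, and `ToyNode` are hypothetical names, not ProseMirror classes): `width` is what a node contributes to its parent's offsets, while `size` is the sum of its own children's widths.

```typescript
// Hypothetical toy classes illustrating size vs. width; not ProseMirror's Node.
type ToyNode = ToyText | ToyElement;

class ToyText {
  constructor(readonly text: string) {}
  // A text node is as wide as its string.
  get width(): number {
    return this.text.length;
  }
}

class ToyElement {
  constructor(readonly type: string, readonly content: ToyNode[] = []) {}
  // Non-text nodes always occupy a single offset unit in their parent.
  get width(): number {
    return 1;
  }
  // Total width of the content. Named `size` rather than `length`,
  // so devtools don't try to render the node as an array.
  get size(): number {
    return this.content.reduce((sum, child) => sum + child.width, 0);
  }
}

const para = new ToyElement("paragraph", [
  new ToyText("Hello "),
  new ToyElement("image"),
  new ToyText("world"),
]);
const doc = new ToyElement("doc", [para, new ToyElement("horizontal_rule")]);

console.log(para.size);  // 12
console.log(para.width); // 1
console.log(doc.size);   // 2
```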

The content of a node is now wrapped in an abstraction called a Fragment. There are separate fragment implementations for content that contains text and content that doesn’t, so that most nodes can use direct indexing, and only nodes whose content includes text have to deal with children that take up more than one offset unit.
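The two-implementation idea might look roughly like the sketch below. The `Fragment` interface, `FlatFragment`, `TextFragment`, and `fragmentFrom` are assumptions made for illustration, not ProseMirror's actual classes; the point is only that content without text can map an offset straight to an array index, while content with text has to accumulate widths to find the child at an offset.

```typescript
// Hypothetical sketch of a two-implementation Fragment; not ProseMirror's API.
type Child = { kind: "text"; text: string } | { kind: "node"; type: string };

function widthOf(child: Child): number {
  return child.kind === "text" ? child.text.length : 1;
}

interface Fragment {
  readonly size: number;
  // Return the child covering `offset` (offsets count characters for text).
  child(offset: number): Child;
}

// No text: every child has width 1, so an offset is simply an array index.
class FlatFragment implements Fragment {
  constructor(private readonly children: Child[]) {}
  get size(): number {
    return this.children.length;
  }
  child(offset: number): Child {
    const found = this.children[offset];
    if (!found) throw new RangeError(`No child at offset ${offset}`);
    return found;
  }
}

// With text: widths vary, so lookup walks the children summing widths.
class TextFragment implements Fragment {
  constructor(private readonly children: Child[]) {}
  get size(): number {
    return this.children.reduce((sum, c) => sum + widthOf(c), 0);
  }
  child(offset: number): Child {
    let end = 0;
    for (const c of this.children) {
      end += widthOf(c);
      if (offset < end) return c;
    }
    throw new RangeError(`No child at offset ${offset}`);
  }
}

// Choose the cheaper representation depending on whether any child is text.
function fragmentFrom(children: Child[]): Fragment {
  return children.some((c) => c.kind === "text")
    ? new TextFragment(children)
    : new FlatFragment(children);
}

const inline = fragmentFrom([
  { kind: "text", text: "Hi " },
  { kind: "node", type: "image" },
  { kind: "text", text: "there" },
]);
const blocks = fragmentFrom([
  { kind: "node", type: "paragraph" },
  { kind: "node", type: "heading" },
]);

console.log(inline.size);     // 9
console.log(inline.child(3)); // the image node
console.log(blocks.child(1)); // the heading node
```

Hiding the representation behind an interface like this is also what leaves room for swapping in a different backing structure later, such as the tree representation for very large nodes mentioned above.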

Rewriting all that code to the new API was a bit of a pain, and I hope you people didn’t have too much code written that touches these APIs, but I think the result is cleaner and will cause fewer issues down the road.