Positions as numbers is too error-prone

There should be another way of passing positions than numbers within transactions.

I can’t count how many times i have caused a bug when a change in tr.doc has caused the position to shift, thus the next use of the position throws (or causes a hard-to-find bug).

The fact you have to remember to always pass the position to tr.mapping.map makes me think you should be able to use auto-mapped position by default, removing a major source of bugs. Fiddling with raw numbers as positions is kinda bad practise in the first place and many times it would be nicer to just call tr.deleteNode(node) but that’s probably too difficult to implement.

1 Like

Nodes, in ProseMirror, don’t have an identity. You can have the same node object appear in your document tree multiple times, and two documents that have the same structure, even if the node objects are newly allocated, count as the same document. As such, using node objects to specify positions simply does not work.

I know but that is the optimal API in majority of cases as it removes one degree of freedom that has rather painful footguns like forgetting to map the position.

Technically, within doc.descendants or nodesBetween, you do have an absolute reference to the Node within the callback with the current parameters that it was called with. As you can map the passed position, after mapping, to the node. Unless you have succeeded in mutating the doc so that it was deleted (dunno would that break though, bad practise nonetheless).

Anyway, my main point is that numbers for positions is just calling for very nasty bugs as forgetting to use map is pretty easy. API-wise I don’t really know how it could be implemented. The root cause probably is just the fact that transactions mutate the doc and it provides APIs which imply atomicity and some that actually aren’t, such as tr.doc.descendants, so the parameters (here pos) can be become invalid during its subsequent callbacks (which you have to manually map yourself). Maybe there should be transaction method for nodesBetween that automaps the positions.

I put some more thoughts about this in a blog post.

3 Likes

Oh, nice!

I do understand your approach and there’s perf-hit definitely involved in keeping a pos-to-id mapping alive at all times. But does such system need global ids? Could they be more akin to pointers, user-allocated, that appear once you start to track them?

Actually this reminds me that I did wrangle together a poor-man’s version of this on my own, TrackNodes plugin I think I called it. It’s pretty hacky code and I’m not sure you are interested in seeing it but basically it’s a plugin where I register nodes that I want to track and then it automaps them over transactions.

I use it to display node-based data in the UI without having to generate attribute-based global IDs for everything that might duplicate when people copy-paste stuff and edit collaboratively. I spent some time on making sure I used the diff of the changed range to run the updates only for nodes within it, so that the nodes above were skipped and nodes below only had their position offset adjusted.

But for better or worse many editors already implement their own ID plugins which add global IDs to nodes. Otherwise, you can’t set hash anchors to headings for example and mapping DB objects to nodes is impossible. What’s even more incredible, is that Yjs already has unique IDs for each node that are already global and tombstoned so it’s only if somebody had the time to make an adapter we could use them. Then of course, you’d still need the position mapper but that would a lot easier to make.

TrackedNodes<string, { node: PMNode, pos: number }> type of container where each time you add a node, it starts tracking it. API for transactions would use something like tr.getNodePos(id) to get the mapped position. This would map a single node and the whole remap would be run only once after all other transactions. In the UI you could use plugin state with trackNodesKey.getState to render node-based widgets.

This would also make tracked changes functionality a lot easier to implement. In my opinion a lot worse performance-wise is having to render always the full doc were it 1 page or 100 pages long. If that could be avoided a lot of perf problems like these wouldn’t even matter. The approach Google Docs took to re-engineer their editor.

It is funny that you two are talking about this because I’m actually currently working on my solution for this in BlockNote right now: BlockNote/packages/core/src/locations/README.md at ea043cd944e23cd3b52166c039eafedef1af2535 · TypeCellOS/BlockNote · GitHub

I decided to take an approach of using the both Unique IDs & offsets, in BlockNote we already assign IDs to each “block” (a container for arbitrary node types), so we already had some existing stuff for things like findByID within this document. The more novel approach though is taking some inspiration from Slate’s Location API & adding that to ProseMirror as Point & Range where a Point is a pair of id & offset and a Range is a pair of Points. Using this, it should allow great flexibility in ProseMirror operations, because if all you are trying to do are node operations, you only need the node’s ID, and if you want to do something more specific then you are dealing with offsets relative to that node’s ID. I feel like it is a good hybrid approach that should make the, arguably, more common case of dealing with node positions relatively pain free. I will have to see how it plays out performance-wise, but in most cases, of our users, documents are small, so a scan should be pretty cheap.

@TeemuKoivisto you mentioned that you wanted an API for tracking positions, I also built this for BlockNote. I built it in a way that if you are using ProseMirror without Y.js it accumulates a Mapping for the document’s editing history. I don’t persist this mapping since I only trust this API for view behaviors (e.g. where to place a menu). Whereas, if the editor is using Y.js it switches to using relative positions in Y.js which use the Y.ID which is stable & tombstoned by Y.js. If interested, you can see it here (it ends up being a very easy implementation!): BlockNote/packages/core/src/api/positionMapping.ts at 90665db51eea65ea4581f79b2bcbd7b8e605bb97 · TypeCellOS/BlockNote · GitHub

I can understand why @marijn would not want to go this route, since ProseMirror is a much more generic implementation that doing this in the general case is a bit of a nightmare. But, in our document structure, which is a lot more structured & defined, we can exploit it a bit to know certain things that ProseMirror wouldn’t be able to.

But won’t such a point still become invalid after a change that affects the content of the referenced node, since elements may be inserted or deleted before its offset?

Yes, the Point would be invalid after that operation. But, what I’m trying to do here is make it more understandable & compatible with ProseMirror’s existing position system (I can resolve a Location to a position, and a position to a Location).

The alternative to this would be unique IDs as you mentioned in your blog post, but for my purposes, this should be good enough. Maybe fractional indexing would be interesting for this (this was just the first implementation I saw, there are definitely others).

How would that make track changes functionality easier?

I don’t think node id is always needed or it may have very different functions depending on a project. PM doc can be easily stored in JSONB as a whole and in such storage implementations ids are not that crucial. You do still need IDs for cross referencing inside the document and things like footnotes and building interfaces based on data from nodes and other such things but it’s a very different role from mapping to outer entities, which, I suspect, is needed in yjs. I guess, what I’m saying, this should be implemented as an abstraction layer per project or as a different library as the needs may differ significantly.

Have you tried making track changes? The biggest problem with it is figuring out what has changed and how it can be wrapped in some user-friendly UI. With ids it’s trivial to see whether a particular node was changed or not. Reverting/combining hierarchical operations would still be troublesome but for that I don’t think there’s an easy solution for.

I see, thanks Teemu! I’m working on manuscripts tc-plugin, that you started. And it doesn’t really look ids is gonna be a lot of help to us there.

Oh. Neat. Well I did build the algorithm already to work without having stable ids but it would have been easier had I had those. The position tracking was insanity.