Hi, I’m new to ProseMirror, and I have to deal with a big document (~5 megabytes of UTF-8 text) with it. The structure of the document is rather simple: just a flat list of tens of thousands of paragraphs/lines. When it is loaded into a ProseMirror editor, editing becomes slow.
I found that one hot spot is iterDeco(…) in prosemirror-view. In iterDeco() the code seems to iterate over the whole list of children to find matching ones. I was wondering: would it be possible to optimize this with the IntersectionObserver API, so that the iteration range is limited to the leaf nodes visible in the viewport plus the selection?
Thanks for making this awesome editor!
Viewport-based drawing is out of scope for ProseMirror. It’s just too much extra complexity and too many failure modes. (It might be possible to rig something up with an external plugin, but it’s not going to be easy.)
That being said, the expected bottleneck for huge documents is the DOM.
iterDeco being the slow part isn’t expected. Can you tell me a bit more about your document shape and the kind (and quantity) of decorations you have? Maybe even set up a simplified demo of the issue?
Hi Marijn. The document I’m dealing with is an array of paragraphs. One document has about 20k–30k paragraphs, and each paragraph contains about 100–200 words. No plugins or decorations are being used.
Though the DOM is slow, the overall latency seems acceptable when the first several paragraphs are being edited, but for the last paragraphs of the document the latency is very high.
So I tried to split the doc into sections, each section with a fixed number of paragraphs. After this was done, the overall latency became smaller, but a rough profiling run shows that iterDeco becomes the hotspot.
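For what it’s worth, here is a back-of-the-envelope sketch (plain JS, not ProseMirror code — the numbers and function names are mine) of why grouping the paragraphs into sections helped: a top-down position lookup only scans the children of each node on the path, so a two-level tree visits far fewer siblings than one flat list:

```javascript
// Rough count of child slots a top-down lookup scans to reach the
// paragraph at index `target`, for a flat document vs. one split into
// fixed-size sections. (Illustrative model, not ProseMirror's code.)
function flatScan(target) {
  return target + 1 // must walk past every earlier sibling
}

function sectionedScan(target, sectionSize) {
  const section = Math.floor(target / sectionSize)
  // walk to the right section, then to the right paragraph inside it
  return (section + 1) + (target % sectionSize) + 1
}

// Editing the last of 30000 paragraphs:
flatScan(29999)            // 30000 slots scanned
sectionedScan(29999, 200)  // 350 slots scanned
```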
I looked around in the code of prosemirror-view. It seems that iterDeco iterates over all nodes of my document (a flat array of paragraph nodes) to locate the one that should be updated, or the position new content should be inserted at.
In order to alleviate this iteration, I was wondering: would it be possible to optimize by changing the data structure to some kind of balanced tree (a red-black tree, maybe)? Or would it be useful to cache the editing locations?
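To make the idea concrete, here is a minimal sketch (plain JS; the function names are mine, and this is not prosemirror-model’s actual implementation) of the lookup in question: finding the child that contains a given position, by linear scan over child sizes versus binary search over precomputed prefix sums. A balanced tree would additionally keep the sums cheap to update after an edit, which a plain prefix array does not:

```javascript
// Linear scan: walk children, accumulating sizes, until pos falls
// inside the current child. O(n) per lookup.
function findChildLinear(sizes, pos) {
  let offset = 0
  for (let i = 0; i < sizes.length; i++) {
    const end = offset + sizes[i]
    if (pos < end) return {index: i, offset}
    offset = end
  }
  return null
}

// Precompute prefix sums of child sizes: prefix[i] is the start
// offset of child i.
function buildPrefix(sizes) {
  const prefix = [0]
  for (const s of sizes) prefix.push(prefix[prefix.length - 1] + s)
  return prefix
}

// Binary search for the child whose range contains pos. O(log n)
// per lookup once the prefix array exists.
function findChildBinary(prefix, pos) {
  let lo = 0, hi = prefix.length - 2
  while (lo < hi) {
    const mid = (lo + hi) >> 1
    if (pos < prefix[mid + 1]) hi = mid
    else lo = mid + 1
  }
  return {index: lo, offset: prefix[lo]}
}
```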
I’ve designed the interface to fragments so that it would be possible to use a tree data structure for large nodes, though right now they all still use arrays, since I haven’t run into a situation where the array logic is the bottleneck.
This patch fixes a quadratic bit of complexity in node updating, which might be what you were running into (iterDeco wasn’t really the source, but rather updateNextNode, called via the closure passed to iterDeco). Huge flat documents still aren’t fast, but it’s somewhat better, and the remaining slowness seems to be mostly on the browser’s side.
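In case it helps others, the quadratic pattern looks roughly like this (an illustrative sketch with made-up names, not the actual prosemirror-view code): matching each new child by searching the old children from the start costs O(n²) comparisons overall, while resuming from a persistent cursor costs O(n):

```javascript
// Quadratic: for every new node, search the old list from index 0.
function matchRescan(oldNodes, newNodes) {
  let comparisons = 0
  const matches = newNodes.map(n => {
    for (let i = 0; i < oldNodes.length; i++) {
      comparisons++
      if (oldNodes[i] === n) return i
    }
    return -1
  })
  return {matches, comparisons}
}

// Linear: remember where the last match was found and resume there,
// since document children stay in order across an update.
function matchCursor(oldNodes, newNodes) {
  let comparisons = 0, cursor = 0
  const matches = newNodes.map(n => {
    for (let i = cursor; i < oldNodes.length; i++) {
      comparisons++
      if (oldNodes[i] === n) { cursor = i + 1; return i }
    }
    return -1
  })
  return {matches, comparisons}
}
```

With an unchanged list of 100 distinct nodes, the rescan version does 5050 comparisons while the cursor version does 100, and both produce the same matches; the gap grows quadratically with document size.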