Maintaining selection

johanneswilm · August 22, 2017, 8:27pm

Three questions:

1. I assume you created some node that represents a page, and that you then split the various content nodes inside of it and put them together again when content on that page is removed. Have you added something to make sure you don’t join paragraphs that shouldn’t be joined but just coincidentally end/start at the page boundary? (Should be doable by assigning an ID to each paragraph.)
2. I wonder if it would be possible to use widget decorations to simulate page splits instead. The advantage would be that one wouldn’t touch the content. One could have one user looking at the editor in pagination mode on a laptop and another looking at it in continuous mode on a mobile phone simultaneously.
3. Whether using decorations or page nodes, it seems like one is forced to do at least two renderings to find the page splits, unless one is able to calculate the exact size of everything beforehand. So I wonder: do you let ProseMirror render whatever change was made, and if a page overflows, rerender by issuing another transaction? Is that fast enough for the user not to notice?

marijn · August 23, 2017, 7:33am

Most of this, under handlePagingOnTransaction and createHistoryEntry, looks like it is your own code, so I can’t explain it to you. One obvious problem in this trace is that you appear to be creating nested transactions, causing several DOM updates (i.e. not using appendTransaction, but just calling dispatch again while dispatching a transaction).
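To make the nested-dispatch point concrete, here is a dependency-free toy model. It is not ProseMirror's API: ToyView, the append hooks, the onRender callback and the "at most 2 blocks before a break" rule are all made up for illustration. It only shows why a follow-up dispatched from the view update costs an extra render, while a follow-up appended to the original transaction does not.

```typescript
// Toy model only: NOT ProseMirror's API, just the shape of the problem.
type Doc = string[];
type Tr = (doc: Doc) => Doc;
type AppendHook = (doc: Doc) => Tr | null;

class ToyView {
  renders = 0;

  constructor(
    public doc: Doc,
    private appendHooks: AppendHook[] = [],
    private onRender: (view: ToyView) => void = () => {}
  ) {}

  // Applies the transaction plus whatever the hooks append, then renders once.
  dispatch(tr: Tr): void {
    let next = tr(this.doc);
    for (const hook of this.appendHooks) {
      const extra = hook(next);
      if (extra) next = extra(next);
    }
    this.doc = next;
    this.renders += 1; // stands in for one DOM update
    this.onRender(this);
  }
}

// Hypothetical paging rule: more than 2 blocks requires a "---" page marker.
const needsBreak = (doc: Doc): boolean =>
  doc.length > 2 && doc.indexOf("---") === -1;
const addBreak: Tr = (doc) => [...doc.slice(0, 2), "---", ...doc.slice(2)];
const insertParagraph: Tr = (doc) => [...doc, "p"];

// Nested: the paging fix is dispatched from the view-update callback.
const nestedView = new ToyView(["p", "p"], [], (view) => {
  if (needsBreak(view.doc)) view.dispatch(addBreak);
});
nestedView.dispatch(insertParagraph);

// Appended: the paging fix rides along with the original transaction.
const appendedView = new ToyView(["p", "p"], [
  (doc) => (needsBreak(doc) ? addBreak : null),
]);
appendedView.dispatch(insertParagraph);

console.log(nestedView.renders, appendedView.renders);
```

Both toy views end up with the same document; the difference is only in how many times the view was updated to get there.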

JCHollett · August 23, 2017, 12:43pm

1. We use IDs on each paragraph and page, and that’s how I decide what gets rejoined. It’s a little janky, but it works. When joining across page breaks I do so at the appropriate depth. For example, if I want to grab a node from the next page it looks like: [/p][/div][doc][div][p][/p][/div][doc][node came from this page]. Then I just rejoin the pages. So, I split after the node I want on the next page, which creates a new page for a moment, then I join the top of that temp page to the bottom of the page where I want the node. I guess this explains it?
2. I briefly tried to do this, and I would like to avoid it entirely, blehhhhh. There are so many advantages to using the model to represent pages directly: the ResolvedPos can tell you SOOOOOO much, act like an iterator too, AND keep track of what a page is. I made my own iterator that takes the resolved position and uses static calculations to iterate pages, so it doesn’t need to call resolve constantly. On the note of mobile compatibility, this is for a company, and they have separate versions for phone stuffz. At least I think so; either way that’s a non-issue for me.
3. Right now it’s definitely too slow and the user would notice for sure. The profile would confirm what you’re saying, since the transaction for paging is created on the view update. That’s why I gave the profile. It’s got all that overhead. On my production test, which I tried yesterday, it optimizes the crap out of it, which is great, and I think it gets like 10-20fps? So it’s usable, but if Haverbeke says we’re doing stuff wrong, there are performance gains for typing that are not pagination related. It’s not just my code I have to worry about; there’s other dev code here that is in a bleeding-edge state. Pagination is about the most developed feature, not to mention collab editing. Also, if you didn’t do it on the view, you’d have to prerender everything for sure. The pagination transaction I create is there to confirm the view looks correct and make any changes needed to make it look correct.

I realized that big long spiel is probably unnecessary

vbpavel · June 6, 2018, 8:17am

Hey JC,

we are also trying to do pagination on a ProseMirror editor. Would you share how you did it in your case? Using direct Page Nodes in the content, which are then saved with the rest of the content, or Decorations?

What would be the best approach, from your practical experience with paging in ProseMirror?

JCHollett · June 6, 2018, 11:52am

Depends on what kind of use case you’re trying to satisfy. We have ultra strict constraints on what a page is: absolutely enforced margins, different indentations and line lengths per paragraph type, and special rules for certain paragraphs between pages. What we have is basically a lazy pagination-style algorithm that I made. We count up the heights of all the elements on the page, either using a prerenderer if the ProseMirror node changed, or querying the DOM directly when it is safe to do so, and check whether that fits on a page. If it doesn’t, we try to rejoin it with its widowed content, otherwise leave it orphaned, or join it back and calculate the line break, which can be pretty complicated depending on what you have in your node content, since you can’t trust textContent.length alone at times. The most important part is how you slice and join nodes using some sort of logic that works well with ProseMirror’s slicing on the enter operation. We have unique IDs on all of our nodes too.
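The "count up the heights and check whether it fits" step can be sketched roughly as below. This is a simplified guess at the shape of such a pass, not the production algorithm described above: the Block type, the paginate function and the numbers are assumptions, heights are taken as already measured, and there is no mid-block splitting or widow/orphan handling.

```typescript
// Hypothetical block record: an id plus a height in px that was measured elsewhere.
interface Block {
  id: string;
  height: number;
}

// Greedily pack blocks into pages of pageHeight px. A block that does not fit
// on the current page starts the next one. An oversized block still gets a
// page of its own, because this sketch never splits inside a block.
function paginate(blocks: Block[], pageHeight: number): string[][] {
  let current: string[] = [];
  const pages: string[][] = [current];
  let used = 0;

  for (const block of blocks) {
    if (used + block.height > pageHeight && current.length > 0) {
      current = [block.id];
      pages.push(current);
      used = block.height;
    } else {
      current.push(block.id);
      used += block.height;
    }
  }
  return pages;
}

const samplePages = paginate(
  [
    { id: "a", height: 400 },
    { id: "b", height: 400 },
    { id: "c", height: 300 },
    { id: "d", height: 1200 },
    { id: "e", height: 100 },
  ],
  1000
);
console.log(JSON.stringify(samplePages));
```

A real implementation would replace the "start the next page" branch with the slicing and rejoining logic, which is where most of the complexity mentioned in the post lives.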

I could rant and ramble for 2 days over the stuff that goes into pagination. Let me just say, it took 8 months and 3 different finished versions of our pagination to end up with something that is reliable enough to leave alone in our production environment. There are still some things we need to do though.

I suppose the real answer you’re looking for is that we have pages in our schema, mostly because we want them to be part of the transaction model for collab. Decorations aren’t reliable enough, require maintaining, and I could never get them to work right. The best solution for us was to just figure out what should be on a page and never fire off anything to ProseMirror that wasn’t necessary. But no matter what, if you have a lot of pages you’re basically doomed to work around browser inefficiencies. Since a long element in HTML requires massive re-rendering time, you can’t hope to fully paginate a doc without ruining the user experience. So, we have a lazy on-scroll pagination and a full pagination on certain actions. That said, our full pagination times are pretty good, at around 150ms for 125 pages.

vbpavel · November 27, 2018, 3:25pm

Thank you, JC! Sorry for my very late response, but we re-ignited development of the final editor now (was stopped for some time). We’ll take into account the advice to use PageBreak Nodes, instead of Decorations.

vbpavel · December 4, 2018, 11:54am

Hi JC,

we are now starting the paging implementation of our editor, and since I saw it took you 8 months to get it right, would you share some advice on how to approach it from the start?

i.e.

We calculated that the pages should be a PageBreak node directly in the PM schema. (so not a decoration)
The calculation of where the PageBreak should appear would be:

we take all the nodes, all the styles of each node (Padding/Font/etc…), also we take the Template (PageSize, etc) => by combining all those, we can calculate where the PageBreak node should be placed. Then we do the same calculation for the following Page and put a PageBreak, etc…

We have unique IDs on all our nodes.

Do you adhere to that way of calculating the pages, or did you use another approach? Otherwise, would it be possible for you to share some sample events/parts of code, i.e. the pre-renderer, the query from the DOM, the lazy on-scroll pagination… Anything you feel might really open our eyes to how to do it correctly…?

Big thanks in advance for any help…

JCHollett · December 4, 2018, 7:13pm

Don’t use padding, use monospace. Make a function to determine at what position nodes need to be sliced. Do try to query elements from the DOM for height, unless they’ve changed, in which case you need something to calculate the height directly. Rendering to check for height may be faster than trying to work it out in code.
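One way to read the "query the DOM unless it changed" advice is as a small height lookup with a fallback. The sketch below assumes two injected helpers, rendered and prerender, which are hypothetical names and not part of any library; comparing content strings is only a stand-in for whatever equality check the editor really uses, and the stub heights are invented.

```typescript
interface MeasuredNode {
  id: string;
  content: string;
}

interface HeightSources {
  // Hypothetical: what the DOM currently shows for this id, and how tall it is.
  rendered(id: string): { content: string; height: number } | undefined;
  // Hypothetical: render off-screen (or compute) the height of changed content.
  prerender(node: MeasuredNode): number;
}

// Trust the live DOM height only when it still shows exactly this content;
// otherwise fall back to the (more expensive) prerender path.
function heightOf(node: MeasuredNode, src: HeightSources): number {
  const live = src.rendered(node.id);
  if (live !== undefined && live.content === node.content) return live.height;
  return src.prerender(node);
}

// Stubbed sources for demonstration. The "10 characters per 20px line" rule
// is made up; real content cannot be measured from text length alone.
let prerenderCalls = 0;
const stubSources: HeightSources = {
  rendered: (id) =>
    id === "p1" ? { content: "hello", height: 20 } : undefined,
  prerender: (node) => {
    prerenderCalls += 1;
    return 20 * Math.ceil(node.content.length / 10);
  },
};

const unchangedHeight = heightOf({ id: "p1", content: "hello" }, stubSources);
const changedHeight = heightOf(
  { id: "p1", content: "hello, wider world" },
  stubSources
);
console.log(unchangedHeight, changedHeight, prerenderCalls);
```

The point of the split is that the cheap path is taken for everything the user did not touch, so the prerender cost only scales with what actually changed.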

JCHollett · December 6, 2018, 2:32pm

I think the worst part of it is that a lot of the keybinds have to be changed to account for things. It also depends on what kind of structure you have. When joining pages, our elements need to be rejoined to the element of their type that they were split from, and not to ones they were not split from. So that will also change how complex your stuff is. We were working on a two-column paginated editor. That was much worse than the single-column page. But it was canceled because other features were wanted instead.

JCHollett · December 6, 2018, 2:53pm

I definitely would not be able to describe all the things we do without writing 5 walls of text. There’s over 4000 lines of code for that stuff. Even more if you consider all the logic for rendering, CSS, and keybinding accommodations.

Here’s a list of tips:

Do not use padding, use margins between elements (browser scaling nightmares and element.offsetHeight / CONST_LINE_HEIGHT)
Try to define constants for everything, try not to use getBoundingClientRect(), and try to use only offsetHeight and not offsetTop. Due to the iterative nature of pagination, these calls will cost you performance, and the cost will scale with content.
We use a DOM structure nearby that shares all the same CSS but is hidden, where elements are appended so we can check the heights of elements that have changed and cannot be queried accurately from the current element in the DOM, or that don’t exist in it yet. Generally speaking I use node.eq(dom.pmViewDesc.node) to check this equality.
Do calculations across the elements that are paginated, not the pages in general. Trying to calculate entire pages may be more difficult: you have to check that all the elements on a page are properly re-assembled if you do it that way, and I have found that to be more difficult than worthwhile.
Try to make your styles/heights/fonts as consistent and well defined as possible, so that everything is predictable. Heights/spacings that are multiples of some number make it easier. Make your page a multiple of the line height you want to meet. For example, your page is 50 lines, and your elements and the space between elements are multiples of some height, like 20, so your pages are 1000 pixels.
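The arithmetic behind the "everything is a multiple of the line height" tip can be sketched as follows. The 20px line and 50-line page come from the example above; the function names and the 240px element are assumptions made for illustration.

```typescript
const LINE_HEIGHT = 20; // px, from the example in the post
const LINES_PER_PAGE = 50;
const PAGE_HEIGHT = LINE_HEIGHT * LINES_PER_PAGE; // 1000 px

// Number of lines an element occupies, given its offsetHeight in px.
// Throws if the height is not a whole number of lines, which would mean the
// CSS has drifted away from the constants.
function lineCount(offsetHeight: number): number {
  if (offsetHeight % LINE_HEIGHT !== 0) {
    throw new Error("height is not a multiple of the line height");
  }
  return offsetHeight / LINE_HEIGHT;
}

// How many lines of an element can stay on the current page, given how many
// lines of that page are already used.
function linesThatFit(usedLines: number, elementLines: number): number {
  const free = Math.max(0, LINES_PER_PAGE - usedLines);
  return Math.min(free, elementLines);
}

// Example: a 240px element arrives when 45 of the 50 lines are already used.
const elementLines = lineCount(240);
const stay = linesThatFit(45, elementLines);
const moveToNextPage = elementLines - stay;
console.log(PAGE_HEIGHT, elementLines, stay, moveToNextPage);
```

With integer line counts, the split position inside an element becomes a simple subtraction instead of a pixel measurement, which is presumably why the constants matter so much.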

vbpavel · December 8, 2018, 12:45pm

Hi JC, thank you for the nice advice. We will check and follow each and every piece of it in detail! (We will ask you further questions should we meet issues during development, and yes we will.)

JCHollett · December 9, 2018, 5:18pm

What I’ve found is that the more predictable the CSS is in relation to your constants, the easier it is to define the functionality.

There are also many useful aspects of this that ProseMirror and contenteditable handle on their own. I think perhaps one of the biggest pieces of advice I can give is that we put IDs on everything. When we split from the main element, the main element’s ID is passed on to each resulting element and compared across elements. So if anything is split, or anything happens through the main functionality of ProseMirror, that element’s originating ID is copied to each sliced node, and you are always able to know where an element came from. The only other factor is making sure the sliced fragments of the main node all have different IDs, while maintaining the originating ID in a different attr. Then if you have to recollect the node, you always know how to collect its content fragments.
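The originating-ID idea can be sketched without any editor library. Everything below is hypothetical naming (Fragment, originId, createFragment, splitFragment, canJoin, joinFragments), and plain strings stand in for node content; it only shows the invariant: every fragment has its own id, and all fragments of one element share an originId.

```typescript
// Hypothetical fragment record: a unique id plus the id of the element it came from.
interface Fragment {
  id: string;
  originId: string;
  text: string;
}

let idCounter = 0;
const freshId = (): string => `n${++idCounter}`;

// A new element is its own origin.
function createFragment(text: string): Fragment {
  const id = freshId();
  return { id, originId: id, text };
}

// Splitting keeps originId on both halves; the second half gets a fresh id so
// that no two fragments ever share an id.
function splitFragment(f: Fragment, at: number): [Fragment, Fragment] {
  return [
    { id: f.id, originId: f.originId, text: f.text.slice(0, at) },
    { id: freshId(), originId: f.originId, text: f.text.slice(at) },
  ];
}

// Only fragments of the same original element may be rejoined.
function canJoin(a: Fragment, b: Fragment): boolean {
  return a.originId === b.originId;
}

function joinFragments(a: Fragment, b: Fragment): Fragment {
  if (!canJoin(a, b)) throw new Error("fragments come from different elements");
  return { id: a.id, originId: a.originId, text: a.text + b.text };
}

const [head, tail] = splitFragment(createFragment("Hello world"), 5);
const neighbour = createFragment("Other paragraph");
console.log(canJoin(head, tail), canJoin(tail, neighbour));
console.log(joinFragments(head, tail).text);
```

This is also what answers the earlier question about paragraphs that merely happen to end and start at a page boundary: they have different originating IDs, so they are never joined.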

tstoev · March 4, 2020, 4:28pm

Hello, I’ve been working on pages for a few weeks now, following your steps. I made page nodes and hooked into the plugin’s appendTransaction, where I extend the transaction with join/split. But splitting the content at the page boundary inserts an empty paragraph, thus changing the document. Can you guys guide me a little bit here please? What’s the best way to get rid of this empty paragraph?