How to calculate lines of text in a node?

ungoldman · July 3, 2019, 7:03pm

Hello ProseMirror community!

I’m attempting to find a reliable method to count lines of text in a block node within the context of a plugin.

Assuming the block node has a constant font size and line height, the most straightforward way I can think of is to get the serialized DOM node’s dimensions and divide its height by the line-height value.
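
A minimal sketch of that arithmetic, assuming a fixed pixel line-height and no vertical padding or borders on the block. `countRenderedLines` and `countLinesInElement` are hypothetical helper names, not part of ProseMirror; the second one only works in a browser.

```javascript
// Hypothetical helper: estimate the number of rendered lines from a measured
// block height and a fixed line height, both in CSS pixels.
function countRenderedLines(heightPx, lineHeightPx) {
  if (!(lineHeightPx > 0)) {
    throw new RangeError('lineHeightPx must be a positive number')
  }
  // Rounding absorbs sub-pixel noise; an empty block still occupies one line.
  return Math.max(1, Math.round(heightPx / lineHeightPx))
}

// Browser-only wrapper. Assumes `line-height` computes to a pixel value;
// the keyword `normal` makes parseFloat return NaN, which is rejected above.
function countLinesInElement(element) {
  const heightPx = element.getBoundingClientRect().height
  const lineHeightPx = parseFloat(getComputedStyle(element).lineHeight)
  return countRenderedLines(heightPx, lineHeightPx)
}

console.log(countRenderedLines(72, 24)) // 3
```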

I’ve read through the reference manual a fair number of times, in particular model.Node, and I haven’t been able to find a way to access a node’s corresponding DOM representation when traversing a given document’s node tree, for example in this context:

new Plugin({
  appendTransaction (transactions, oldState, newState) {
    newState.doc.forEach(function (block, offset, index) {
      console.log('block node', block)
    })
  }
})

I’ve seen a few mentions of people using getBoundingClientRect to get dimensions of a DOM node, but I’m not sure as to how to get to the point where I’d have a DOM node to call that method on when starting with a ProseMirror doc node from an editorState instance.

I’m not asking for anyone to solve this for me, just hoping someone has tackled this before and can point me in the right direction. Please let me know if you have any hints for me. Thank you!

ungoldman · July 3, 2019, 7:55pm

I think I’ve gotten a little closer – it looks like I can get the DOM node using the EditorView.nodeDOM method, and I can get position of a node using Node.descendants.

However, I don’t see any easy way to access the EditorView instance in the context of PluginSpec.appendTransaction. For context, I need to first analyze line count of nodes, and then based on information from that analysis, append a transaction if needed, so appendTransaction seems like the right place to be working here.

It seems only the PluginSpec.view prop can interact with the EditorView. I could try creating some kind of structure where the view plugin method keeps a copy of the EditorView for appendTransaction, but that seems convoluted and potentially problematic.

marijn · July 4, 2019, 6:32am

At this point, the editor view hasn’t been updated yet—the system first computes a new state (which may involve appending transactions), and then updates the view to the new state.
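
Given that ordering, one place where layout can be read is the `update` method of a plugin view, which is called after the view's state has been updated. The following is a sketch under assumptions, not a recommended implementation: `createMeasureSpec` and `onMeasure` are hypothetical names, and the returned plain object would be passed to `new Plugin(...)` from prosemirror-state. It relies only on `PluginSpec.view`, `Node.forEach`, `Node.eq`, and `EditorView.nodeDOM`.

```javascript
// Hypothetical factory returning a plain plugin spec. In an editor it would
// be wrapped as `new Plugin(createMeasureSpec(...))` (prosemirror-state).
function createMeasureSpec(onMeasure) {
  return {
    view() {
      return {
        // Called with the EditorView after its state has been updated.
        update(view, prevState) {
          // Skip updates that did not change the document.
          if (prevState && prevState.doc.eq(view.state.doc)) return
          view.state.doc.forEach((block, offset, index) => {
            // For top-level blocks, `offset` is also the document position.
            const dom = view.nodeDOM(offset)
            if (dom && dom.getBoundingClientRect) {
              onMeasure(block, index, dom.getBoundingClientRect().height)
            }
          })
        }
      }
    }
  }
}
```

This only reads layout. Dispatching a transaction in response to a measurement would need a guard against update loops, since each dispatch triggers another `update`.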

In general, the amount of wrapped lines in a node is potentially dependent on available fonts, window size, zoom level, and css changes, so it’s not something for which you’d be able to compute a fixed, reliable value. Stuff that depends on view layout is not typically stored in EditorState, since that’s considered to be a representation of view-independent state. But of course you can blur these lines a bit, if practical.

What are you planning to use this line count for, though?

ungoldman · July 17, 2019, 7:53pm

At this point, the editor view hasn’t been updated yet—the system first computes a new state (which may involve appending transactions), and then updates the view to the new state.

Thank you for explaining, that makes perfect sense.

In general, the amount of wrapped lines in a node is potentially dependent on available fonts, window size, zoom level, and css changes, so it’s not something for which you’d be able to compute a fixed, reliable value. Stuff that depends on view layout is not typically stored in EditorState, since that’s considered to be a representation of view-independent state. But of course you can blur these lines a bit, if practical.

In the case of what I’m building, we’re using

a monospace font
a content area with a fixed width
blocks with predictable fixed widths based on character width (ch)

This means lines in a node should be computable to a fixed reliable value, as long as we’re doing our due diligence to style and render our content according to our own spec.

Something to this effect is working for me so far:

doc.forEach(blockNode => {
  const type = blockNode.type.name
  let chars = 0
  let breaks = 0

  blockNode.forEach(inlineNode => {
    if (inlineNode.text) chars += inlineNode.text.length
    else if (inlineNode.type.name === 'hard_break') breaks++
    else console.warn('unexpected type', inlineNode)
  })

  // an empty line still counts as a line
  if (chars === 0) chars++

  // charsPerLine is a function that takes a type and returns
  // max characters per line -- should match CSS rules for said type
  const lines = breaks + Math.ceil(chars / charsPerLine(type))

  // do something with lines here
})
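
For illustration, `charsPerLine` could be a simple lookup, with the formula from the snippet above factored out for a worked example. The block type names and widths here are placeholder assumptions, not values from the actual project; real values would have to mirror the `ch`-based CSS widths of each block type.

```javascript
// Placeholder widths for hypothetical block types (illustrative only).
const CHARS_PER_LINE = new Map([
  ['action', 60],
  ['dialogue', 35]
])
const DEFAULT_CHARS_PER_LINE = 60

function charsPerLine(type) {
  return CHARS_PER_LINE.has(type) ? CHARS_PER_LINE.get(type) : DEFAULT_CHARS_PER_LINE
}

// Same formula as in the snippet above: hard breaks plus wrapped lines.
function estimateLines(type, chars, breaks) {
  const counted = Math.max(chars, 1) // an empty block still counts as a line
  return breaks + Math.ceil(counted / charsPerLine(type))
}

console.log(estimateLines('dialogue', 80, 0)) // ceil(80 / 35) = 3
console.log(estimateLines('action', 0, 0))    // 1
console.log(estimateLines('action', 61, 1))   // 1 + ceil(61 / 60) = 3
```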

That brings us to the bigger question…

What are you planning to use this line count for, though?

It’s… pagination!

By pagination I mean more specifically dynamically adding, updating, and removing page boundaries, whether by (a) managing sibling page break nodes at the same depth as blocks, or (b) introducing a parent page node into the schema, then joining and splitting page boundaries. Either way pagination has to be done according to analysis of content length by various means.

I’ve read all the threads I could find on the subject.

I’m aware of ProseMirror’s stated goal of being a semantic editor (though I’m not entirely sure why the concept of pagination is at odds with that, but that’s an entirely different conversation). I understand paginating user content is out of scope for the project itself and not a supported feature.

However, ProseMirror remains the best choice for us as a basis for building our editor, and from various discussions on this forum it’s clear that it’s something many have chosen to do, with varying degrees of success, despite helpful warnings from PM’s benevolent caretaker.

I’m currently leading development for a project (specifically, https://showrunner.io) whose goal is to create a full-featured editor for screenwriting in a browser context (much like Google Docs, but with very different document constraints and industry expectations). Pages have a great deal of meaning in the context of screenwriting, not just for visually representing what will be printed, but as a reference tool during filming, in particular with regard to the concept of locked pages. From reading threads in this forum that also talk about needing to paginate content with a monospace font and predictable content dimensions, I suspect we’re not the first to try using ProseMirror for this specific type of application.

I’m including this information just as background – I don’t expect ProseMirror to solve all our problems, I am grateful for what it is accomplishing for us, and I also hope to contribute something back to the project or at least userland in the process of building our product. Collaborative rich text editing is a hard problem to solve, and ProseMirror has done a better job at providing a toolkit for solving that problem than any other library we’ve encountered so far. In the context of many collaborative rich text editors (i.e. word processors with an expected print output), pagination is often a hard requirement for users. I can say it definitely is for our users.

Within the scope of pagination specifically in the case of screenplays, in which the font is monospace and there are quite a lot of well-established rules and expectations around content width and height, counting lines is a pretty reliable method for determining height. We’ve settled on 56 lines per page, though there is a great deal of variation (see this article if you’re curious).
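
As a rough sketch of how per-block line counts could map to page boundaries at 56 lines per page: `pageStarts` is a hypothetical helper, and it assumes blocks are never split across pages, which a real screenplay layout would likely need to relax.

```javascript
const LINES_PER_PAGE = 56 // the figure quoted above

// Hypothetical helper: given per-block line counts in document order, return
// the index of the first block on each page. A block that does not fit in the
// remaining space starts a new page; oversized blocks get a page of their own.
function pageStarts(lineCounts, linesPerPage = LINES_PER_PAGE) {
  const starts = []
  let used = 0
  lineCounts.forEach((lines, i) => {
    if (i === 0 || used + lines > linesPerPage) {
      starts.push(i)
      used = 0
    }
    used += lines
  })
  return starts
}

console.log(pageStarts([30, 20, 10, 50, 6])) // [ 0, 2, 3 ]
```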

So, going back to the context of this thread, the method I’ve described in the code block above for calculating lines of text in a node (iterating through blocks, then calculating lines based on predetermined block node character width properties, length of text, and line breaks) is working for us so far. I’m still in the process of solving the question of how to dynamically paginate content while respecting manually inserted page breaks, but I’ll leave that discussion for another time.

marijn · July 18, 2019, 8:19pm

Have you tried running your own line-breaking algorithm, independent of the browser’s rendering, to determine line count? There’s some awkward corner cases (such as which characters allow wrapping between them, exactly), but it can be done.
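
A minimal sketch of such an algorithm for monospace text, under simplifying assumptions: wrapping happens only at spaces, words longer than the line are split mid-word, and every UTF-16 code unit is one column wide. `countWrappedLines` is a hypothetical name, and browsers apply more rules than this (hyphens, CJK text, `overflow-wrap` settings).

```javascript
// Greedy line count: place each word on the current line if it fits,
// otherwise start a new line.
function countWrappedLines(text, width) {
  if (!(width > 0)) throw new RangeError('width must be positive')
  let lines = 0
  for (const paragraph of text.split('\n')) {
    let col = 0 // characters already placed on the current line
    lines++     // every paragraph occupies at least one line
    for (const word of paragraph.split(' ').filter(Boolean)) {
      let len = word.length
      const needed = col === 0 ? len : col + 1 + len
      if (needed <= width) {
        col = needed
        continue
      }
      if (col > 0) { lines++; col = 0 }             // move the word to a fresh line
      while (len > width) { lines++; len -= width } // hard-break overlong words
      col = len
    }
  }
  return lines
}

// Word wrapping can need more lines than a plain character count suggests.
console.log(countWrappedLines('aaaa bbbb cccc dddd', 7)) // 4, whereas ceil(19 / 7) = 3
```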

Alternatively, put the text in a properly-styled invisible scratch element outside the editor and measure that.
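
A hedged sketch of the scratch-element approach: `measureLinesInScratch` is a hypothetical helper, and `className` and `lineHeightPx` are assumed to match the CSS that styles the block inside the editor, including an explicit width, since an absolutely positioned element otherwise shrinks to fit its content.

```javascript
// Render text into an off-screen element that shares the editor's styling
// and derive a line count from its height. `doc` defaults to the browser's
// `document` and is a parameter only so the function can be exercised
// outside a browser.
function measureLinesInScratch(text, className, lineHeightPx, doc = document) {
  const scratch = doc.createElement('div')
  scratch.className = className
  // Keep the element laid out but invisible and out of the page flow.
  scratch.style.position = 'absolute'
  scratch.style.visibility = 'hidden'
  scratch.style.pointerEvents = 'none'
  scratch.textContent = text
  doc.body.appendChild(scratch)
  try {
    const height = scratch.getBoundingClientRect().height
    // An empty block has zero height but still counts as one line.
    return Math.max(1, Math.round(height / lineHeightPx))
  } finally {
    doc.body.removeChild(scratch)
  }
}
```

Creating and removing an element per measurement forces layout each time, so reusing a single scratch element across measurements may be worth considering.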

samuelgoldenbaum · May 23, 2020, 3:44pm

Did you ever manage to solve your pagination requirement with PM?

ungoldman · June 11, 2020, 5:43am

@samuelgoldenbaum we have a pagination plugin that works well enough for basic needs, but it performs poorly for any document above a certain number of pages, and the method I described above (counting characters using a monospace font) has lots of problems.

The three main issues with the method I described above are:

it doesn’t account for soft wrapping of words done automatically by the browser, so it’s often wrong
any time we change the schema we have to update the plugin’s line counting logic
we’re locked to a single font family and size

So the short answer is… sort of? I wouldn’t recommend doing what we did. I’m trying to find time to rewrite the plugin to measure content height in each page and recalculate pages based on that, but it’s not been an easy or obvious thing to figure out. Even if we get that method working well enough, the performance of the editor (response time while typing) will be significantly impacted by all the DOM node generation, measuring, dispatching of transactions with large step counts, and garbage collection.

I am hopeful that we can figure it out but we haven’t so far. The browser’s own constraints and ProseMirror’s architecture are the two biggest obstacles. ProseMirror is just not designed to support dynamic pagination, and the browser is a challenging environment in which to build a full-fledged paginated collaborative word processor. We don’t have Google’s budget or staff, so we’ll just keep trying and hope for the best.

samuelgoldenbaum · June 12, 2020, 1:28am

Thanks for the elaborate reply. The issues you raise are exactly what I worry about. It seems a layout engine that includes virtualization is needed for our needs, so that we can calculate page rendering and also only display the relevant nodes. There’s really no point in creating nodes for chapter 30 if you’re viewing chapter 1.

Need to have a think about this.