Storing metadata with every word

Hi all!

I’m new to ProseMirror, and before spending time diving into the framework, I wanted to ask whether you think my use case is feasible.

Basically I need to store transcription timing metadata against every word. I could have a mark with appropriate attributes and populate the initial content easily enough, but I also have the following high-level challenges:

  • If the user changes a node’s text to include multiple words, I need to split this into new nodes with marks.
  • If the user pastes in, for example, a sentence, I need to split this into a node per word with marks.
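To make the splitting step in both cases concrete, here is a minimal sketch in plain TypeScript. `WordRange` and `splitWords` are hypothetical names for illustration, not ProseMirror APIs, and the sketch assumes words are simply runs of non-whitespace characters. It only computes the ranges; applying a mark to each range would still be up to the plugin or paste handler.

```typescript
// Hypothetical helper types and names, not part of ProseMirror.
interface WordRange {
  text: string;
  from: number; // offset of the first character (inclusive)
  to: number;   // offset just past the last character (exclusive)
}

// Split a run of text into word ranges. `base` is the offset at which
// the text starts, so the returned ranges can be used directly as
// positions relative to wherever the text was typed or pasted.
function splitWords(text: string, base: number = 0): WordRange[] {
  const ranges: WordRange[] = [];
  const wordPattern = /\S+/g; // assumption: a word is any run of non-whitespace
  let match: RegExpExecArray | null;
  while ((match = wordPattern.exec(text)) !== null) {
    const from = base + match.index;
    ranges.push({ text: match[0], from, to: from + match[0].length });
  }
  return ranges;
}
```

Each returned range could then get its own mark, with whatever timing attributes are available for that word (newly typed or pasted words presumably have none yet).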

Is it best to approach this like the linter example, where something is watching changes to the entire document all the time, or is it best to have a plugin that’s looking at each transaction and adding marks as they happen?

There’s also the option of an input rule that’s watching for SPACE which could also work, although this doesn’t solve pasting of text.

If anyone has a take on where I should start looking, it would be greatly appreciated!

If possible, I’d keep this data out of the document and in some plugin’s state (carefully updating it on every transaction). See for example the change tracking example. You may even, depending on whether you need the data occasionally or constantly, be able to just keep a history of transactions and derive what you need from that using something like prosemirror-changeset.
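As a rough illustration of the "keep it outside the document and update it on every transaction" idea, here is a sketch in plain TypeScript that does not use ProseMirror itself. `TimedWord`, `Change`, and `applyChange` are hypothetical names, and the flat character offsets are a simplifying assumption. In a real plugin the state would be updated in its `apply` method, and positions would be mapped through the transaction's own mapping (`tr.mapping.map`) rather than by hand like this.

```typescript
// Hypothetical side table entry: timing metadata kept outside the document.
interface TimedWord {
  from: number;    // start offset (inclusive)
  to: number;      // end offset (exclusive)
  startMs: number; // transcription start time
  endMs: number;   // transcription end time
}

// Simplified stand-in for one replacement: the range [from, to) is
// replaced by `insertedLength` characters.
interface Change {
  from: number;
  to: number;
  insertedLength: number;
}

// Return the word table as it should look after the change.
function applyChange(words: TimedWord[], change: Change): TimedWord[] {
  const delta = change.insertedLength - (change.to - change.from);
  const result: TimedWord[] = [];
  for (const word of words) {
    if (word.to <= change.from) {
      // Entirely before the edit: unaffected.
      result.push(word);
    } else if (word.from >= change.to) {
      // Entirely after the edit: shift by the size difference.
      result.push({ ...word, from: word.from + delta, to: word.to + delta });
    } else if (word.from <= change.from && change.to <= word.to) {
      // Edit happened inside this word: keep its timing, resize it,
      // and drop it if nothing is left.
      const resized = { ...word, to: word.to + delta };
      if (resized.to > resized.from) result.push(resized);
    }
    // Otherwise the edit straddles a word boundary; in this sketch the
    // word's timing is treated as no longer trustworthy and is dropped.
  }
  return result;
}
```

Dropping words that an edit straddles is just one possible policy; depending on the use case, it may be better to keep them and flag them for review.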

Thank you @marijn! I’ll look into your suggestions.