Storing metadata with every word

If possible, I’d keep this data out of the document and in some plugin’s state, carefully updating it on every transaction. The change tracking example shows one way to do that. Depending on whether you need the data occasionally or constantly, you may even be able to just keep a history of transactions and derive what you need from that, using something like prosemirror-changeset.
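
To make the "update it on every transaction" part a bit more concrete, here is a rough sketch of remapping per-word metadata when the document changes. It is deliberately library-free: `WordMeta`, `mapWordMeta`, `insertAt` and `deleteRange` are hypothetical names made up for illustration, and the two toy mappers only stand in for a real position mapping. In an actual plugin I'd expect the `apply` method of the plugin's state to call something like `mapWordMeta(value, (pos, assoc) => tr.mapping.map(pos, assoc))`.

```typescript
// Hypothetical shape for metadata attached to one word, kept outside the document.
interface WordMeta {
  from: number; // start position of the word
  to: number;   // end position of the word
  data: Record<string, unknown>;
}

// A position mapper with the same call shape as ProseMirror's Mapping.map(pos, assoc).
type MapPos = (pos: number, assoc: number) => number;

// Move every entry through a change and drop entries whose word was deleted.
function mapWordMeta(entries: WordMeta[], mapPos: MapPos): WordMeta[] {
  const result: WordMeta[] = [];
  for (const entry of entries) {
    // Bias the start forward and the end backward, so text inserted
    // exactly at a word boundary is not absorbed into the word.
    const from = mapPos(entry.from, 1);
    const to = mapPos(entry.to, -1);
    if (from < to) result.push({ ...entry, from, to });
  }
  return result;
}

// Toy stand-in (an assumption, not a library function): insert `size` characters at `at`.
function insertAt(at: number, size: number): MapPos {
  return (pos, assoc) => {
    if (pos < at) return pos;
    if (pos > at) return pos + size;
    return assoc < 0 ? pos : pos + size;
  };
}

// Toy stand-in (an assumption, not a library function): delete the range [from, to).
function deleteRange(from: number, to: number): MapPos {
  return (pos) => {
    if (pos <= from) return pos;
    if (pos >= to) return pos - (to - from);
    return from;
  };
}

const words: WordMeta[] = [
  { from: 1, to: 6, data: { id: "a" } },
  { from: 7, to: 12, data: { id: "b" } },
];

// Typing three characters right before the second word shifts it to 10..15.
const afterInsert = mapWordMeta(words, insertAt(7, 3));

// Deleting the second word entirely removes its metadata entry.
const afterDelete = mapWordMeta(words, deleteRange(7, 12));
```

Whether to drop, shrink, or merge entries when a word is edited is a policy decision that depends on what the metadata means, so treat the `from < to` check as just one possible choice.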