Sync sentences with audio playback

Hello! This is my first time posting here, so I’d like to express my gratitude to Marijn for the excellent work!

In my editor, I plan to include the text of a book synchronized with the playing audio. I’m seeking a method to mark each paragraph with {start, stop} positions on the audio track (outside of the editor). This will enable me to highlight the text segment corresponding to the current track position during audio playback.

Therefore, I have the following requirements:

  • I need a way to mark pieces of text in the editor. I’m considering using marks for that purpose. Since I’m using Tiptap on top of ProseMirror, I tried this extension as an example; it works fine and seems adaptable to my case. There may be gaps (unmarked text) between marked segments, but that seems acceptable for my logic.
  • I need a way to find and scroll to the marked position. For example, if I have a track position of 85 seconds, I should be able to search for a marked node with this position in its interval (for example, {start: 82.4, stop: 96.1}).

Now, my questions are:

  • Is using marks the only and best way to achieve this?
  • Is there a way to quickly find these marks? I assume I might need to preprocess the document into some data structure to obtain the necessary nodes faster?

Sorry if my questions seem obvious; I mostly want to validate my ideas (maybe there are better ways). Please let me know if you have any suggestions or ideas to enhance this approach.

P.S. I’m using Tiptap.

There are trade-offs between the mark approach and the decoration approach. Decorations are more “normalized”: they require no alterations (marks) to the document, because in one of their forms they are defined by a start and end position (offset). You should research the differences and work out what your software would benefit from most: what requires the least code, and what puts the fewest hurdles in front of the users and admins of these transcript/audio documents.

There are ways to “quickly find” either. The two implementations are similar, but as always, starting with a data structure that is convenient for your purposes will save you time.
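As a rough sketch of such a structure: if the segments are kept in a list sorted by start time and do not overlap, a binary search finds the segment for a given track position. The Segment shape and the findSegment name are made up for illustration; from/to stand for document positions, and intervals are treated as start-inclusive, stop-exclusive.

```typescript
// Hypothetical shape: one entry per highlighted piece of text.
interface Segment {
  start: number; // seconds on the audio track (inclusive)
  stop: number;  // seconds on the audio track (exclusive)
  from: number;  // document position where the text begins
  to: number;    // document position where the text ends
}

// Binary search over segments sorted by `start` (assumed non-overlapping).
// Returns the segment containing `time`, or null when `time` falls in a gap.
function findSegment(segments: readonly Segment[], time: number): Segment | null {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const seg = segments[mid];
    if (time < seg.start) {
      hi = mid - 1; // look in the earlier half
    } else if (time >= seg.stop) {
      lo = mid + 1; // look in the later half
    } else {
      return seg;   // start <= time < stop
    }
  }
  return null;
}

// Example data with a gap between 12.5 s and 82.4 s.
const segments: Segment[] = [
  { start: 0, stop: 12.5, from: 1, to: 40 },
  { start: 82.4, stop: 96.1, from: 120, to: 180 },
];

// 85 s falls inside {start: 82.4, stop: 96.1}.
const hit = findSegment(segments, 85);
```

With the segment in hand, from/to tell you which range to highlight and scroll to. A null result means the playhead is in unmarked text, which matches the "gaps are acceptable" requirement.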

For example, with decorations you’ll have a map or list outside the document that serves as the source of truth for the start/stop positions linked to the audio. If you go with marks, you’ll have to produce that list by traversing a potentially long document (the result can be cached while the document is unchanged).
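For the mark route, the traversal could look roughly like this. It is a sketch under assumptions: the mark is called audioSegment and stores start/stop in its attrs (both hypothetical names), and NodeLike/MarkLike are minimal stand-ins for the parts of ProseMirror's Node and Mark that are used (descendants, isText, nodeSize, marks), so the snippet stays self-contained. Since ProseMirror documents are immutable, caching by document object is enough to skip the walk when nothing changed.

```typescript
// Minimal stand-ins for the ProseMirror Node/Mark members used below.
interface MarkLike { type: { name: string }; attrs: { [key: string]: any } }
interface NodeLike {
  isText: boolean;
  nodeSize: number;
  marks: readonly MarkLike[];
  descendants(f: (node: NodeLike, pos: number) => boolean | void): void;
}

interface Segment { start: number; stop: number; from: number; to: number }

const SEGMENT_MARK = "audioSegment"; // hypothetical mark name

// One cached index per document object; entries vanish with the document.
const cache = new WeakMap<object, Segment[]>();

function collectSegments(doc: NodeLike): Segment[] {
  const cached = cache.get(doc);
  if (cached) return cached;

  const segments: Segment[] = [];
  doc.descendants((node, pos) => {
    if (!node.isText) return; // keep descending into block nodes
    const mark = node.marks.find(m => m.type.name === SEGMENT_MARK);
    if (!mark) return;        // unmarked text is a gap
    const { start, stop } = mark.attrs;
    const last = segments[segments.length - 1];
    if (last && last.start === start && last.stop === stop && last.to === pos) {
      // Adjacent text node in the same segment (e.g. split by bold): extend it.
      last.to = pos + node.nodeSize;
    } else {
      segments.push({ start, stop, from: pos, to: pos + node.nodeSize });
    }
  });

  segments.sort((a, b) => a.start - b.start); // ready for binary search by time
  cache.set(doc, segments);
  return segments;
}
```

The sorted result is the same kind of list you would maintain by hand with decorations, so the lookup code can be shared between the two approaches.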

The comment example you linked goes with marks, but if you search this forum for comment marks versus decorations, you should find some interesting conversations about the different approaches.
