Queryable AND collaborative ProseMirror documents

I have a large corpus of documents with every word marked with grammatical features. I want to full text search and render fragments of these documents. I also want to synchronize these documents across devices.

Storing the documents as HTML or JSON seems impractical. Secondary indices must be synchronized to each document. A single change will require updating multiple data structures and reserializing entire documents is expensive.

Storing the documents as YJS or Loro is good for collaboration and persistence, but the secondary indices problem remains. Also those glue layers sure are… sticky.

Since I’m interested in querying by word, a SQL schema like this seems natural:

create table document (
	id string primary key,
	-- other metadata
);
create table word (
	docId string,
	id float8 unique, -- globally unique, useful for collaboration

	block string, -- serialized node
	before string, -- punctuation
	text string,
	after string, -- punctuation

	stem string, -- stem of `text` for search
	chapter usmallint, -- for rendering/searching only a single chapter
	sentence int, -- for rendering/searching only a single sentence
	-- other useful word features for grammarians and translators
);
create table span (
	docId string,
	startWordId float8,
	endWordId float8,
	tag string,
	data string -- could make union types multiple different ways in SQL
);

My questions:

  1. Has anyone had success querying ProseMirror compatible documents in SQL databases?
    1. If so, how did your schema differ from above?
    2. If not, why not?
    3. Did you bother translating ProseMirror transactions into SQL statements or just periodically snapshot documents?
  2. How have others dealt with collaboration while maintaining querying and fragmenting? I’m looking at Doltlite as a potential solution but find solutions like articulated fascinating at preserving semantic intent.