Mapping line and index to doc positions

jameswragg · October 2, 2019, 3:15pm

I have an API that returns annotations (spelling/grammar/feedback etc.) to decorate a document. The problem is that it positions each annotation by a line number and an index within that line.

Any suggestions on how to map these to my ProseMirror doc?

Thanks in advance

marijn · October 2, 2019, 5:32pm

ProseMirror documents are not, as a rule, divided into lines, so I’m not sure how to interpret this question.

jameswragg · October 2, 2019, 7:19pm

I can give the API the format it wants (an array of lines of text) with the following:

state.doc
  .textBetween(0, state.doc.content.size, "\n", "\n")
  .split("\n")

It’s mapping the API-provided annotations, which are based on these lines and indexes, back to the doc that I’m stuck on.

I’m hoping I’m lucky in that the schema is simple: I only have to deal with headings, paragraphs, hard_breaks and lists (unordered/ordered), so the documents are fairly shallow trees to work with.

Any pointers appreciated.

marijn · October 3, 2019, 9:51am

Since textBetween loses information (non-text structure), I don’t think there’s a way to just map a text position back to a document position. If your schema is extremely simple, or you are willing to do complicated things like matching strings to determine context, you might be able to find a solution, but there’s no simple trick.
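
For a schema that simple, one possible shape for such a solution is to skip textBetween and build the lines directly, recording the document position where each line starts. A rough sketch, using plain objects as stand-ins for nodes: the helpers `text`, `node`, `hardBreak`, `nodeSize`, `collectLines` and `toDocPos` are hypothetical and not part of ProseMirror, and the API's line numbers and indexes are assumed to be zero-based. In a real document the same walk could be done with doc.descendants and node.isTextblock.

```javascript
// Hypothetical plain-object stand-ins for nodes (not the ProseMirror API).
const text = (str) => ({ text: str });
const node = (type, ...content) => ({ type, content });
const hardBreak = () => ({ type: "hard_break" });

// Assumption: these are the only node types that directly hold text.
const TEXTBLOCKS = new Set(["paragraph", "heading"]);

// Mirrors ProseMirror's position counting: one per character of text,
// 1 for a leaf, and 2 (open + close tokens) plus content for other nodes.
function nodeSize(n) {
  if (n.text !== undefined) return n.text.length;
  if (!n.content) return 1;
  return 2 + n.content.reduce((sum, child) => sum + nodeSize(child), 0);
}

// Returns one entry per line: its text and the position of its first character.
function collectLines(doc) {
  const lines = [];
  const walk = (parent, start) => {
    let pos = start; // position of the first child of `parent`
    let line = null;
    if (TEXTBLOCKS.has(parent.type)) {
      line = { text: "", from: pos };
      lines.push(line);
    }
    for (const child of parent.content) {
      if (child.text !== undefined) {
        line.text += child.text; // text only occurs inside textblocks
      } else if (child.type === "hard_break") {
        line = { text: "", from: pos + 1 }; // next line starts after the break
        lines.push(line);
      } else if (child.content) {
        walk(child, pos + 1); // step over the child's opening token
      }
      pos += nodeSize(child);
    }
  };
  walk(doc, 0);
  return lines;
}

// Map a (line, index) pair from the API back to a document position.
const toDocPos = (lines, lineNo, index) => lines[lineNo].from + index;

const doc = node("doc",
  node("heading", text("Hickory")),
  node("bullet_list",
    node("list_item",
      node("paragraph", text("The mouse"), hardBreak(), text("ran up"))))
);
const lines = collectLines(doc);
console.log(lines.map((l) => l.text)); // [ 'Hickory', 'The mouse', 'ran up' ]
console.log(toDocPos(lines, 2, 4));    // 26, the "u" of "up"
```

This only holds while every line consists purely of text. Any other inline leaf node would take up a position without contributing a character, and the index arithmetic would drift.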

jameswragg · October 3, 2019, 12:20pm

Appreciate the quick responses, thank you.

With your comments in mind, I’ve ditched textBetween and taken to bending the rules instead: I pass the API a single line of text (pretending lines don’t exist) in which each text fragment sits at the index matching its doc position, like so:

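function getContentAsOffsetText(state) {
  const { doc } = state;

  // Overwrite part of sourceStr with newStr, starting at index, keeping the
  // overall length unchanged.
  const injectStr = (sourceStr, index, newStr) =>
    sourceStr.slice(0, index) + newStr + sourceStr.slice(index + newStr.length);

  // Start with one space per document position.
  let resultStr = " ".repeat(doc.content.size);

  // Copy each text node into the string at its document position, so that a
  // character's index in the string equals its position in the doc.
  doc.descendants((node, pos) => {
    if (node.isText) {
      resultStr = injectStr(resultStr, pos, node.text);
    }
  });

  // The API expects an array of lines; send everything as a single line.
  return [resultStr];
}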

This results in something like the following, which gives me the right index for the typo ‘sstruck’:

[" The mouse ran up the clock.          The clock sstruck one…"]
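
To show why the index lines up, here is a self-contained sketch of the same padding idea that runs without ProseMirror. Plain objects stand in for nodes, and the helpers `text`, `block`, `nodeSize` and `offsetText` are hypothetical names made up for this illustration. The sizes follow ProseMirror's position counting as I understand it: one position per character of text, one per leaf node, and an opening and closing token around every other node.

```javascript
// Hypothetical plain-object stand-ins for nodes (not the ProseMirror API).
const text = (str) => ({ text: str });
const block = (type, ...content) => ({ type, content });

// One position per character, 1 for a leaf such as hard_break,
// 2 (open + close tokens) plus content for everything else.
function nodeSize(node) {
  if (node.text !== undefined) return node.text.length;
  if (!node.content) return 1;
  return 2 + node.content.reduce((sum, child) => sum + nodeSize(child), 0);
}

// Build a string in which every character sits at its document position
// and every non-text position is a space.
function offsetText(doc) {
  const chars = new Array(nodeSize(doc) - 2).fill(" "); // doc content size
  const walk = (nodes, start) => {
    let pos = start;
    for (const node of nodes) {
      if (node.text !== undefined) {
        for (let i = 0; i < node.text.length; i++) chars[pos + i] = node.text[i];
      } else if (node.content) {
        walk(node.content, pos + 1); // step over the opening token
      }
      pos += nodeSize(node);
    }
  };
  walk(doc.content, 0);
  return chars.join("");
}

const doc = block("doc",
  block("paragraph", text("The mouse ran up the clock.")),
  block("paragraph", text("The clock sstruck one."))
);
const padded = offsetText(doc);
const from = padded.indexOf("sstruck");
const to = from + "sstruck".length;
console.log(from, to); // 40 47
```

With a real document, a from/to pair like this could go straight into something like Decoration.inline(from, to, { class: "typo" }) from prosemirror-view, where the class name is only a placeholder.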

It seems to be holding up, can you foresee any issues with this approach?

marijn · October 3, 2019, 7:26pm

Looks reasonable.