Why emojis are not treated as one index

I have been working on a grammar checker in ProseMirror. The grammar checker backend is written in Golang and Python: the Golang part is responsible for slicing out the changed parts of the document, and the Python part is responsible for detecting highlights (suggestions). However, we discovered that ProseMirror treats emojis as two characters (two indices), while Python treats them as one. As a result, highlights get misaligned. Is there any hack around this?

ProseMirror uses JavaScript’s string lengths, for efficiency and straightforward integration with the host platform. JavaScript strings are UTF-16 based, meaning they use two string indices for ‘astral’ characters (code points above U+FFFF, which includes most emoji). This is a terrible system, but it’s what we’re stuck with in JavaScript land. You could write code that maps between this and per-character indices by iterating over the strings and adjusting for each astral character you find.
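A minimal sketch of that mapping on the Python side, assuming the offsets are relative to a plain text string (ProseMirror document positions also count node boundaries, which this does not handle). The function names `utf16_to_codepoint_index` and `codepoint_to_utf16_index` are made up for illustration and are not part of any library.

```python
def utf16_units(ch: str) -> int:
    """Number of UTF-16 code units JavaScript uses for one Python character."""
    # Code points above U+FFFF (astral characters) need a surrogate pair.
    return 2 if ord(ch) > 0xFFFF else 1


def utf16_to_codepoint_index(text: str, utf16_index: int) -> int:
    """Convert a JavaScript-style (UTF-16) offset into a Python str offset.

    An offset that falls in the middle of a surrogate pair is rounded up
    to the start of the next character.
    """
    units = 0
    for cp_index, ch in enumerate(text):
        if units >= utf16_index:
            return cp_index
        units += utf16_units(ch)
    return len(text)


def codepoint_to_utf16_index(text: str, cp_index: int) -> int:
    """Convert a Python str offset into a JavaScript-style (UTF-16) offset."""
    return sum(utf16_units(ch) for ch in text[:cp_index])
```

With this, a highlight that Python reports as `(start, end)` in code points could be passed through `codepoint_to_utf16_index` for each end before being sent back to the editor. Both functions rescan the string from the beginning, so for long texts with many highlights it would probably be better to build the offset table once and reuse it.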

Hi, Marijn. Got it. I will share the hack here as soon as possible :upside_down_face:. Thank you for the quick response!
