Do positions in JavaScript measure "text" nodeSize the same as String.prototype.length?

colel · May 11, 2020, 4:33pm

Basically just the title. I’d like to know whether ProseMirror measures positions any differently than the DOM / String class does.

In my exploration, I’m experimenting with managing the entire document model in another language, which has multiple strategies for measuring characters in strings (graphemes, bytes, etc.).

Thanks!

marijn · May 11, 2020, 4:44pm

Yes, it does (text node lengths and offsets are measured in UTF-16 code units).
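
To illustrate what that means for a document model kept in another language, here is a minimal Rust sketch comparing the common ways of measuring a string. The helper names `utf16_len` and `measurements` are hypothetical, made up for this example; only `encode_utf16`, `len`, and `chars` come from the Rust standard library. It assumes the text is well-formed Unicode, which a Rust `&str` always is.

```rust
/// Length of `s` in UTF-16 code units, i.e. what JavaScript's
/// `String.prototype.length` reports for the same well-formed text.
/// (`utf16_len` is a hypothetical helper name.)
fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}

/// Returns (UTF-8 bytes, Unicode scalar values, UTF-16 code units) for `s`.
/// (`measurements` is a hypothetical helper name.)
fn measurements(s: &str) -> (usize, usize, usize) {
    (s.len(), s.chars().count(), utf16_len(s))
}
```

For example, `measurements("h\u{e9}llo")` gives `(6, 5, 5)` and `measurements("\u{1F600}")` gives `(4, 1, 2)`: a character outside the Basic Multilingual Plane takes two UTF-16 code units, so only the last number lines up with a text node's size.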

colel · May 11, 2020, 4:58pm

Do you by chance know if Rust’s encode_utf16() would produce the same lengths as JavaScript would produce?

I guess I’m curious if converting utf16 to utf8 to utf16 is lossless in that I will always get the same utf16 back.

marijn · May 11, 2020, 6:02pm

I guess in Rust you can pretty much assume that UTF-16 arrays are valid UTF-16, but unfortunately, in JS, there’s no such guarantee (it is perfectly fine to construct strings that are invalid UTF-16, such as ones containing unpaired surrogates). I’d say that for validly encoded strings, the length correspondence holds, but once a misencoded string is involved, it won’t. But maybe Rust will blow up on such strings in general, and you’d catch them in some validation phase?
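
A rough sketch of such a validation phase in Rust, assuming the text arrives from JS as raw UTF-16 code units (`round_trip` is a hypothetical helper name; `String::from_utf16` and `encode_utf16` are standard library calls). Rather than panicking, `String::from_utf16` returns an error for ill-formed input such as an unpaired surrogate, so bad strings can be rejected up front:

```rust
/// Round-trips UTF-16 code units through a Rust `String` (stored as UTF-8)
/// and back to UTF-16. Returns `None` when the input is not well-formed
/// UTF-16, e.g. when it contains an unpaired surrogate.
/// (`round_trip` is a hypothetical helper name.)
fn round_trip(units: &[u16]) -> Option<Vec<u16>> {
    // Validation step: fails on ill-formed UTF-16 instead of guessing.
    let s = String::from_utf16(units).ok()?;
    // Re-encode; for well-formed input this yields the original code units.
    Some(s.encode_utf16().collect())
}
```

With well-formed input like `[0x0068, 0xD83D, 0xDE00]` ("h" followed by a surrogate pair) the same units come back, so lengths and offsets match. A lone `0xD83D` yields `None`. `String::from_utf16_lossy` would accept it, but it substitutes U+FFFD, so the content no longer matches what JS holds.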

colel · May 11, 2020, 6:04pm

Gotcha! Thanks for sharing your insight