Do positions in JavaScript measure "text" nodeSize the same as String.prototype.length?

Basically just the title. I’d like to know whether ProseMirror does anything special when measuring positions, i.e. whether it measures them differently than the DOM / String class does.

In my exploration, I’m experimenting with managing the entire document model in another language, which has multiple strategies for measuring characters in strings (graphemes, bytes, etc.).

Thanks!

Yes, it does (text node lengths and offsets are measured in UTF-16 code units, which is the same unit String.prototype.length counts).
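To make the difference between the counting strategies concrete, here is a minimal Rust sketch. The helper names `utf16_len` and `utf16_offset_to_byte` are made up for illustration and are not part of ProseMirror or any library; they only use standard-library calls (`encode_utf16`, `char_indices`, `len_utf16`).

```rust
/// Length of `text` as JavaScript's `String.prototype.length` would report it:
/// the number of UTF-16 code units (hypothetical helper).
fn utf16_len(text: &str) -> usize {
    text.encode_utf16().count()
}

/// Convert an offset measured in UTF-16 code units into a byte offset into
/// `text` (hypothetical helper). Returns `None` if the offset is past the end
/// or falls in the middle of a surrogate pair.
fn utf16_offset_to_byte(text: &str, utf16_offset: usize) -> Option<usize> {
    let mut units = 0;
    for (byte_idx, ch) in text.char_indices() {
        if units == utf16_offset {
            return Some(byte_idx);
        }
        if units > utf16_offset {
            // The requested offset points inside the previous character.
            return None;
        }
        // 1 for characters in the Basic Multilingual Plane, 2 otherwise.
        units += ch.len_utf16();
    }
    if units == utf16_offset {
        Some(text.len())
    } else {
        None
    }
}
```

For the string `"a\u{1F600}b"` (an emoji between two ASCII letters), `len()` gives 6 bytes, `chars().count()` gives 3 scalar values, and `utf16_len` gives 4, which is what JavaScript would report as `.length`.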

Do you by chance know if Rust’s encode_utf16() would produce the same lengths as JavaScript would produce?

I guess I’m curious whether converting UTF-16 to UTF-8 to UTF-16 is lossless, in that I will always get the same UTF-16 back.

I guess in Rust you can pretty much assume that strings are validly encoded, so encode_utf16() produces valid UTF-16. Unfortunately, in JS there’s no such guarantee (it is perfectly fine to construct strings that are invalid UTF-16, for example ones containing lone surrogates). I’d say that for validly encoded strings the length correspondence holds, but once a misencoded string is involved, it is no longer guaranteed. But maybe Rust will blow up on such strings in general, and you’d catch them in some validation phase?
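A small Rust sketch of what that validation phase could look like, assuming the JavaScript side hands over raw UTF-16 code units as a `&[u16]`. The function names are hypothetical; `String::from_utf16` and `String::from_utf16_lossy` are the standard-library calls doing the work.

```rust
use std::string::FromUtf16Error;

/// Round-trip a valid Rust string through UTF-16 and back (hypothetical
/// helper). Because `&str` is always valid UTF-8, this is expected to succeed.
fn round_trip(text: &str) -> Result<String, FromUtf16Error> {
    let units: Vec<u16> = text.encode_utf16().collect();
    String::from_utf16(&units)
}

/// Strict decoding of code units received from JavaScript (hypothetical
/// helper). Fails if the input contains an unpaired surrogate.
fn decode_js_units(units: &[u16]) -> Result<String, String> {
    String::from_utf16(units).map_err(|e| format!("invalid UTF-16 from JS: {}", e))
}

/// Lenient decoding (hypothetical helper): each unpaired surrogate is
/// replaced with U+FFFD, so the content no longer matches the JS string.
fn decode_js_units_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}
```

With the units `[0x0061, 0xD83D]` (the letter `a` followed by a lone high surrogate), `decode_js_units` returns an error, while `decode_js_units_lossy` returns `"a\u{FFFD}"`. Which of the two fits better depends on whether you would rather reject such documents or silently alter them.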

Gotcha! Thanks for sharing your insight