I have made an editor for a Korean website, which worked great until some of the creators complained about some letters getting broken (it looks something like this: Á¿î).
I suspect the problem is caused by not normalizing the saved content to one consistent Unicode form, either NFC or NFD.
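To illustrate the difference, here is a minimal check (runnable in any browser console) showing that the same syllable compares as unequal across the two forms:

// The syllable '한' is one precomposed code point in NFC,
// but three combining jamo (ᄒ + ᅡ + ᆫ) in NFD.
const nfc = '한'.normalize('NFC')
const nfd = '한'.normalize('NFD')
console.log(nfc.length) // 1 (U+D55C)
console.log(nfd.length) // 3 (U+1112 U+1161 U+11AB)
console.log(nfc === nfd) // false, although both render as 한
console.log(nfc === nfd.normalize('NFC')) // true once both are NFC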
Currently, the content saved to the DB goes through the following process:
import { DOMSerializer } from 'prosemirror-model'

const serializer = DOMSerializer.fromSchema(mySchema)

const getHTMLStringFromContent = (content) => {
  if (content.doc == null) return null
  // Serialize the ProseMirror document to a DOM fragment,
  // then read it back as an HTML string via a detached wrapper.
  const wrap = document.createElement('div')
  const contentFragment = serializer.serializeFragment(content.doc)
  wrap.append(contentFragment)
  const html = wrap.innerHTML
  wrap.remove()
  return html
}
const formData = {}
formData.Content = getHTMLStringFromContent(this.contentJson) || ''
// ...add other metadata...
// ...and upload it.
My hypothesis is that if I normalize all content to NFC right before uploading, there won't be any Unicode breaks.
I want to ask the community whether my hypothesis is correct and, if so, what the suggested practice is for normalizing the content.
Would the code below work?
const getHTMLStringFromContent = (content) => {
  if (content.doc == null) return null
  const wrap = document.createElement('div')
  const contentFragment = serializer.serializeFragment(content.doc)
  wrap.append(contentFragment)
  const html = wrap.innerHTML
  wrap.remove()
  return html.normalize('NFC') // <-- normalizing it here
}