I just spent the better part of a morning trying to figure out whether there’s a library that works in the browser and can take whatever crap HTML Microsoft Word outputs and convert it into something with some semblance of semantics.
Like, the way Word outputs lists is as paragraphs that carry a bunch of inline styles, the bullet itself included (wrapped in some conditional comments). Then, to detect ordered lists, you have to actually check the “bullet” itself. The better way is to actually parse some inline styles of the document and see if the list has something like
This madness probably extends to other areas. Various turn-key editors (e.g. CKEditor) have some functionality to handle all this for you, but they’re very editor-specific. It would be nice if there was a library that could do this using only a plain DOMParser.
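To make the "check the bullet itself" heuristic concrete, here is a rough sketch of what I mean. The function names `classifyMarker` and `listLevel` are made up for illustration and don't come from any library. It also assumes Word's list paragraphs carry an inline `mso-list` declaration of the form `mso-list:l0 level1 lfo1`, which is what I've seen in pasted output but haven't verified against every Word version.

```typescript
// Hypothetical helpers, not part of any existing library.
type ListKind = "ordered" | "unordered";

// Guess the list type from the text of the "bullet" span Word emits.
// Markers like "1.", "12)", "a.", "iv." or "(b)" are treated as ordered;
// anything else (e.g. "·", "o", "§") is treated as unordered.
function classifyMarker(marker: string): ListKind {
  // Word pads the marker with non-breaking spaces, so normalise them first.
  const cleaned = marker.replace(/\u00a0/g, " ").trim();
  return /^\(?(\d+|[a-zA-Z]{1,4})[.)]$/.test(cleaned) ? "ordered" : "unordered";
}

// Extract the nesting level from an inline style string.
// Assumption: the style contains something like "mso-list:l0 level2 lfo1".
// Returns null when the paragraph does not look like a list item.
function listLevel(style: string): number | null {
  const match = /mso-list:\s*l\d+\s+level(\d+)/i.exec(style);
  return match ? Number(match[1]) : null;
}
```

In the browser you would feed these from a plain `DOMParser` document, e.g. select `p[style*="mso-list"]`, pass each paragraph's `style` attribute to `listLevel`, and pass the text of its first span to `classifyMarker`. A single letter with no trailing punctuation (Word's "o" sub-bullet) deliberately falls through to unordered, but the regex is still only a guess and will misfire on unusual numbering formats.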
I’d think that this is something ProseMirror users have to deal with all the time. Is there an established solution that I’m not seeing?