Content hyphens editor view versus content model

pojo · November 24, 2021, 11:37pm

I am experimenting with hyphens (e.g. non-breaking spaces, soft hyphens, …).

My ultimate goal is for the ‘content model’ (i.e. the serialized representation of the PM document) to carry all the information related to hyphens, since that representation is what I would like to work with on the back-end. However, I’m seeing some behavior that I don’t understand and would like some clarification on.

Non-breaking space: In my experiment I have a button that inserts a non-breaking space. I use tr.insertText('\u00a0'), which in my PM view gives the expected behavior: a space is entered. However, the ‘content model’ (serialized DOM) turns it into the named entity &nbsp;. Is there a way to have it emit the hex entity (&#xA0;) instead? I ask because that would make it easier to use the ‘content model’ as the basis for XML.

Soft hyphen: Similarly, I have an experiment with the soft hyphen (SHY, see the demo below). Along with a button to insert a SHY, I’ve also added a button that toggles a decoration revealing where the SHYs are. So PM evidently knows where the SHYs were entered (since they can be shown). However, the information about these hyphens does not seem to be present in the ‘content model’ (serialized DOM), or maybe it is, but I need to read/process it differently. Could someone explain this to me?

[Demo: hyphens]
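As a side note on the decoration: the SHY positions can be computed straight from the document state, which suggests the characters are stored in the text itself. Below is a minimal sketch, assuming a hypothetical helper `findSoftHyphens` and hypothetical `DocLike`/`NodeLike` interfaces; a real ProseMirror `Node` should fit this shape through its `descendants` method. The returned positions could then feed something like `Decoration.widget` from prosemirror-view.

```typescript
// Hypothetical structural subset of a ProseMirror Node used by this sketch.
interface NodeLike {
  isText: boolean;
  text?: string | null;
}

interface DocLike {
  descendants(f: (node: NodeLike, pos: number) => boolean | void): void;
}

// Return the document position of every U+00AD (soft hyphen).
// ProseMirror counts text positions in UTF-16 code units, so
// index i inside a text node starting at `pos` is position pos + i.
function findSoftHyphens(doc: DocLike): number[] {
  const positions: number[] = [];
  doc.descendants((node, pos) => {
    if (node.isText && node.text) {
      for (let i = 0; i < node.text.length; i++) {
        if (node.text.charCodeAt(i) === 0xad) positions.push(pos + i);
      }
    }
  });
  return positions;
}
```

Because the search reads only the text nodes, any SHY it finds is, by construction, part of the stored document content.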

marijn · November 25, 2021, 7:01am

The DOM created by DOMSerializer (which I assume is what you’re talking about) is a DOM object tree, and any encoding of that as a string is done outside of ProseMirror. I guess you’re using innerHTML? That’s the browser’s algorithm, and you could replace it with your own recursive function to convert to a string in a different way.
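Following that suggestion, here is a minimal sketch of such a recursive function. The names `serializeNode`, `escapeText` and the `SerializableNode` interface are hypothetical, not part of ProseMirror. It assumes XML-style output is wanted, so it writes every non-ASCII character, including U+00A0 and U+00AD, as a hex character reference such as `&#xA0;` instead of a named entity.

```typescript
// Minimal structural view of a DOM node. Browser nodes (such as the
// DocumentFragment returned by DOMSerializer.serializeFragment) fit this
// shape; plain objects can be used for testing outside a browser.
interface SerializableNode {
  nodeType: number;
  nodeName: string;
  nodeValue: string | null;
  childNodes: ArrayLike<SerializableNode>;
  attributes?: ArrayLike<{ name: string; value: string }>;
}

// Standard DOM nodeType constants.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const DOCUMENT_FRAGMENT_NODE = 11;

// Escape markup characters and write every non-ASCII character
// (including U+00A0 and U+00AD) as a hex character reference.
function escapeText(text: string, inAttribute: boolean): string {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    if (ch === "&") out += "&amp;";
    else if (ch === "<") out += "&lt;";
    else if (ch === ">") out += "&gt;";
    else if (inAttribute && ch === '"') out += "&quot;";
    else if (cp > 0x7e) out += "&#x" + cp.toString(16).toUpperCase() + ";";
    else out += ch;
  }
  return out;
}

function serializeChildren(node: SerializableNode): string {
  let out = "";
  for (let i = 0; i < node.childNodes.length; i++) {
    out += serializeNode(node.childNodes[i]);
  }
  return out;
}

// Recursively convert a DOM tree to an XML-style string.
function serializeNode(node: SerializableNode): string {
  switch (node.nodeType) {
    case TEXT_NODE:
      return escapeText(node.nodeValue ?? "", false);
    case ELEMENT_NODE: {
      const tag = node.nodeName.toLowerCase();
      const attrs = node.attributes ?? [];
      let attrText = "";
      for (let i = 0; i < attrs.length; i++) {
        attrText += " " + attrs[i].name + '="' + escapeText(attrs[i].value, true) + '"';
      }
      const inner = serializeChildren(node);
      // Childless elements (e.g. <br>) become self-closing, as XML allows.
      return inner === ""
        ? "<" + tag + attrText + "/>"
        : "<" + tag + attrText + ">" + inner + "</" + tag + ">";
    }
    case DOCUMENT_FRAGMENT_NODE:
      return serializeChildren(node);
    default:
      // Comments, processing instructions, etc. are skipped in this sketch.
      return "";
  }
}
```

In the browser this could be called as `serializeNode(DOMSerializer.fromSchema(schema).serializeFragment(state.doc.content))`, replacing the `innerHTML` step. Escaping all non-ASCII characters is a deliberately blunt choice; a narrower set (only U+00A0 and U+00AD, say) would work the same way.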

pojo · November 25, 2021, 10:16am

Thank you for your feedback. I’m first trying to establish how PM stores things like a (non-printing) soft hyphen in its state, but haven’t found that yet. Presumably the content is being interpreted by one of the API calls or by standard browser behavior. Okay, this sounds vague, so let me try to rephrase:

Is there a way for me to see ‘inside’ of what is actually stored (‘raw’) on a piece of selected text?

I’ve tried something like this:

  const textSelection = editor.state.selection as TextSelection;
  console.log(textSelection.content().content.toString());

but the output doesn’t show the soft hyphen in the selected text.

pojo · November 25, 2021, 10:24am

Doing something like this, as I did in the decoration that displays soft hyphens, does print true when a soft hyphen is present in the selected text (obviously):

  const textSelection = editor.state.selection as TextSelection;
  console.log(/\u00ad/.test(textSelection.content().content.toString()));
  console.log(textSelection.content().content.toString());

marijn · November 25, 2021, 10:58am

The ProseMirror document representation just stores text content as strings, so even though console.log might not display them visibly, they should be in there.
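To make such characters visible when logging, one option is to rewrite them as escapes before printing. A minimal sketch, assuming hypothetical helpers `revealInvisible` and `selectedTextRevealed` and a `StateLike` interface that a real `EditorState` should satisfy; it reads the selection with `doc.textBetween(from, to, blockSeparator)`. ProseMirror text nodes use `JSON.stringify` in `toString()`, and `JSON.stringify` leaves U+00AD unescaped, which is likely why the soft hyphen looked absent in the first snippet.

```typescript
// Characters that are invisible or easy to confuse in console output:
// no-break space, soft hyphen, zero-width space, non-breaking hyphen.
const INVISIBLE = /[\u00A0\u00AD\u200B\u2011]/g;

// Rewrite each such character as a visible \uXXXX escape.
function revealInvisible(text: string): string {
  return text.replace(INVISIBLE, (ch) => {
    const hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return "\\u" + ("0000" + hex).slice(-4);
  });
}

// Hypothetical structural subset of EditorState used here.
interface StateLike {
  doc: { textBetween(from: number, to: number, blockSeparator?: string): string };
  selection: { from: number; to: number };
}

// Selected text with invisible characters made visible.
function selectedTextRevealed(state: StateLike): string {
  const { from, to } = state.selection;
  return revealInvisible(state.doc.textBetween(from, to, "\n"));
}

// Usage in the editor: console.log(selectedTextRevealed(editor.state));
```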