Is there a proper way to normalize pasted HTML when using docFormat: "text"?

let pm = new ProseMirror({
  place: document.querySelector("#editor"),
  menuBar: true,
  doc: document.querySelector("#content").innerHTML,
  docFormat: "text",
  schema: dinoSchema,
  autoInput: true
})

Now I can use something like

function strip(html){
   var tmp = document.implementation.createHTMLDocument("New").body;
   tmp.innerHTML = html;
   return tmp.textContent || tmp.innerText || "";
}

pm.on('transformPastedHTML', function(a){
  var stripped = strip(a);
  return stripped;
})

Is there a better way? Maybe an option or something like that?

thanks

What are you trying to do? Your code just seems to deserialize and then re-serialize the HTML string without doing anything to it.

I want to strip out all HTML tags (all markup).

this function

function strip(html){
   var tmp = document.implementation.createHTMLDocument("New").body;
   tmp.innerHTML = html;
   return tmp.textContent || tmp.innerText || "";
}

just uses a temporary DOM node to do that.

console.log(strip('<b>hello</b> <i>world</i>'));

// >>> hello world

That seems like a rather bad idea – it’ll make it impossible for people to copy-paste any kind of markup from one place in your editor to another. But yeah, if that’s what you want to do, the code you show will do it.

I just want to prevent HTML markup from being copy-pasted from anywhere else into ProseMirror with docFormat: “text”.

I mean anyone can copy markup from any website, paste it into ProseMirror with docFormat: “text”, and it will no longer be “text”.

docFormat is not what you think it is. It only controls the way the initial value of the doc option is interpreted.

ProseMirror already forces pasted content to conform to its document schema, so the usual problems of pasting in arbitrary HTML do not exist.

Is there a way to strip all paragraphs, headings, bold, italic, etc. on paste? I need to convert everything that is pasted to plain text.

thanks

Does your document in general disallow these kinds of markup? If so, the proper way to address this is in the document schema, not in a paste filter. (If not, I still think you’re doing something strange, but your code does seem to accomplish this goal.)

Yes, my document disallows these kinds of markup.

It allows only links and a few special marks.

Could you please give me more info (docs, examples, etc.) on how I can solve this with a document schema? I really love ProseMirror and very much appreciate your product, your time, and your help.

thanks in advance

You can create a schema that contains only paragraphs, and no marks, something like this:

const mySchema = new Schema(new SchemaSpec({
  doc: Doc,
  paragraph: Paragraph,
  text: Text
}))

(Where Schema, SchemaSpec, Doc, Paragraph, and Text are imported from ProseMirror’s model module.)

If you then use mySchema as the value of the schema option, you’ll get a wonderfully constrained editor.

Hi

We have a requirement to convert the clipboard content to plain text on paste. I have used the above code, i.e.

  const strip = (html: string) => {
    const tmp = document.implementation.createHTMLDocument('New').body;
    tmp.innerHTML = html;
    return tmp.textContent || tmp.innerText || '';
  };
...
 transformPastedHTML: (html: string) => strip(html)
...

When I copy content from a word processor, e.g. LibreOffice Writer, I get the following:

@page { size: 21cm 29.7cm; margin: 2cm } p { margin-bottom: 0.25cm; line-height: 115%; background: transparent } Hello Prosemirror

Note that the pasted content contains styling information as well as the plain text which is: ‘Hello Prosemirror’.

Can someone please let me know how to remove the styling information?

Thanks in advance

Julien.
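
A likely cause of the stray CSS is that textContent concatenates the text of every descendant node, including the contents of style elements, and word processors tend to put a style block into the HTML they copy to the clipboard. One possible approach, sketched below, is to drop those blocks before the string reaches strip. The helper name dropStyleAndScript and the sample input are assumptions made for illustration; the helper is not part of ProseMirror, and regular expressions are only a rough way to handle HTML.

```javascript
// Hypothetical helper, not part of ProseMirror: removes HTML comments and
// whole <style>/<script> blocks (tags plus their contents) from a string.
function dropStyleAndScript(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<(style|script)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "");
}

// Sample input shaped like word-processor clipboard HTML (assumed).
const pasted =
  '<style type="text/css">p { margin-bottom: 0.25cm }</style>' +
  "<p>Hello Prosemirror</p>";

console.log(dropStyleAndScript(pasted));
// >>> <p>Hello Prosemirror</p>
```

It could then be chained in front of the existing function, for example transformPastedHTML: (html) => strip(dropStyleAndScript(html)). In a browser, a more robust alternative would probably be to remove the elements from the temporary node inside strip, with tmp.querySelectorAll('style, script').forEach(el => el.remove()), before reading textContent.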