How do I turn HTML into ProseMirror JSON?

I saw other posts that had this:

var custom_schema = _EditorSchema.default;
var html = '<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">  <head> <meta http-equiv=Content-Type content="text/html; charset=utf-8"> <meta name=ProgId content=Word.Document> <meta name=Generator content="Microsoft Word 15"> <meta name=Originator content="Microsoft Word 15"> <link rel=File-List href="Readme.fld/filelist.xml"> <xml>  <o:DocumentProperties>   <o:Author>Tharyan, Rajesh</o:Author>   <o:LastAuthor>Ted Chou</o:LastAuthor>   <o:Revision>2</o:Revision>   <o:TotalTime>2</o:TotalTime>   <o:Created>2020-10-08T15:46:00Z</o:Created>   <o:LastSaved>2020-10-08T15:46:00Z</o:LastSaved>   <o:Pages>1</o:Pages>   <o:Words>608</o:Words>   <o:Characters>3471</o:Characters>   <o:Company>University of Exeter</o:Company>   <o:Lines>28</o:Lines>   <o:Paragraphs>8</o:Paragraphs>   <o:CharactersWithSpaces>4071</o:CharactersWithSpaces>   <o:Version>16.00</o:Version>  </o:DocumentProperties>  <o:OfficeDocumentSettings>   <o:AllowPNG/>  </o:OfficeDocumentSettings> </xml><![endif]--> <link rel=themeData href="Readme.fld/themedata.thmx"> <link rel=colorSchemeMapping href="Readme.fld/colorschememapping.xml"> <xml>  <w:WordDocument>   <w:SpellingState>Clean</w:SpellingState>   <w:GrammarState>Clean</w:GrammarState>   <w:TrackMoves>false</w:TrackMoves>   <w:TrackFormatting/>   <w:PunctuationKerning/>   <w:ValidateAgainstSchemas/>   <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>   <w:IgnoreMixedContent>false</w:IgnoreMixedContent>   <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>   <w:DoNotPromoteQF/>   <w:LidThemeOther>EN-GB</w:LidThemeOther>   .....
the<span   style="mso-spacerun:yes">  </span>smbhml.txt</span> file - gp11 is SL, gp12   is SM,gp13 is SH,gp21 <span class=SpellE>i</span> s BL,gp22 is BM,gp23 is BH   portfolios<o:p></o:p></span></p>   </td>  </tr> </table>  <p class=MsoNormal><span lang=EN-GB><o:p>&nbsp;</o:p></span></p>  </div>  </body>  </html> ';

const doc = _prosemirrorModel.DOMParser.fromSchema(custom_schema).parse(html, {
  preserveWhitespace: true
});

console.log(doc);

but it doesn't seem to be correct.

I can't find my JSON anywhere in the output:

Node {
  type: <ref *1> NodeType {
    name: 'doc',
    schema: Schema {
      spec: [Object],
      nodes: [Object: null prototype],
      marks: [Object: null prototype],
      nodeFromJSON: [Function: bound nodeFromJSON],
      markFromJSON: [Function: bound markFromJSON],
      topNodeType: [Circular *1],
      cached: [Object: null prototype]
    },
    spec: { attrs: [Object], content: 'block+' },
    groups: [],
    attrs: [Object: null prototype] {
      layout: [Attribute],
      padding: [Attribute],
      width: [Attribute]
    },
    defaultAttrs: [Object: null prototype] {
      layout: null,
      padding: null,
      width: null
    },
    contentMatch: ContentMatch { validEnd: false, next: [Array], wrapCache: [] },
    markSet: [],
    inlineContent: false,
    isBlock: true,
    isText: false
  },
  attrs: [Object: null prototype] { layout: null, padding: null, width: null },
  content: Fragment { content: [ [Node] ], size: 2 },
  marks: []
}

I want to run this on the server side, so ideally no browser document will be needed.

Thank you. Ted

Node objects have a toJSON method that returns a plain, JSON-compatible object.

Thank you! I tried:

const doc = _prosemirrorModel.DOMParser.fromSchema(custom_schema).parse(html, {
  preserveWhitespace: true
});

const a = doc.toJSON();
console.log(JSON.stringify(a));

but it seems to be empty:

{"type":"doc","attrs":{"layout":null,"padding":null,"width":null},"content":[{"type":"paragraph","attrs":{"align":null,"color":null,"id":null,"indent":null,"lineSpacing":null,"paddingBottom":null,"paddingTop":null,"objectId":null}}]}

None of the HTML content is in there…

This one was what I was looking for!

Thank you, it's solved.

Ted

Hi guys! I have the same problem here. My JSON doesn't have any content. Could you please tell me how you fixed it?

Input:

const json = DOMParser.fromSchema(pmSchema.schema).parse(`<p>Hello world</p>`, {
    preserveWhitespace: true
  }).toJSON()

Output:

{"type":"doc","content":[{"type":"paragraph"}]}

@marijn Could you help me with this?

Thanks. Denys

That method takes a DOM node, not a string, as its first argument.

Oh yes, my bad, sorry.