Migration from HTML with special content

I have a function that parses HTML content to JSON model based on schema (got the reference from tiptap).

(...)

export function generateJSON(html: string): Record<string, any> {
  const dom = parseHTML(html) as any;

  return DOMParser.fromSchema(schema)
    .parse(dom)
    .toJSON();
}

It works fine for most of the content we have in our database but, we have a very special case which we are trying to solve. Basically we save some sort of “placeholders” or “formulas” with our HTML, these placeholders are then replaced by actual react components during the render. Here is an example of how the HTML is stored.

<p>Normal <b>Paragraph</b></p>
{{FORMULA_CODE_CHART(COMPANY("12221"))}}
<p>Another paragraph</p>

Basically, the idea is to transpile this string identifying these placeholders and returning specific block type for it.

Something like:

{...},
{
  type: 'dynamic',
  attrs: { type: 'codeChart', companyId: 12221 },
}
{...},

I know we could just use regex and replace these code by a DOM element, maybe? Or I wonder if there is a better approach using prosemirror API to identify these pieces of code as a block type during the transforming.

Thanks in advance!

Pre-processing these on the text level to replace them with HTML elements seems like a reasonable approach. The ProseMirror DOMParser doesn’t do any kind of text matching.

1 Like

Thanks for the reply @marijn