Transforming an inline HTML output to Block

gethari · January 4, 2023, 12:26pm

We are in the process of migrating from Froala to ProseMirror. In Froala, most of the tags are nested inside the <p> tag. Here is what I am trying to do, put as simply as possible:

<p><span class="fr-pdf fr-deletable"
         contenteditable="false">
      <object data="https://www.africau.edu/images/default/sample.pdf"
              width="640"
              height="320">
         <embed src="https://www.africau.edu/images/default/sample.pdf">
      </object></span>
</p>

This is our current Froala output from the editor when it contains a PDF. I want this to be a block node with inline: false in our new ProseMirror schema. But I cannot simply mark it as a block node, because then nothing is rendered: in the input HTML it is not a block-level element.

Any thoughts? Does anyone have experience dealing with this?

marijn · January 4, 2023, 1:22pm

I do not know what this means.

gethari · January 5, 2023, 4:26am

I have created a CodeSandbox to explain my problem better. Please take a look once you get a chance to do so.

BTW, thanks for your awesome work; we are loving ProseMirror as well as Tiptap.

marijn · January 5, 2023, 6:07am

But that just shows code where you set the node to be inline. It sounds like you don’t want it to be inline. What isn’t clear is why you don’t make it a block node then.

gethari · January 5, 2023, 6:19am

If I make it a block node and set inline: false, I get no output; that is, nothing is rendered in the editor.

When I set inline: true, I see this output in the editor.

Please correct me in case of any misunderstanding, and thank you for the quick response.

marijn · January 5, 2023, 8:19am

I think this is because of your odd HTML structure. The parser first sees the <p> element, and creates a paragraph node for it. Then it finds the <span> that holds the PDF, but it cannot place that in its current paragraph context, so it drops it. Reformulating your parse rule to target the <p> (and check whether it holds only a PDF span in getAttrs) probably helps.
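
A minimal sketch of the kind of check described above, assuming the Froala markup shown earlier in the thread (a p whose only element child is a span with class fr-pdf wrapping an object). The ElementLike interface and the names pdfAttrsFromParagraph and fakeEl are hypothetical and not part of ProseMirror or Tiptap; a real HTMLElement is expected to satisfy ElementLike structurally. The function returns false when the paragraph is not a PDF wrapper, which is how a getAttrs function signals that its rule does not match.

```typescript
// Minimal structural stand-in for the DOM Element members used below
// (hypothetical; a real HTMLElement is expected to satisfy it).
interface ElementLike {
  tagName: string;
  children: ArrayLike<ElementLike>;
  classList: { contains(name: string): boolean };
  getAttribute(name: string): string | null;
}

interface PdfAttrs {
  data: string;
  width: string | null;
  height: string | null;
}

// Returns the PDF attributes if `p` looks like
// <p><span class="fr-pdf"><object data="..."></object></span></p>, otherwise false.
function pdfAttrsFromParagraph(p: ElementLike): PdfAttrs | false {
  // The paragraph must hold exactly one element: the Froala PDF span.
  if (p.children.length !== 1) return false;
  const span = p.children[0];
  if (span.tagName.toUpperCase() !== "SPAN" || !span.classList.contains("fr-pdf")) return false;

  // Look for the <object> directly inside the span.
  let object: ElementLike | undefined;
  for (let i = 0; i < span.children.length; i++) {
    if (span.children[i].tagName.toUpperCase() === "OBJECT") {
      object = span.children[i];
      break;
    }
  }
  if (!object) return false;

  const data = object.getAttribute("data");
  if (!data) return false;
  return { data, width: object.getAttribute("width"), height: object.getAttribute("height") };
}

// Plain-object stand-in so the sketch can be tried outside a browser (hypothetical helper).
function fakeEl(tagName: string, attrs: Record<string, string> = {}, children: ElementLike[] = []): ElementLike {
  const classes = (attrs["class"] || "").split(/\s+/);
  return {
    tagName,
    children,
    classList: { contains: (name: string) => classes.indexOf(name) !== -1 },
    getAttribute: (name: string) => (name in attrs ? attrs[name] : null),
  };
}

const pdfParagraph = fakeEl("p", {}, [
  fakeEl("span", { class: "fr-pdf fr-deletable" }, [
    fakeEl("object", { data: "https://www.africau.edu/images/default/sample.pdf", width: "640", height: "320" }),
  ]),
]);
const plainParagraph = fakeEl("p");

console.log(pdfAttrsFromParagraph(pdfParagraph));   // attributes of the wrapped PDF
console.log(pdfAttrsFromParagraph(plainParagraph)); // false, so another rule can handle it
```

In an editor, a function like this would be called from the getAttrs of a parse rule with tag "p". The check looks only at the direct children of the paragraph, so it stays cheap.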

gethari · January 5, 2023, 10:42am

Once again, thank you for your response. I think I have got it working. I am summarizing my understanding here to double-check that I am doing it right, and to help other people who come searching with a similar question in the future.

Since my existing HTML output from Froala has the PDF one level inside the <p> tag, and I want it to be treated as a block node in ProseMirror:
I extended the Paragraph extension to check, in its parseHTML() rule, whether the first child of the <p> tag is a PDF.
I also added an attribute called isPdf to the extension so that renderHTML() / toDOM() can output conditionally.

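parseHTML() {
    // Returns the PDF attributes when the element is a valid PDF document, otherwise false.
    const getPdfAttribuesFromFroalaHTML = (element: HTMLElement): ParsedElementResult | boolean => {
      // Your logic to find if the node is a valid PDF document goes here.
      // It is expected to set isPdf, src, width and height from the element.
      if (isPdf) {
        const result = {
          height: height,
          data: src,
          width: width,
          isPdf: isPdf,
        };
        return result;
      }
      return false;
    };
    return [
      {
        tag: "p",
        getAttrs: (element) => {
          // getAttrs can also receive a string (for style rules); there is nothing to inspect then.
          if (typeof element === "string") return {};
          return getPdfAttribuesFromFroalaHTML(element);
        },
      },
    ];
  },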

Now, after parsing it successfully, I render it like this:

 renderHTML({ HTMLAttributes }) {
    const isPdf = HTMLAttributes.isPdf;
    if (isPdf) {
      return ["object", { ...HTMLAttributes }];
    }
    return ["p", mergeAttributes(this.options.HTMLAttributes, HTMLAttributes), 0];
  },

The output above renders an object tag to hold the PDF, which my new PdfExtension then handles for further renders.

export const PdfExtension = Node.create({
  name: "pdf",
  atom: true,
  selectable: false,
  draggable: false,
  inline: false,
  group: "block",

  addAttributes() {
    return {
      data: {
        default: null,
      },
      width: {
        default: DEFAULT_ELEMENT_WIDTH,
      },
      height: {
        default: DEFAULT_ELEMENT_HEIGHT,
      },
      type: {
        default: "application/pdf",
      },
    };
  },

  parseHTML() {
    return [
      {
        tag: "object",
      },
    ];
  },

  renderHTML({ HTMLAttributes }) {
    return ["object", { ...HTMLAttributes }];
  },
});

Please suggest improvements, if any.

marijn · January 5, 2023, 11:00am

I don’t think it’s necessary to use a single node type for PDF and paragraph. As long as the parse rule for the PDF node has a higher precedence than the one for paragraph (and checks whether the <p> element has the shape expected for PDF nodes), they can be entirely separate node types.
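
The precedence idea can be illustrated with a simplified model. The sketch below is not ProseMirror's parser. It only assumes two behaviours documented for ProseMirror parse rules: rules with a higher priority are tried first (a missing priority counts as 50), and a getAttrs that returns false means the rule does not match. The types El and Rule, the function pickRule, and the pdfData field are hypothetical names used for illustration.

```typescript
type Attrs = Record<string, unknown>;

// Simplified stand-in for a parsed element: pdfData is set only when
// the element wraps a Froala PDF span (hypothetical field).
interface El {
  tag: string;
  pdfData: string | null;
}

interface Rule {
  node: string;      // node type this rule would create
  tag: string;       // tag name the rule targets
  priority?: number; // counted as 50 when missing
  getAttrs?: (el: El) => Attrs | false | null;
}

// Try rules from highest to lowest priority and return the first one that matches.
function pickRule(rules: Rule[], el: El): { node: string; attrs: Attrs } | null {
  const ordered = rules
    .map((rule, index) => ({ rule, index }))
    // Higher priority first; ties keep their original order.
    .sort((a, b) => (b.rule.priority ?? 50) - (a.rule.priority ?? 50) || a.index - b.index);
  for (const { rule } of ordered) {
    if (rule.tag !== el.tag) continue;
    const attrs = rule.getAttrs ? rule.getAttrs(el) : null;
    if (attrs === false) continue; // rule declined, fall through to the next one
    return { node: rule.node, attrs: attrs ?? {} };
  }
  return null;
}

const rules: Rule[] = [
  { node: "paragraph", tag: "p" },
  // Separate PDF node type whose rule also targets <p>, but is tried first.
  { node: "pdf", tag: "p", priority: 60, getAttrs: (el) => (el.pdfData ? { data: el.pdfData } : false) },
];

console.log(pickRule(rules, { tag: "p", pdfData: "sample.pdf" })?.node); // "pdf"
console.log(pickRule(rules, { tag: "p", pdfData: null })?.node);         // "paragraph"
```

With this ordering, an ordinary paragraph costs one extra getAttrs call before it falls through to the paragraph rule.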

gethari · January 6, 2023, 4:44am

Thank you. I have one more thought; please share your suggestions, if any.

A document is likely to contain far more normal content (<p>Hello</p>) than PDF tags nested inside <p>. So when we target <p> in the pdfExtension with a high priority, is there any chance of performance hiccups when there is a lot of content?

A colleague of mine suggested an alternative: set a high priority on the normal <p> extension and use getAttrs to check that there are no child elements. If there are, parsing falls back to the parse rule inside the PDF extension, which checks whether there is a PDF inside the p tag.

Any thoughts?

marijn · January 6, 2023, 7:06am

I can’t see that mattering at all. You’re going to have to do the check for each paragraph either way, and the check can be really cheap.