How do I modify content before it’s parsed?

ElHex · May 1, 2023, 10:46pm

Hello! I’ve been loving ProseMirror and would deeply appreciate any help you can give me. I’ve been modifying lists to behave more like the ones in Google Docs / Word, and I noticed that they do not allow any \t characters at the start of an li item inside a list. So I want to delete the leading \t characters (one or many) that come before any other character when parsing the document.

My best guess was that I need to use the parseHTML function:

    parseHTML() {
      return [
        {
          tag: 'li',
          getContent: (node, schema) => {
            const test = DOMParser.fromSchema(schema).parse(node);
            // const textNode = schema.text(node.textContent || '');
            // const fragment = Fragment.fromArray(test.content);
            return test.content;
          },
        },
      ]
    }

This is what I have. I’ve been trying to return a copy of a fragment (for example test), but it seems to modify it; it must be using a transaction, right? So I was thinking of modifying the HTML node (node in the code) and then letting ProseMirror do its thing, but I don’t know if what I’m doing here is OK:

DOMParser.fromSchema(schema).parse(node);

because this seems to return a doc type, basically a new doc ready to be inserted in the document, but I just need the contents of it (in this case the li)

is this ok?

marijn · May 2, 2023, 7:20am

Leading whitespace in block nodes will be removed by default unless you pass the preserveWhitespace option.

What kind of parsing are you trying to adjust? The initial document parse? Pasted content parsing?

ElHex · May 2, 2023, 7:54pm

Sadly, @marijn, for me the leading whitespace is not removed. I’m using tiptap for this. If I have a structure like this:

and then I apply the list, for example bullet:

It still has the \t character in it. I was looking for a way to have this removed automatically (only the leading whitespace), which is why I thought of parseHTML. But yesterday, after many tests, I noticed that it only runs when the content is set, not on dynamically created content.

I was planning on maybe adding this in an event like onUpdate or something like that. What do you think?

marijn · May 3, 2023, 6:40am

This sounds like you are expecting a paragraph that already exists in the editor to be parsed differently when you wrap it in a list via an editing command? That’s not really how this system works. But if you use a custom command for the wrapping, I guess you might be able to have that modify the wrapped content or something.

ElHex · May 3, 2023, 7:01am

Exactly. I just figured out that I need to clear the content I want removed (in this case the leading whitespace) and then add the list. Thanks so much for your help, @marijn.

But I’ve got another question closely related to this.

This is the command I’ve developed:

    tr.doc.nodesBetween(from, to, (node, pos) => {
      if (node.isTextblock && node.textContent != null) {
        //if (!node.isTextblock || from === to || (node.isTextblock && node.textContent == '')) return;
        const { textContent } = node;
        const match = /^\s+/.exec(textContent);

        if (match) {
          const start = pos + 1;
          const end = start + match[0].length;

          commands.deleteRange({ from: start, to: end });
        }
      }
    });

The thing is, as soon as the first iteration deletes its range, the following iterations delete parts that should not be deleted. I guess this happens because the positions (and the state) have changed.

I also don’t know whether I should add a counter and restart the loop as many times as needed. I don’t think that is ideal performance-wise, right?

I don’t know if I’m using this function correctly. Is there another way I can get the updated state or restart this loop?

marijn · May 3, 2023, 7:59am

See position mapping for that.
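
A minimal sketch of how position mapping might be applied to the command above. It assumes `tr` is a ProseMirror Transaction and that the textblocks start with plain text rather than inline leaf nodes. `stripLeadingWhitespace` is a hypothetical helper name, not something ProseMirror or tiptap provides. The idea is to collect all ranges against one document first, then map each old position through the deletions made so far before deleting.

```javascript
// Hypothetical helper, not part of ProseMirror or tiptap.
// `tr` is assumed to be a ProseMirror Transaction; `from`/`to` are positions in tr.doc.
function stripLeadingWhitespace(tr, from, to) {
  // Steps already on the transaction happened before tr.doc is read below,
  // so only steps added from this index on should shift our positions.
  const firstOwnStep = tr.steps.length;

  // 1. Collect the ranges first, all relative to the same document.
  const ranges = [];
  tr.doc.nodesBetween(from, to, (node, pos) => {
    if (!node.isTextblock) return true; // keep descending into wrappers
    // Assumption: the textblock starts with text, not an inline leaf node.
    const match = /^\s+/.exec(node.textContent);
    if (match) {
      const start = pos + 1; // first position inside the textblock
      ranges.push({ from: start, to: start + match[0].length });
    }
    return false; // no need to descend into a textblock
  });

  // 2. Delete each range, mapping its old positions through the
  //    deletions this function has made so far.
  for (const range of ranges) {
    const mapping = tr.mapping.slice(firstOwnStep);
    tr.delete(mapping.map(range.from), mapping.map(range.to));
  }
  return tr;
}
```

In a tiptap command this would presumably be called with the `tr` the command receives, instead of calling `commands.deleteRange` inside the loop. Deleting the collected ranges from last to first would be another way to avoid the shifting positions, since a deletion only moves positions that come after it.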