Break down text into sentences without changing schema

Hello. This is my first post here, so I’ll start by thanking Marijn for the great work.

On to my question. I’m working on a Markdown editor and would like to implement a focus mode, similar to the one in ghostwriter, if you know that editor. The idea is to render most of the text in a color close to the background, so that only the text around the cursor is fully visible.

So far I’ve managed this with a plugin that decorates the paragraph containing the cursor with a class, but I can’t get it to work for sentences. I would like to offer both levels: sentence or paragraph. I believe the logical way would be to add a sentence node to my schema, but that would require rewriting the Markdown parser/serializer as well.

My question, I suppose, is this: is there a way to split text into sentences, but only around the cursor? I doubt input rules would do the trick. I’m not 100 percent familiar with the API yet, but I’ve just about pulled out what little hair I had left, so…

Thanks


No, I don’t think that’ll work. You can find the sentence boundaries when computing the highlight, and create an inline decoration spanning the sentence. That’s probably a more promising approach.

@marijn: Thanks for the quick reply. I agree, and that’s the approach I’ve taken for paragraphs. Here’s the plugin code that takes care of that (shamelessly lifted from the tracking example 🙂). It’s not perfect, as I still get errors when I press Enter, but that’s an issue I’ll have to fix later.

import {Plugin} from 'prosemirror-state';
import {Decoration, DecorationSet} from 'prosemirror-view';

new Plugin({
    state: {
        init() { return {deco: DecorationSet.empty} },
        apply(tr, prev, oldState, state) {
            if (tr.selection.empty) {
                const pos = tr.selection.from;
                const $pos = state.doc.resolve(pos);
                //start of the text node around the cursor
                const ps = pos - $pos.textOffset;
                //end of that text node (minus one character)
                const pe = ps + $pos.parent.child($pos.index()).nodeSize - 1;
                const deco = Decoration.inline(ps, pe, {class: 'paragraph-highlight'});

                return {deco: DecorationSet.create(state.doc, [deco])}
            } else return prev;
        }
    },
    props: {
        decorations(state) { return this.getState(state).deco }
    }
});
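The errors after pressing Enter most likely come from `$pos.parent.child($pos.index())`: when the cursor sits at the end of a paragraph, or in the new empty paragraph that Enter creates, `$pos.index()` equals `parent.childCount`, and `child()` throws a RangeError for that out-of-range index. Also, `pos - $pos.textOffset` is the start of the text node around the cursor rather than of the paragraph, so marked text (bold, links) would cut the highlight short. A minimal sketch that avoids both by asking the resolved position for its parent's content boundaries; `paragraphRange` is a hypothetical helper name, while `start()` and `end()` are `ResolvedPos` methods:

```javascript
// Sketch: range to decorate for the paragraph around the cursor.
// `$pos` is assumed to be a ProseMirror ResolvedPos (e.g. tr.selection.$from).
// start() and end() return the positions where the parent's content begins
// and ends, so no lookup into the parent's children is needed.
function paragraphRange($pos) {
  const from = $pos.start();
  const to = $pos.end();
  // An empty paragraph (e.g. right after pressing Enter) has no content to
  // decorate, so return null instead of building an empty decoration.
  return from < to ? { from, to } : null;
}
```

Inside `apply()`, a non-null result would feed `Decoration.inline(range.from, range.to, {class: 'paragraph-highlight'})`; for a null result, an empty `DecorationSet` is enough.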

However, I can’t figure out the right approach to finding sentence boundaries. I assume, unless I’m completely off the mark, that the paragraph version works because a paragraph is a single node, which isn’t the case for sentences. I know input rules rely on regular expressions, so there must be a way to do this, but everything I’ve tried so far has failed. I’m not asking you to give me the solution, but could you point me in the right direction?

You can take selection.$from.parent.textBetween(0, selection.$from.parent.content.size, undefined, "%") to get the whole text of the paragraph (where % replaces each non-text leaf node so that the offsets still match the document offsets), then split that on dots or similar (which is obviously not a foolproof sentence-splitting algorithm), and use the start and end of the fragment around the cursor, plus selection.$from.start() (the start of the parent node).
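A minimal sketch of that recipe, assuming the paragraph text has already been extracted with `textBetween` as described, that the cursor offset within it is `$from.parentOffset`, and that `paragraphStart` is `$from.start()`. `sentenceRangeAround` is a hypothetical name, and splitting on "." is exactly as naive as the reply warns:

```javascript
// Hypothetical helper: document range of the dot-delimited fragment around
// the cursor. `text` is the paragraph text, `cursorOffset` the cursor offset
// inside it, `paragraphStart` the document position of the paragraph content.
function sentenceRangeAround(text, cursorOffset, paragraphStart) {
  // Naive split on dots; the lookbehind keeps each dot with its fragment,
  // so the fragment lengths add up to the full text length.
  const fragments = text.split(/(?<=\.)/);
  let start = 0;
  for (const fragment of fragments) {
    const end = start + fragment.length;
    // The cursor belongs to the first fragment whose end it does not pass,
    // so a cursor right after a dot stays with the sentence it just closed.
    if (cursorOffset <= end) {
      return { from: paragraphStart + start, to: paragraphStart + end };
    }
    start = end;
  }
  // Only reached if the offset lies past the text; return an empty range.
  return { from: paragraphStart + start, to: paragraphStart + start };
}
```

When `from === to` (an empty paragraph), there is nothing to decorate; otherwise the pair maps directly onto `Decoration.inline(from, to, {class: 'highlight'})`.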

Awesome! Couldn’t get to it yesterday, but I finally did today. I scratched my head quite a bit at first, but with a little squinting at my screen, what you said became clear.

Here is the code I’ve come up with, in case anyone has a similar use case in the future, or in case someone can point out that I’ve gone about this the wrong way. It consists of two functions that return plugins adding a highlight decoration to the current paragraph and to the current sentence, respectively (where “current” means around the cursor position).

Thanks @marijn for the help.

import {Plugin} from 'prosemirror-state';
import {Decoration, DecorationSet} from 'prosemirror-view';

export function getParaFocusPlugin() {
    return new Plugin({
        state: {
            init() { return {deco: DecorationSet.empty} },
            apply(tr, prev, oldState, state) {

                //get the resolved cursor position
                const $pos = tr.selection.$from;

                //only react to an empty selection (cursor) inside a textblock; otherwise keep
                //the previous decorations, mapped through the transaction so they stay valid
                if (!tr.selection.empty || !$pos.parent.isTextblock) {
                    return {deco: prev.deco.map(tr.mapping, tr.doc)};
                }

                //get the start and end positions of the paragraph's content; this also works
                //at the end of a paragraph and when marks split the text into several nodes
                const ps = $pos.start();
                const pe = $pos.end();

                //an empty paragraph has nothing to decorate
                if (ps === pe) return {deco: DecorationSet.empty};

                //add the decoration between the chosen positions
                const deco = Decoration.inline(ps, pe, {class: 'highlight'});

                return {deco: DecorationSet.create(state.doc, [deco])}
            }
        },
        props: {
            decorations(state) { return this.getState(state).deco }
        }
    });
}

export function getSentenceFocusPlugin() {
    return new Plugin({
        state: {
            init() { return {deco: DecorationSet.empty} },
            apply(tr, prev, oldState, state) {
                const $from = tr.selection.$from;

                //only handle an empty selection (cursor) inside a textblock; otherwise keep
                //the previous decorations, mapped through the transaction so they stay valid
                if (!tr.selection.empty || !$from.parent.isTextblock) {
                    return {deco: prev.deco.map(tr.mapping, tr.doc)};
                }

                //extract text from parent node (paragraph); '%' stands in for every
                //non-text leaf node so that string offsets match document offsets
                const parent = $from.parent;
                const txt = parent.textBetween(0, parent.content.size, undefined, '%');

                //get the start position of the parent node (paragraph)
                const startp = $from.start();

                //cursor offset inside the paragraph text
                const offset = $from.parentOffset;

                //characters treated as sentence terminators [TODO: refine to match all use cases]
                const reg = /[.?!\n\r]/g;

                //the current sentence starts just after the last terminator found before the
                //cursor and ends just after the next one (or at the paragraph end); a terminator
                //right before the cursor still counts as the end of the current sentence
                let from = 0;
                let to = txt.length;
                let match;
                while ((match = reg.exec(txt)) !== null) {
                    if (match.index + 1 < offset) {
                        //this terminator closes an earlier sentence
                        from = match.index + 1;
                    } else {
                        //this terminator closes the sentence around the cursor
                        to = match.index + 1;
                        break;
                    }
                }

                //nothing to highlight, e.g. in an empty paragraph
                if (from >= to) return {deco: DecorationSet.empty};

                //add decorations between the limits
                const deco = Decoration.inline(startp + from, startp + to, {class: 'highlight'});

                return {deco: DecorationSet.create(state.doc, [deco])}
            }
        },
        props: {
            decorations(state) { return this.getState(state).deco }
        }
    });
}
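The TODO on the regular expression is where most of the remaining work lies: a single-character split also breaks decimal numbers such as "3.14". One option, assuming a runtime that supports it (current browsers, Node 16+), is the built-in `Intl.Segmenter` with sentence granularity, whose `segment(text).containing(index)` returns the sentence that contains a given index. This swaps in a different technique from the one used in the thread; `sentenceAt` is a hypothetical helper name, and its offsets would still need `$from.start()` added before building the decoration:

```javascript
// Sketch: sentence around a cursor offset via Intl.Segmenter (Unicode
// sentence boundaries) instead of a hand-written regular expression.
// `text` is the paragraph text, `offset` the cursor offset ($from.parentOffset).
const segmenter = new Intl.Segmenter("en", { granularity: "sentence" });

function sentenceAt(text, offset) {
  if (text.length === 0) return null;
  // containing() returns undefined for an index at or past the end of the
  // text, so clamp a cursor at the paragraph end to the last character.
  const index = Math.min(offset, text.length - 1);
  const segment = segmenter.segment(text).containing(index);
  if (!segment) return null;
  // segment.index is where the sentence starts; trailing spaces belong to it.
  return { from: segment.index, to: segment.index + segment.segment.length };
}
```

Since `containing()` looks at the character after the cursor, a cursor right after a period followed by a space still selects the sentence it just finished, which matches the behavior of the regex version above.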