How should I approach an incremental document parser?

Gozala · March 6, 2018, 10:05pm

In pursuit of the goal described in this thread (quoted below for convenience):

Goal

I would like to make a markdown editor that unifies input and preview: the user types plain markdown, but formatting annotations are only visible when the caret is within the formatting range. Here are a couple of examples to show what I mean.

Note: the caret position is indicated by the “🖊” character.

Caret is within the bold text range (** is visible):

writing some **text🖊**

Caret is outside of the bold text range (** fades out):

writing some text over🖊

Caret is within the heading range (### is visible):

### Title🖊

Caret is outside of the heading range (### fades out):

Title

text🖊

On the implementation side, I ended up extending the markdown parser and schema to include markup info (like **, _ and #) in node and mark attributes. On selection changes, I compute an “expansion range” by getting the marks under the cursor position and then traversing nodes to the left and right until I reach a node that no longer contains any of those marks (I really wish there were an API to get a mark's range). I do it this way because I'd like to handle the following example:

writing nested mar🖊k example

so that it expands to the following:

writing _nested **mar🖊k** example_

That is, assuming this is what the document's actual source looks like.
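The expansion-range computation above can be sketched on a simplified model of a paragraph: a flat list of text runs, each tagged with the names of its marks. This is a hypothetical stand-in, not the ProseMirror API, and `markRange` / `expansionRange` are illustrative helpers. Rather than walking outward until no mark from the caret matches, this variant computes one contiguous range per mark under the caret (the "mark range" helper the post wishes for) and takes their union; since each of those ranges contains the caret's run, the union is itself one range.

```typescript
// Simplified stand-in for a paragraph's inline content (NOT the ProseMirror API):
// a flat list of text runs, each carrying the names of the marks applied to it.
interface TextRun {
  text: string;
  marks: string[];
}

interface OffsetRange {
  from: number; // inclusive character offset within the block
  to: number; // exclusive character offset within the block
}

// Start offset of every run, so run indices can be turned into character offsets.
function runStarts(runs: TextRun[]): number[] {
  const starts: number[] = [];
  let pos = 0;
  for (const run of runs) {
    starts.push(pos);
    pos += run.text.length;
  }
  return starts;
}

// Index of the run the caret belongs to. A caret sitting on a boundary
// is attributed to the run on its left.
function runIndexAt(runs: TextRun[], offset: number): number {
  let pos = 0;
  for (let i = 0; i < runs.length; i++) {
    pos += runs[i].text.length;
    if (offset <= pos) return i;
  }
  return -1;
}

// Hypothetical "mark range" helper: the contiguous run-index range [first, last]
// around `index` whose runs all carry `mark`.
function markRange(runs: TextRun[], index: number, mark: string): [number, number] {
  let first = index;
  let last = index;
  while (first > 0 && runs[first - 1].marks.includes(mark)) first--;
  while (last < runs.length - 1 && runs[last + 1].marks.includes(mark)) last++;
  return [first, last];
}

// Union of the ranges of every mark under the caret.
function expansionRange(runs: TextRun[], offset: number): OffsetRange {
  const index = runIndexAt(runs, offset);
  if (index === -1 || runs[index].marks.length === 0) return { from: offset, to: offset };
  let first = index;
  let last = index;
  for (const mark of runs[index].marks) {
    const [f, l] = markRange(runs, index, mark);
    first = Math.min(first, f);
    last = Math.max(last, l);
  }
  const starts = runStarts(runs);
  return { from: starts[first], to: starts[last] + runs[last].text.length };
}

// "writing _nested **mark** example_" with the markup already stripped out.
const paragraph: TextRun[] = [
  { text: "writing ", marks: [] },
  { text: "nested ", marks: ["em"] },
  { text: "mark", marks: ["em", "strong"] },
  { text: " example", marks: ["em"] },
];

// Caret between "mar" and "k" (offset 18): the range covers "nested mark example".
console.log(expansionRange(paragraph, 18)); // { from: 8, to: 27 }
```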

Once I have an “expansion range”, I perform an expand operation on that slice of the document, which inserts custom markup marks (the ** and _ in the example above). Once the cursor exits the range (which is mapped through changes), all nodes marked as markup are removed, so the text appears as regular rich text again.
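The expand/collapse round trip can be sketched on the same kind of simplified run model. Expanding walks the runs in a range, keeps open marks on a stack so nested marks close in reverse order, and inserts runs tagged with a hypothetical `markup` mark; collapsing filters those runs back out. The `MARKUP` table, `expandRuns` and `collapseRuns` are assumptions for illustration, not ProseMirror or prosemirror-markdown API.

```typescript
// Same simplified run model: each run is text plus the names of its marks.
interface TextRun {
  text: string;
  marks: string[];
}

// Hypothetical table of the source characters each mark serializes to.
const MARKUP = new Map<string, string>([
  ["strong", "**"],
  ["em", "_"],
]);
// Hypothetical mark name used to tag inserted markup runs.
const MARKUP_MARK = "markup";

// Insert markup runs around the marked runs in [first, last] (run indices).
function expandRuns(runs: TextRun[], first: number, last: number): TextRun[] {
  const out: TextRun[] = runs.slice(0, first);
  const open: string[] = [];
  // A markup run carries every currently open mark plus MARKUP_MARK,
  // so the marks it sits inside stay contiguous.
  const token = (mark: string): TextRun => ({
    text: MARKUP.get(mark)!,
    marks: [...open, MARKUP_MARK],
  });
  // Close open marks from the innermost down to `depth`.
  const closeFrom = (depth: number) => {
    while (open.length > depth) {
      out.push(token(open[open.length - 1])); // emitted while the mark is still open
      open.pop();
    }
  };
  for (let i = first; i <= last; i++) {
    const run = runs[i];
    // Close down to the first open mark this run lacks; any mark closed along
    // the way that the run still has is reopened by the loop below.
    const stale = open.findIndex((m) => !run.marks.includes(m));
    if (stale !== -1) closeFrom(stale);
    for (const mark of run.marks) {
      if (MARKUP.has(mark) && !open.includes(mark)) {
        open.push(mark);
        out.push(token(mark));
      }
    }
    out.push(run);
  }
  closeFrom(0);
  return out.concat(runs.slice(last + 1));
}

// Collapsing drops every run tagged as markup.
function collapseRuns(runs: TextRun[]): TextRun[] {
  return runs.filter((run) => !run.marks.includes(MARKUP_MARK));
}

const paragraph: TextRun[] = [
  { text: "writing ", marks: [] },
  { text: "nested ", marks: ["em"] },
  { text: "mark", marks: ["em", "strong"] },
  { text: " example", marks: ["em"] },
];

console.log(expandRuns(paragraph, 1, 3).map((r) => r.text).join("")); // writing _nested **mark** example_
```

Giving markup runs the surrounding marks is a design choice: it keeps each mark contiguous across its own delimiters, so recomputing a mark's range over the expanded content still works.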

Problem

Initially I thought I'd use input rules to convert input like **🖊 into strong, etc., but after trying them I realized that won't work the way I'd like: I also want to handle cases where markup is inserted non-sequentially, where a markup character is deleted, or where marked-up text is pasted. So I ended up going for an incremental parsing strategy: on each edit, the enclosing block is serialized (including all markup), and the original block is replaced with a newly parsed node. Unfortunately, I struggle to keep the selection / cursor at the right location.

I'm starting to question whether this is the right strategy in the first place. If there is a better way to handle this, I'd appreciate input; if this is the right strategy, is there some code available I can learn from?
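One common way to keep the caret stable when reparsing is to avoid replacing the whole block: compare the old and new content, replace only the span that actually changed, and map the caret through that single change. prosemirror-model's `Fragment.findDiffStart` and `Fragment.findDiffEnd` perform this kind of comparison on node content. To stay self-contained, this sketch works on plain strings (for example a block's serialized source); `diffRange` and `mapOffset` are hypothetical helpers, and the boundary rule is only similar in spirit to how ProseMirror maps positions through a replaced range.

```typescript
// One contiguous changed region between two versions of a block's text.
interface Change {
  from: number; // start of the change (same in both versions)
  toOld: number; // end of the replaced span in the old text
  toNew: number; // end of the inserted span in the new text
}

// Trim the common prefix and suffix; whatever is left is the changed region.
function diffRange(oldText: string, newText: string): Change | null {
  if (oldText === newText) return null;
  const minLen = Math.min(oldText.length, newText.length);
  let from = 0;
  while (from < minLen && oldText[from] === newText[from]) from++;
  let toOld = oldText.length;
  let toNew = newText.length;
  // The suffix scan must not run back into the prefix already matched.
  while (toOld > from && toNew > from && oldText[toOld - 1] === newText[toNew - 1]) {
    toOld--;
    toNew--;
  }
  return { from, toOld, toNew };
}

// Map a caret offset from the old text into the new text. Offsets before the
// change stay put and offsets after it shift by the size difference. The edges
// of a non-empty replaced span stick to their own side; offsets strictly inside
// it, or at a pure insertion point, follow `assoc` (-1 = left, 1 = right).
function mapOffset(offset: number, change: Change | null, assoc: -1 | 1 = 1): number {
  if (!change) return offset;
  if (offset < change.from) return offset;
  if (offset > change.toOld) return offset + (change.toNew - change.toOld);
  const deleted = change.toOld - change.from;
  const side = deleted === 0 ? assoc : offset === change.from ? -1 : offset === change.toOld ? 1 : assoc;
  return side < 0 ? change.from : change.toNew;
}

// Collapsing "a **b** c" to "a b c": a caret at the end (offset 9) lands at offset 5.
console.log(mapOffset(9, diffRange("a **b** c", "a b c"))); // 5
```

Because the diff is a single range, edits at both ends of a span (such as `**b**` becoming `b`) merge into one replacement, so a caret between `**` and `b` lands after `b` rather than before it; a finer-grained diff would avoid that.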

P.S.

I have considered using decorations instead of “markup” marks, but I could not figure out how to support deleting markup, for example. There are also more complex cases, like expanding links to [link name](#url alt) and supporting edits within them.
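On the decoration question: one possible arrangement, assuming the markup characters stay in the document as ordinary text, is that deleting markup becomes an ordinary text edit and decorations only control whether each delimiter is shown or faded. This sketch computes, for a caret offset, which delimiter spans should be visible; each result could then be turned into a `Decoration.inline(from, to, attrs)` from prosemirror-view carrying a CSS class. Offsets are relative to the block's text and would need shifting to document positions. `markupDecorations` is a hypothetical helper, and its regex scan is deliberately naive (no escaping, code spans, or intraword underscores), not a Markdown parser.

```typescript
// A delimiter span and whether it should be shown for the current caret.
interface MarkupDecoration {
  from: number; // inclusive offset of the delimiter within the block text
  to: number; // exclusive offset
  visible: boolean;
}

// Escape a delimiter so it can be embedded in a RegExp.
const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Find paired delimiters and mark both halves visible when the caret lies
// anywhere inside the formatted range, delimiters included.
function markupDecorations(source: string, caret: number, delimiters: string[] = ["**", "_"]): MarkupDecoration[] {
  const result: MarkupDecoration[] = [];
  for (const delim of delimiters) {
    const d = escapeRegExp(delim);
    const re = new RegExp(`${d}(.+?)${d}`, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(source)) !== null) {
      const from = m.index;
      const to = from + m[0].length;
      const visible = caret >= from && caret <= to;
      result.push({ from, to: from + delim.length, visible }); // opening delimiter
      result.push({ from: to - delim.length, to, visible }); // closing delimiter
    }
  }
  return result.sort((a, b) => a.from - b.from);
}

// Caret right after "text", before the closing "**": both delimiters stay visible.
console.log(markupDecorations("writing some **text**", 19).map((d) => d.visible)); // [ true, true ]
```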

Gozala · March 7, 2018, 6:30am

On a somewhat related subject: I tried using nodes for “markup” instead of marks, but was unable to get the cursor behavior that marks get with inclusive: false. That is unfortunate, because with nodes it would be easier to skip them when “collapsing the range” or serializing the document. Did I overlook a way of configuring them?
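For reference, the caret behavior that `inclusive: false` gives a mark can be modelled roughly like this: strictly inside a text run, the caret takes that run's marks; at a boundary, it takes the marks of the run before it (or after it, at the very start), minus any non-inclusive mark that the run on the other side does not also carry. This is a simplified approximation of how ProseMirror's `ResolvedPos.marks()` treats non-inclusive marks, written against the same hypothetical run model; `NON_INCLUSIVE` stands in for marks whose spec sets `inclusive: false`, and `marksAt` is an illustrative helper.

```typescript
interface TextRun {
  text: string;
  marks: string[];
}

// Hypothetical: marks configured with `inclusive: false` in their spec.
const NON_INCLUSIVE = new Set(["markup"]);

// Marks that text typed at `offset` would pick up.
function marksAt(runs: TextRun[], offset: number): string[] {
  let pos = 0;
  let before: TextRun | undefined;
  let after: TextRun | undefined;
  for (const run of runs) {
    const end = pos + run.text.length;
    if (offset > pos && offset < end) return run.marks; // strictly inside a run
    if (offset === pos) {
      after = run; // caret sits on the boundary before this run
      break;
    }
    before = run;
    pos = end;
  }
  // Prefer the run before the caret; fall back to the run after it.
  const main = before ?? after;
  const other = before ? after : undefined;
  if (!main) return [];
  // Drop non-inclusive marks unless the neighbouring run carries them too.
  return main.marks.filter((m) => !NON_INCLUSIVE.has(m) || (other?.marks.includes(m) ?? false));
}

// "writing **text**" where the delimiters are runs tagged with a "markup" mark.
const paragraph: TextRun[] = [
  { text: "writing ", marks: [] },
  { text: "**", marks: ["strong", "markup"] },
  { text: "text", marks: ["strong"] },
  { text: "**", marks: ["strong", "markup"] },
];

// Right after the opening "**": typing here does not extend the markup mark.
console.log(marksAt(paragraph, 10)); // [ 'strong' ]
```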