Need some help modelling a text generation engine editor

zib · June 11, 2019, 2:20pm

Hello dear ProseMirror community,

I’m trying to create a PM-based PoC for our text report generation engine, which has been in production for several years. I need some help / inspiration in modelling it.

The Engine

The text generation engine is template-based and has simple “rewriting” rules (a.k.a. placeholders), which imply a finite context-free grammar. For any given placeholder, which rule applies depends, directly or indirectly, only on the external data source (e.g. user input).

Take food as an analogy:

Data source

Vegetarian: Undefined | No | Vegetarian | Vegan
(If Vegetarian=Undefined or No) Meat: Beef? Pork? Chicken? Fish? Other?
(If Meat.Other=Selected) Specify other meat: <Text Input>
Vegetables: Broccoli? Cabbage? Cucumber? …
Fruits: Apple? Banana? Kiwi? …

Note that “Meat”, “Vegetables” and “Fruits” are multiple selects.

Engine Output

–Profile–
I don’t eat meat. | I like eating meat. | I like eating beef. | I like eating beef and pork. | I like eating beef, pork and snail.
…
–Recipes–
-Beef dishes- (If Meat.Beef=Selected)
Roast beef [Image Roast beef]
Ginger beef [Image Ginger beef]
…

Under the hood it’s a tree of rewrites:

[h1]–Profile–[/h1]
{MEAT}{VEGETABLES}{FRUITS}
[h1]–Recipes–[/h1]
{MEAT_RECIPE}{VEGETABLE_RECIPE}{FRUIT_RECIPE}

With:

{MEAT} = I like eating
Meat.Beef → beef
Meat.Duck → duck
…

{MEAT_RECIPE} = {BEEF_RECIPE}{DUCK_RECIPE}…
{BEEF_RECIPE} = [h2]-Beef dishes-[/h2][br]Roast beef[image] …
…

That’s a simple example. Our {PLACEHOLDER} may contain inline or block content and supports a very simple markup syntax. The inline content (e.g. a word in a sentence) may cross the boundary between 2 placeholders like {"a"}{"pple"}.
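The rewrite tree above can be sketched as a small recursive expander. This is only a minimal illustration, not our engine's actual implementation: the `Data` shape, the `rules` table and the functions `joinList` and `expand` are all names made up for this example, and the templates are shortened.

```typescript
// Hypothetical data source for the food analogy.
interface Data {
  vegetarian: "Undefined" | "No" | "Vegetarian" | "Vegan";
  meat: string[]; // selected options of the "Meat" multi-select
}

// A rule returns template text that may contain further {PLACEHOLDER}s.
type Rule = (data: Data) => string;

// "a", "a and b", "a, b and c" (matches the sample engine output).
function joinList(items: string[]): string {
  if (items.length <= 1) return items.join("");
  return items.slice(0, -1).join(", ") + " and " + items[items.length - 1];
}

const rules: { [name: string]: Rule } = {
  MEAT: (d) => {
    if (d.vegetarian === "Vegetarian" || d.vegetarian === "Vegan") {
      return "I don't eat meat.";
    }
    return d.meat.length === 0
      ? "I like eating meat."
      : "I like eating " + joinList(d.meat) + ".";
  },
  MEAT_RECIPE: () => "{BEEF_RECIPE}",
  BEEF_RECIPE: (d) =>
    d.meat.indexOf("beef") >= 0 ? "[h2]-Beef dishes-[/h2]Roast beef" : "",
};

// Replace every {NAME} by the output of its rule, recursively.
function expand(template: string, data: Data, depth = 0): string {
  if (depth > 20) throw new Error("rewrite recursion too deep");
  return template.replace(/\{([A-Z_]+)\}/g, (match: string, name: string) => {
    const rule = rules[name];
    // Unknown placeholders are left untouched rather than guessed at.
    return rule ? expand(rule(data), data, depth + 1) : match;
  });
}
```

For example, `expand("{MEAT}", { vegetarian: "No", meat: ["beef", "pork"] })` yields "I like eating beef and pork.". The depth guard is only there to keep a badly written rule set from looping forever; a finite grammar should never hit it.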

Feature 1: Switchable View Modes

Currently I’m planning for an editor PoC for our template editors and translators. I’m thinking about having 2 view modes of a placeholder:

Placeholder (Atomic): I am {VEGETARIAN}. The placeholder interacts like a #hashtag or @mention.

In-place edit (Inline): The text of a placeholder is inserted in place. The user may switch between the following variants and can directly edit the rich text in the placeholder.

I am {“vegetarian and I love vegetables”}. (Placeholder output for Vegetarian=vegetarian)
I am {“vegan and I love vegetables”}. (Vegetarian=vegan)
I am {“a meat-eater”}. (Vegetarian=undefined or no)

So the user can switch the view to {PLACEHOLDER} or different {“in-place”} variants while the rest of the editor stays the same. The switch operation alone should not create an undo history entry (from the user perspective) because the underlying data is not changed.
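To make the undo requirement concrete, here is a minimal sketch that keeps the view mode outside the undoable data. It does not use ProseMirror, and `EditorModel`, `editText`, `switchView` and `undo` are hypothetical names. In ProseMirror terms this would roughly correspond to keeping the mode in decorations or plugin state, or to tagging the switching transaction with the `addToHistory` meta set to `false` for the history plugin.

```typescript
type ViewMode = "placeholder" | "inline";
type TextById = { [placeholderId: string]: string };

interface EditorModel {
  text: TextById;                               // underlying data, undoable
  view: { [placeholderId: string]: ViewMode };  // presentation only
  undoStack: TextById[];                        // snapshots of `text`
}

// A real edit: snapshot the old text so it can be undone.
function editText(m: EditorModel, id: string, newText: string): EditorModel {
  return {
    text: { ...m.text, [id]: newText },
    view: m.view,
    undoStack: m.undoStack.concat([m.text]),
  };
}

// A view switch: the undo stack is passed through unchanged.
function switchView(m: EditorModel, id: string, mode: ViewMode): EditorModel {
  return { text: m.text, view: { ...m.view, [id]: mode }, undoStack: m.undoStack };
}

// Undo restores the previous text but leaves the current view modes alone.
function undo(m: EditorModel): EditorModel {
  const previous = m.undoStack[m.undoStack.length - 1];
  if (previous === undefined) return m;
  return { text: previous, view: m.view, undoStack: m.undoStack.slice(0, -1) };
}
```

With this split, switching a placeholder to the inline view and then pressing undo reverts the last text edit, not the switch.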

Feature 2: In-place Edit and Change Propagation

In the in-place edit view, when the user edits the text of the selected variant, the change can optionally be propagated to the other variants, provided part of the changed text also exists in them.

Here is an en->de translation example: the user edits the in-place variant "Vegetarian=vegan"

I am {“vegan and I love vegetables”}. → Ich bin {“Veganer und ich liebe Gemüse”}.

The change will be propagated to "Vegetarian=vegetarian"

{“vegetarian and I love vegetables”} → {“?vegetarian? und ich liebe Gemüse”}

But not to "Vegetarian=undefined or no"

{“a meat-eater”} → N/A

Here "a meat-eater" is not changed since it doesn’t match the sentence being edited. In the background a simple natural language comparison algorithm determines the rules for the propagation. The UI can list all variants of this placeholder and allows the user to revert the propagation (“Don’t change other variants”).
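A rough sketch of the propagation step, under a strong simplifying assumption: an exact substring match stands in for the natural language comparison, and the edit is given as a replaced span plus its replacement. `Edit`, `Variant` and `propagate` are hypothetical names, and `optOut` models the “Don’t change other variants” choice.

```typescript
interface Edit {
  from: string; // the text span the user replaced
  to: string;   // what the user replaced it with
}

interface Variant {
  condition: string; // e.g. "Vegetarian=vegetarian"
  text: string;
}

// Apply the edit to every other variant that contains the replaced span,
// except those the user opted out of. Returns new objects, never mutates.
function propagate(edit: Edit, others: Variant[], optOut: string[] = []): Variant[] {
  return others.map((v) => {
    const skip =
      edit.from === "" ||
      optOut.indexOf(v.condition) >= 0 ||
      v.text.indexOf(edit.from) < 0;
    if (skip) return v;
    return { condition: v.condition, text: v.text.split(edit.from).join(edit.to) };
  });
}
```

With the edit `{ from: "and I love vegetables", to: "und ich liebe Gemüse" }`, the variant "vegetarian and I love vegetables" becomes "vegetarian und ich liebe Gemüse" (the untranslated "vegetarian" is what the question marks above flag), while "a meat-eater" is returned unchanged.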

Feature 3: Mostly natural language related

There are also processes which are not aligned to the boundary of this placeholder model (similar to the PM class Slice vs. the Node/Mark model). For example we do named-entity (also called term in the translation industry) recognition and tagging, which operates on the word level, but one word may actually cross placeholders as I mentioned before, e.g. {"A"}{"pple"}. Linting on sentences is another important use case.
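To illustrate the boundary problem, the sketch below flattens the placeholder outputs into one string, tokenizes it, and reports which placeholder segments each word touches. The function name `wordsAcrossSegments` and the ASCII-only tokenizer are assumptions for this example; a real term recognizer would be far more involved.

```typescript
interface WordSpan {
  word: string;
  start: number;      // offset in the flattened text
  end: number;        // exclusive
  segments: number[]; // indices of the placeholder segments the word touches
}

function wordsAcrossSegments(segments: string[]): WordSpan[] {
  // Record the [start, end) range of each segment in the flattened text.
  const ranges: Array<{ start: number; end: number }> = [];
  let flat = "";
  segments.forEach((s) => {
    ranges.push({ start: flat.length, end: flat.length + s.length });
    flat += s;
  });

  const result: WordSpan[] = [];
  const wordPattern = /[A-Za-z]+/g; // simplistic tokenizer, ASCII letters only
  let m: RegExpExecArray | null;
  while ((m = wordPattern.exec(flat)) !== null) {
    const start = m.index;
    const end = start + m[0].length;
    const touched: number[] = [];
    ranges.forEach((r, i) => {
      // A non-empty segment is touched if its range overlaps the word's range.
      if (r.start < r.end && r.start < end && r.end > start) touched.push(i);
    });
    result.push({ word: m[0], start, end, segments: touched });
  }
  return result;
}
```

For the segments `["I eat an ", "A", "pple", " daily"]`, the word "Apple" is reported as touching segments 1 and 2, which is exactly the case where a tag cannot be attached to a single placeholder.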

Your opinion?

I see many similarities in the data model of PM as it also operates on a tree (the DOM). Our placeholder in the in-place view for example is very much a PM Mark. I’m considering first giving it a try here. For the “switchable view modes” of placeholders and the “edit propagation” I probably need to shadow their PM nodes?

Since I’m quite new to PM, I’d like to ask your opinions on how to better map our models to PM and how to manage history for the “switchable view modes” feature. What level of customization do you think would be a better approach?

Thank you!

marijn · June 11, 2019, 6:39pm

Hi and welcome. Yes, I think you should be able to model documents like that in ProseMirror, but you probably will have to write a lot of custom commands to implement the interactions around tree manipulation. Unfortunately, I’m too busy to dive into your design very deeply or give more useful feedback, though.

zib · June 13, 2019, 11:34am

Thank you for the prompt answer!

I will experiment with this idea this week and update my progress in this thread.

dbousamra · June 28, 2019, 1:07pm

@zib How did you go? I am about to tackle a very similar project. Mine revolves around generating medical reports, not dissimilar to invoices, from some structured data.

zib · July 13, 2019, 9:48am

It went really well. I had a lot of fun modelling with ProseMirror. As far as I understand, there are several ways to achieve this with ProseMirror. I managed to build a prototype in a rather rough way:

Custom node schema for “Replaceable” (the placeholder) and “Rewrite” (the real content), they each have an “id” node attribute;
Toggle attributes / class with either setNodeMarkup (mutates the doc) or Node Decoration (no mutation);
Manage most of the real “view” features with pure CSS (similar to the footnote counter in PM footnote example);
(Not tested yet) Create a “reserved area” in the same doc and move “Rewrites” in and out of this area to keep the main doc linear.
(Not tested yet) Keep track of all the nodes with “id”
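As an illustration of the first item, the node specs could look roughly like the sketch below. It is not my actual prototype code and has not been validated against ProseMirror. The field names `content`, `group`, `inline`, `attrs` and `toDOM` follow ProseMirror's NodeSpec, but `SketchNodeSpec` is a local stand-in type so the snippet compiles without the prosemirror-model package, and the `active` attribute and the class names are assumptions.

```typescript
// Local stand-ins for the parts of ProseMirror's NodeSpec used here.
interface SketchNode {
  attrs: { [name: string]: unknown };
}
interface SketchNodeSpec {
  content?: string;
  group?: string;
  inline?: boolean;
  attrs?: { [name: string]: { default?: unknown } };
  toDOM?: (node: SketchNode) => unknown[];
}

// The placeholder: an inline node holding one or more alternative rewrites.
const replaceable: SketchNodeSpec = {
  group: "inline",
  inline: true,
  content: "rewrite+",
  attrs: { id: { default: null } },
  toDOM: (node) => ["span", { "class": "replaceable", "data-id": node.attrs["id"] }, 0],
};

// One variant of the real content; may contain text and nested placeholders.
const rewrite: SketchNodeSpec = {
  inline: true,
  content: "inline*",
  attrs: { id: { default: null }, active: { default: false } },
  toDOM: (node) => [
    "span",
    { "class": node.attrs["active"] ? "rewrite active" : "rewrite", "data-id": node.attrs["id"] },
    0,
  ],
};

// Shape expected by `new Schema({ nodes })` in prosemirror-model.
const nodes: { [name: string]: SketchNodeSpec } = {
  doc: { content: "block+" },
  paragraph: { group: "block", content: "inline*", toDOM: () => ["p", 0] },
  text: { group: "inline" },
  replaceable,
  rewrite,
};
```

The classes and `data-id` attributes emitted by `toDOM` are what the CSS-only view switching hooks into: inactive rewrites can simply be hidden with a style rule.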

The idea is to make as few customizations as possible. My “replaceable” and “rewrite” are PM inline nodes at the moment. I use styles to display them as blocks and lists.

It’s tricky to build a mixed inline/block (in the PM model sense) solution. I have no idea how to glue 2 block nodes together in an inline manner, even outside the PM context with plain HTML + CSS. Maybe it’s easier to go one level lower and model them as PM Slices.

To my surprise ProseMirror handles selections and the cursor quite well despite my hacky schema and styles. That said, the cursor, copy/paste and drag-and-drop behaviors are not 100% correct out of the box. I’m trying to fix them with custom node views.

I can put more thoughts and details here later.

@dbousamra May I ask where you are based? Maybe we could have more discussion on it…