(This builds on the RFC to simplify content expressions)
I’ve had to explain several times that, although content expressions look like regular expressions, they aren’t – they can’t be nested. An expression must be a flat sequence of node type sets, each with an optional repeat operator applied to it, and adjacent sets may not overlap.
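To illustrate the current restriction (the exact syntax here is illustrative, not a spec):

```
paragraph+                    ok: one set with a repeat operator
heading paragraph*            ok: a flat sequence of sets
((heading | caption) block*)+ not allowed: grouping and nested repetition
```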
The reason for these constraints is that matching has to be cheap, and so does “making up” a piece of content that fills the gap between existing fragments in a way that makes the content valid. The latter in particular is hard to do for arbitrary expressions.
But it recently occurred to me that if we make this a regular language and compile the expression down to a deterministic finite automaton (as explained here), matching can still be very fast, and filling in gaps becomes at least doable. It would be a branching search problem rather than the linear one it currently is, but since I expect most content expressions to remain quite simple and linear, you’d only pay for that when you use it, and having the automaton as a data structure makes the work done during that search relatively cheap.
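To make the idea concrete, here is a minimal sketch (not ProseMirror’s actual code; all names are hypothetical) of what matching and gap-filling against a hand-built DFA for the expression `heading paragraph*` could look like. Each state holds its outgoing edges keyed by node type name, plus an accept flag:

```javascript
// DFA for the content expression "heading paragraph*".
// State 0 expects a heading; state 1 accepts zero or more paragraphs.
const dfa = [
  {edges: {heading: 1}, accept: false},
  {edges: {paragraph: 1}, accept: true},
]

// Matching a sequence of node types stays a single linear scan.
function matches(dfa, types) {
  let state = 0
  for (const type of types) {
    const next = dfa[state].edges[type]
    if (next == null) return false
    state = next
  }
  return dfa[state].accept
}

// Filling in content becomes a breadth-first search through the
// automaton for the shortest path from the current state to an
// accepting state, yielding the node types to insert.
function fill(dfa, state) {
  const seen = new Set([state])
  const queue = [{state, path: []}]
  while (queue.length) {
    const {state, path} = queue.shift()
    if (dfa[state].accept) return path
    for (const [type, next] of Object.entries(dfa[state].edges)) {
      if (!seen.has(next)) {
        seen.add(next)
        queue.push({state: next, path: path.concat(type)})
      }
    }
  }
  return null
}

console.log(matches(dfa, ["heading", "paragraph", "paragraph"])) // true
console.log(matches(dfa, ["paragraph"])) // false
console.log(fill(dfa, 0)) // ["heading"]
```

For a simple linear expression like this the BFS terminates almost immediately; only expressions that actually use branching pay for exploring multiple paths.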
So in the interest of predictability and power, I’m thinking it might be a good idea to extend what is allowed in a content expression. The downside is that the library would have to absorb the complexity this involves. I managed to write a DFA builder in about 100 lines, though that excludes the additional complexity in the parser (which would have to become recursive). On the other hand, it would probably make some of the other code in model/src/content.js simpler, so the total cost wouldn’t be huge.
This change could be backwards-compatible, at least as far as schema definitions are concerned, since the new system would accept strictly more expressions than the old system did.
Does anyone have an opinion on whether this would be a good idea?