Potential DOM Parser enhancements

The default from_dom works well for pasting content from arbitrary sources. However, because HTML is the transport mechanism for internal copy and paste, I’ve ended up having to do a lot of tweaking to get things working well with our custom NodeTypes. There are a few patterns I notice myself using, and I was wondering whether they might inform enhancements that would be useful to others.

These issues could also be alleviated if ProseMirror were able to pass around JSON for internal copy and paste rather than relying on HTML serialization/deserialization, which I believe would be possible with some minor hacks.

Nodes that share elements

We have several media elements (image, video, etc.) that render as <figure> tags. To differentiate between them, we’ve been adding data-type="*type*" to them. The first line in the parsing function is then usually something like:

VideoBlock.register("parseDOM", "figure", function(dom, state) {
  if (dom.getAttribute("data-type") !== "video") return false
  // ... rest of the video-specific parsing
})


I could see others doing something similar but using a class or some other marker in the HTML. What if you could define a selector to match as an option?

VideoBlock.register("parseDOM", "figure", { type: "block", selector:"[data-type=\"video\"]" })

Then, when summarizeSchemaInfo does its thing, prior to sortedInsert you could do:

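if (info.selector) {
  // Wrap the original parser so it only runs on elements matching the selector
  parse = (function(_parse, selector) {
    return function(dom, state) {
      if (!dom.matches(selector)) return false
      // Forward `this` so the wrapped parser sees the same context as before
      return _parse.call(this, dom, state)
    }
  })(parse, info.selector)
}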

Default “block” parse function (and custom named functions)

We added a position attribute to all nodes, and an align attribute to all Textblock nodes. Because of this, we’ve had to register custom “parseDOM” methods for all existing NodeTypes we use as well as any new ones we add. It’s not the end of the world, but it would cover most of our cases if we could just redefine the function that “block” uses. I could also see it being useful to be able to define new, custom string types that map to a function.

Supporting a selector filter is a really good idea to avoid boilerplate. If you submit a pull request to that effect, I’ll gladly merge it (try to include docs too).

I agree that the need to define your own parsers for all nodes when you add an attribute is somewhat awkward, but I do feel that handling this abstraction in your own code (a wrapper that you reuse across parser functions) is no more complex than a solution that involves extending the default parsing strategy would be, and I would like, as much as possible, to avoid complexity in the core.
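A reusable wrapper of that kind might look roughly like the sketch below. The helper names (readSharedAttrs, withSharedAttrs), the data-position and data-align attribute names, and the extra attrs argument passed to the inner parser are all assumptions made for illustration; none of them are part of ProseMirror.

```javascript
// Hypothetical helper: reads the attributes that every node shares.
// The data-position / data-align attribute names are assumptions.
function readSharedAttrs(dom) {
  return {
    position: dom.getAttribute("data-position") || null,
    align: dom.getAttribute("data-align") || null
  }
}

// Hypothetical wrapper: returns a parser with the usual (dom, state)
// signature that calls `parse` with a third argument holding the shared
// attributes merged with any node-specific ones from `readOwnAttrs`.
function withSharedAttrs(parse, readOwnAttrs) {
  return function(dom, state) {
    var attrs = readSharedAttrs(dom)
    var own = readOwnAttrs ? readOwnAttrs(dom) : {}
    for (var name in own) attrs[name] = own[name]
    return parse.call(this, dom, state, attrs)
  }
}
```

Each node type would then register withSharedAttrs(...) in place of a hand-written parser function, so the shared attribute handling lives in one place and only the node-specific part differs.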

Great, I’ll submit a patch in a bit.

@marijn What are the minimum browser versions ProseMirror targets? I’m trying to figure out whether I need to support the pre-matches “matchesSelector” methods as fallbacks (or even a polyfill version).
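For reference, the fallback I have in mind would be something along these lines. It is only a sketch: the helper name matchesSelector is my own, and whether the prefixed variants are needed at all depends on which browsers are targeted.

```javascript
// Returns true if `dom` matches the CSS selector, using the standard
// Element.matches when present and otherwise the older vendor-prefixed
// variants that some browsers shipped before it was standardized.
function matchesSelector(dom, selector) {
  var fn = dom.matches || dom.matchesSelector || dom.webkitMatchesSelector ||
           dom.mozMatchesSelector || dom.msMatchesSelector || dom.oMatchesSelector
  if (!fn) throw new Error("No selector matching method available on this element")
  return fn.call(dom, selector)
}
```

The wrapped parser would then call matchesSelector(dom, selector) instead of dom.matches(selector) directly.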