Make mark and node JSON 'type' serialisation consistent

I’d like to propose changing the mark JSON serialisation so that the type is encoded the same way as it is for nodes.

At the moment nodes are encoded as:

{"type": "paragraph", …}

and marks are encoded as:

{"_": "strong", …}

Is there a compelling reason for type vs _ here?

I propose changing marks to:

{"type": "strong", …}

This will introduce a potential name clash if a mark has a type attribute, in which case it seems logical to follow the node convention and introduce an attrs sub-object:

{"type": "strong", "attrs": { … }}
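To make the name clash concrete, here is a minimal sketch using plain Java maps. It is not ProseMirror code: the class MarkEncodingSketch, its two methods, and the link mark with href and type attributes are all hypothetical, chosen only to show what happens when the type key and the attributes share one object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from ProseMirror.
class MarkEncodingSketch {
    // Flat shape: the mark type and the attributes share one object.
    static Map<String, Object> encodeFlat(String typeKey, String type, Map<String, String> attrs) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put(typeKey, type);
        // An attribute with the same name as typeKey overwrites the mark type here.
        out.putAll(attrs);
        return out;
    }

    // Nested shape: the attributes live in their own "attrs" object.
    static Map<String, Object> encodeNested(String type, Map<String, String> attrs) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("type", type);
        out.put("attrs", new LinkedHashMap<>(attrs));
        return out;
    }

    public static void main(String[] args) {
        // Assumed example mark: a link whose attributes include one named "type".
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("href", "https://example.com");
        attrs.put("type", "external");

        System.out.println(encodeFlat("type", "link", attrs));   // mark type is lost
        System.out.println(encodeFlat("_", "link", attrs));      // no clash with "_"
        System.out.println(encodeNested("link", attrs));         // no clash with "attrs"
    }
}
```

With "type" as the flat key, the attribute silently replaces the mark type; with "_" as the key or with a nested attrs object, both values survive.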

Thoughts?

Yes. As you noticed, marks dump their attributes in the same object as their type, and users are likely to need an attribute named type, whereas an attribute named _ is not something people are expected to use. I’m not sure a vague aesthetic preference is a good reason to introduce another object in the JSON output here.

I guess the crux of my issue is that I don’t understand why there are two different strategies for avoiding name collisions.

For nodes it’s using an attrs object, and for marks it’s using _ for the reserved key. I would have expected that both scenarios would use the same collision avoidance strategy.

Nodes have a bunch of properties (type, attrs, content, marks, text), whereas marks only have type, so I guess that, in my mind, tipped the balance in regard to what format works well.

Thanks for the background on this, it’s really useful to understand the rationales behind the design.

The reason I ask is that we’re looking at formally defining the document structure and storage format for our documents at rest (i.e. in the database) and basing it off ProseMirror’s JSON structure has been a sensible starting point.

Since we have a fresh start, we’re carefully considering the design, challenging our assumptions, and asking questions. So with that in mind, I’m curious if there are any improvements to ProseMirror’s current JSON format that you’ve had in mind to make (but haven’t got to yet), that we might be able to learn from and incorporate into our design?

I don’t have anything about the JSON structure that I’m planning to change at this point. The document structure is pretty much set in stone, and I think the current JSON structure is a solid way to represent that.

I was thinking a bit more about this and realised that the {type: "…", attrs: {…}} shape has the benefit of being easier to express in a statically typed language. For example, with Jackson in Java you could use a class like this:

public class Mark {
  private String type;
  private Map<String, String> attrs;
}

But with the {"_": "…", …} shape, I think you would have to decode the entire mark to a Map<String, String>.

So there the problem is that _ isn’t a valid identifier in Java, and thus you can’t use Jackson to represent this JSON structure? Is that a concrete problem you’re having or a theoretical future one?

That shouldn’t be a problem with Jackson, as you should be able to hint the attribute name using Jackson or JAXB annotations:

public class Mark {
  @JsonProperty("_")
  public String type;
  …
}

Instead, the problem is that I don’t see how you’d capture all the attributes besides _ and stash them in a map. This is quite an unconventional scenario, and I don’t recall seeing any object mapping library for a statically typed language that caters for it.

Fortunately it’s still theoretical (in that we haven’t actually begun coding in Java or Swift), but unfortunately as soon as we do we’ll hit this issue. One of our requirements of a document storage format is easy interoperability with the different platforms in our ecosystem, which boils down to:

  • Java
  • JavaScript
  • Python
  • Swift

So there’s a good mix of statically and dynamically typed languages.

Ah, that makes sense. But would you lose a lot by just treating the whole value as a map and using a dynamic property get to find the type?

It wouldn’t be fair to say you’d lose a lot. You basically lose the ability to statically type-check the type field, and take on the burden of separating the type from the attributes:

String type = mark.get("_");
if (type != null) {
  // Drop the reserved key; whatever remains are the mark's attributes.
  mark.remove("_");
  Map<String, String> attrs = mark;
  handleMark(type, attrs);
}

As opposed to:

handleMark(mark.type, mark.attrs);
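For comparison, the manual separation step can be wrapped in a small helper so callers still end up with a type and an attrs map. This is only a sketch under assumptions: FlatMarkSplitter and SplitMark are hypothetical names, all attribute values are assumed to be strings, and the input map is copied rather than mutated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of ProseMirror or Jackson.
class FlatMarkSplitter {
    // Simple holder for the two parts of a decoded mark.
    static final class SplitMark {
        final String type;
        final Map<String, String> attrs;

        SplitMark(String type, Map<String, String> attrs) {
            this.type = type;
            this.attrs = attrs;
        }
    }

    // Separates the reserved "_" key from the remaining attributes.
    static SplitMark split(Map<String, String> flatMark) {
        String type = flatMark.get("_");
        if (type == null) {
            throw new IllegalArgumentException("mark has no \"_\" key");
        }
        // Copy first so the caller's map is left untouched.
        Map<String, String> attrs = new LinkedHashMap<>(flatMark);
        attrs.remove("_");
        return new SplitMark(type, attrs);
    }

    public static void main(String[] args) {
        // Assumed example input, as it might come out of a JSON decoder.
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("_", "link");
        flat.put("href", "https://example.com");

        SplitMark mark = split(flat);
        System.out.println(mark.type + " " + mark.attrs);
    }
}
```

The type check still happens at run time rather than compile time, which is the trade-off described above.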

All right, you got it. But if someone complains about the arbitrary breaking of JSON compatibility, I’m going to blame you.
