Converting XML Schema to ProseMirror Schema

adamretter · November 15, 2017, 10:45pm

I thought I would attempt to use ProseMirror to edit some XML. I understand that I need to create a custom ProseMirror schema to represent my XML structure.

I decided to try and make a small tool to convert XML Schema to ProseMirror schema. You can see the current progress here: https://github.com/adamretter/prosemirror-xml-schemagen

My experiments so far have focused on DocBook v4.5, as it is a very complex XML Schema and so will be a challenge. Whether it is possible in practice to use such a complex schema with ProseMirror, I am not yet sure.

I have a couple of questions:

1. Attributes. In XML Schema, attributes can be optional or required. It seems to me that there is no way to say that an attribute in a ProseMirror schema is optional? It seems that ProseMirror requires an attribute to be supplied either explicitly by the editor or implicitly by a default value in the ProseMirror schema. So, er, how can I model optional attributes from XML in a ProseMirror schema?
2. “inline”. The ProseMirror documentation on schemas is quite basic, and I don’t really understand what the definition of “inline” means. Could someone explain it to me in terms of XML perhaps? Does it mean that an inline node can have text content, or makes up mixed content, or what?
3. Marks. I just don’t understand at all what these are for and why I need to have them in the schema.

marijn · November 16, 2017, 8:20am

Giving them a default value of undefined and treating that as missing seems to work well.
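For readers following along, here is a minimal sketch of what "treating that as missing" could look like when writing a node's attributes back out as XML. The helper names `serializeAttrs` and `escapeXml` are hypothetical and not part of ProseMirror; the only assumption is that node attributes arrive as a plain object.

```javascript
// Hypothetical helper (not a ProseMirror API): escape a value for use
// inside a double-quoted XML attribute.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: turn an attrs object into an XML attribute string.
// Attributes whose value is undefined (or null) are treated as missing
// and are simply not written out.
function serializeAttrs(attrs) {
  return Object.entries(attrs)
    .filter(([, value]) => value !== undefined && value !== null)
    .map(([name, value]) => ` ${name}="${escapeXml(String(value))}"`)
    .join("");
}

// Example: case1 was never set, so only case2 is emitted.
const xmlAttrs = serializeAttrs({ case1: undefined, case2: "my-default" });
```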

This isn’t an XML concept, it’s an editing concept—inline content is text and things that appear inline with text (as in display: inline in CSS). You can put your cursor in nodes whose content is inline, and type there.

Maybe you don’t. If you don’t understand what they are for you may need to read the docs a bit closer.

adamretter · November 19, 2017, 4:39pm

Thanks @marijn,

So for the attributes, there are 3 cases I can think of, which I have modelled in the following way:

<!-- optional, no default value-->
<xs:attribute name="case1"/>

attrs: { case1: {default: undefined}}


<!-- optional, with default value for PSVI -->
<xs:attribute name="case2" default="my-default"/>

attrs: { case2: {default: "my-default"}}


<!-- required, no default value -->
<xs:attribute name="case3" use="required"/>

attrs: { case3: {default: ""}}

I just wanted to check that I had the correct approach there?

Thanks also for the explanation of inline; that makes much more sense to me now.

Unfortunately I am still struggling a little bit with Marks, even after re-reading the docs a few times. I am not clear on when they need to be used vs nodes in the schema. If I have an XML grammar which defines a paragraph tag p, an italic tag i, and a bold tag b, I am wondering about the best way to model this as a ProseMirror schema. If I understand the use of inline correctly, then one possible ProseMirror schema might look like:

nodes: {
	p: {
		content: "(i | b)*",
		inline: true
	},

	i: {
		inline: true
	},

	b: {
		inline: true
	}
}

However, I am wondering how I then get Marks involved, for the purposes of binding the buttons on the toolbar to create bold and italic sections of text. Should I remove the i and b nodes from the Schema and add them just as Marks? If so how do I constrain them to only appear in a p? Or do I keep the nodes and also add Marks alongside them? Sorry I am still a bit confused here…

marijn · November 20, 2017, 10:52am

I think the 3rd case should not specify a default, so that you can be sure that the attribute is always provided.
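Putting the three cases together, a converter might map each xs:attribute declaration to an attribute spec roughly as follows. This is only a sketch: the function name `toAttrsSpec` and the input shape `{ name, use, default }` are made up for illustration. It relies on the ProseMirror rule that an attribute spec without a `default` property must be supplied whenever the node is created.

```javascript
// Hypothetical converter (not part of ProseMirror): maps simplified
// xs:attribute descriptions to the `attrs` object of a node spec.
// Assumed input shape: { name, use, default }, where `use` and
// `default` are optional, as in XML Schema.
function toAttrsSpec(xsAttributes) {
  const attrs = {};
  for (const attr of xsAttributes) {
    if (attr.use === "required") {
      // Case 3: no `default` property, so a value must always be given.
      attrs[attr.name] = {};
    } else if (attr.default !== undefined) {
      // Case 2: optional, falls back to the schema default.
      attrs[attr.name] = { default: attr.default };
    } else {
      // Case 1: optional, undefined is read as "attribute absent".
      attrs[attr.name] = { default: undefined };
    }
  }
  return attrs;
}

const attrsSpec = toAttrsSpec([
  { name: "case1" },
  { name: "case2", default: "my-default" },
  { name: "case3", use: "required" },
]);
```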

ProseMirror doesn’t really support nested inline content, so you’ll want to model your inline markup (italic/bold/etc) as marks.
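As an illustration of that advice, the p/i/b grammar could be sketched as a schema spec along these lines. The `title` node is hypothetical and is only there to show a text node that allows no marks. The `marks` property on a node spec lists which marks may appear inside that node, which is one way to restrict i and b to p. The object is meant to be passed to the `Schema` constructor from prosemirror-model, which is left out here so the snippet stays dependency-free.

```javascript
// Sketch of a schema spec for a grammar with <p>, <i> and <b>.
// Intended use: new Schema(schemaSpec), with Schema from prosemirror-model.
const schemaSpec = {
  nodes: {
    // `title` is a made-up element, included to contrast with `p`.
    doc: { content: "title? p+" },
    title: {
      content: "text*",
      marks: "", // no marks allowed inside title
      parseDOM: [{ tag: "title" }],
      toDOM: () => ["title", 0],
    },
    p: {
      content: "text*",
      marks: "i b", // only these marks may appear inside p
      parseDOM: [{ tag: "p" }],
      toDOM: () => ["p", 0],
    },
    text: {},
  },
  marks: {
    // Italic and bold are modelled as marks on text, not as nested nodes.
    i: { parseDOM: [{ tag: "i" }], toDOM: () => ["i", 0] },
    b: { parseDOM: [{ tag: "b" }], toDOM: () => ["b", 0] },
  },
};
```

With a spec like this, toolbar buttons would toggle the i and b marks on the selected text rather than insert i or b nodes.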

adamretter · November 20, 2017, 12:06pm

So would case 3 just look like:

    <!-- required, no default value -->
    <xs:attribute name="case3" use="required"/>

    attrs: { case3 }

or am I meant to use something like: attrs: { case3: {default: null} }

marijn wrote: “ProseMirror doesn’t really support nested inline content, so you’ll want to model your inline markup (italic/bold/etc) as marks.”

So unfortunately I think that really limits the ability to use ProseMirror for XML. I had hoped to create a semi-automatic tool for translating XML Schemas into ProseMirror schemas, which would give you a base output for improvement.

XML Schema gives the ability to specify that XML elements may have “mixed content” if desired. Those mixed content elements may also have attribute specifications. As I understand it, Marks cannot have attributes. So… that means that whilst I could potentially translate some nesting of mixed content elements into Marks, I would not be able to represent the XML attributes of those elements in Marks.

Do I understand that correctly?

marijn · November 20, 2017, 12:24pm

Not sure where you got that, but it’s definitely not the case.
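To make that concrete: a mark spec accepts an `attrs` object in the same way a node spec does. Below is a minimal sketch for a DocBook-style emphasis element with an optional role attribute. The choice of element and the name `emphasisMark` are illustrative only and not taken from the schemagen tool.

```javascript
// Sketch of a mark spec that carries an attribute.
// It would sit under `marks` in the spec passed to prosemirror-model's Schema.
const emphasisMark = {
  // Optional attribute: null stands for "not present in the XML".
  attrs: { role: { default: null } },
  parseDOM: [
    {
      tag: "emphasis",
      // Read the attribute off the parsed element (null when absent).
      getAttrs: (dom) => ({ role: dom.getAttribute("role") }),
    },
  ],
  toDOM(mark) {
    // Only emit role="..." when the mark actually has a role.
    const domAttrs = mark.attrs.role == null ? {} : { role: mark.attrs.role };
    return ["emphasis", domAttrs, 0];
  },
};
```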