Discussion: Inline nodes with content


Issue #114 is one of the oldest open issues on the tracker now. I’ve been meaning to look into it for ages, but other things kept getting in the way.

The problem is this: A lot of the code in ProseMirror assumes the document structure to be roughly a tree of blocks, of which some leaves are textblocks that may have inline children, which are always leaf (child-less) nodes.

But there’s nothing about the way ProseMirror schemas are defined that actually enforces this. In principle, all nodes can have children. This topic is about the question of how to deal with inline nodes with children.

The two main use cases for this kind of structure that people have presented so far are pieces of text that have a different meaning (such as @-mentions, hashtags, or other autolinked syntax), and footnotes that are written inline with the content around them.

I spent some time today experimenting with this. After changing a lot of uses of the isTextblock property of nodes to use the (new) inlineContent property instead, and fixing a few other places that made bad assumptions, I have an inline node that contains text roughly working, but the following issues came up:

  • Browsers normalize cursor positions on the boundaries of inline nodes to a single position. I.e. you can’t set the selection directly after the end of an inline node, since the browser will normalize it to the position before the end. (Depending on the browser, some seem to normalize the other way.)

  • Drawing of cursors on such boundaries is also rather glitchy. I.e. the DOM selection might report the cursor being inside the node, but the actual cursor gets drawn outside of it.

  • If you want to treat the content of such nodes a lot like normal text, most of the issues that originally motivated me to use a flat representation for inline content come up again:

    • What does it mean when the footnote node as a whole is marked bold? And when it is, some text inside of it could be marked bold again.

    • What should delete/backspace around the edges do? I expect most people will want it to do what it would do in flat text (delete the character before/after), but making that work across nested nodes is quite awkward.

    • Should the cursor still be considered as being inside the grandparent textblock when in such a node? (For example when determining the effect of changing the block type to a heading.)

    • When inserting text or pasting at the boundaries of such a node, should the content be included in the node or not?

The DOM selection issues could be partially avoided by saying that, unlike block nodes, the start and end of inline non-leaf nodes don’t get separate cursor positions. I.e. there’d be, in the selection/position model, no difference between being directly in front of such a node or directly at its start (and similar for the end). But that produces a lot of ambiguity around the behavior of editing actions in such positions, which is also not ideal.

Furthermore, I believe there is a lot of value in having a representation that closely corresponds to the thing it should represent, and makes implementing commands and keeping things consistent straightforward.

Thus, since the two use cases we’ve got so far (mentions, footnotes) both seem like they’d want to treat these nodes as much like plain old text as possible, maybe we should explicitly disallow non-leaf inline nodes and look for a different solution. If something is supposed to be inline, it probably should behave like other inline stuff, and be represented in the same way.

What we currently have in this space are marks, bits of data associated with stretches of inline nodes to make them emphasized or linked or whatever. There are a few problems with trying to use marks to represent @-mentions or footnotes:

  • Pasting into them or pressing enter inside of them splits them, since they are simply associated with the text, and that text is allowed to move anywhere.

  • They can overlap with all other marks, and contain the exact set of node types that the parent container may contain, so you can’t constrain what kind of content they cover.

  • They are split into pieces when drawn, one per child node. So styling them with a border, padding, or border-radius isn’t going to look good.

It might be possible to add a feature to the schema system that addresses those last two issues—though I haven’t quite worked out how it’d work yet. This’d be less ‘regular’ than simply reusing the tree structure for such a feature, but, again, there’s a number of good reasons to prefer flat structure for paragraph-level content. Before working on that, however, I’d like to look into the first issue, and I’d like to ask around if someone has another use case.

So, what should happen when you press enter or paste a bunch of random stuff in the middle of a mention, hash tag, or whatever special marked-up elements your app includes in the document? How about a footnote? Is editing footnotes inline a good idea in general?

Do any other things that you’d want to represent as an inline node with content come to mind? Do these fit a text-like editing method, when it comes to changing their content, or are they different?

Are you aware of an implementation of highlighted syntax elements or of inline footnotes that works well, which we might take inspiration from?

(cc @frederik, @johanneswilm, @bradleyayers, @rsaccon, @bolerio)


Not currently tackling these use cases but in general extending the marks system to allow marks to define exclusivity with other marks would be helpful. For example links, code snippets, and @mentions may want to disallow any other marks from occurring within…
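To make the idea of mark exclusivity concrete, here is a minimal sketch in plain JavaScript. It does not use ProseMirror's API; the `excludes` table, the `"_"` wildcard, and the `addMark` helper are all hypothetical names invented for this illustration.

```javascript
// Hypothetical exclusivity table: each mark type lists the mark types it
// refuses to coexist with. "_" is used here to mean "every other mark".
const excludes = {
  link: [],
  strong: [],
  em: [],
  code: ["_"],
  mention: ["_"],
};

// True when a mark of type `a` forbids a mark of type `b` on the same text.
function excludesMark(a, b) {
  const list = excludes[a] || [];
  return list.includes("_") ? a !== b : list.includes(b);
}

// Return the mark set after trying to add `type`. If an existing mark
// excludes the new one, the set is returned unchanged; otherwise any
// existing marks that the new one excludes are dropped.
function addMark(marks, type) {
  if (marks.includes(type)) return marks;
  if (marks.some((m) => excludesMark(m, type))) return marks;
  return marks.filter((m) => !excludesMark(type, m)).concat(type);
}
```

With this table, adding `strong` to text that already carries `mention` is a no-op, while adding `code` to bold text replaces the existing marks. Whether the new mark or the existing one should win is a design choice; this sketch lets the existing mark win.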

Pasting things within an @mention seems like an edge case where I’d be ok if it converted it all to text OR broke it up into several nodes, I don’t expect users to do this.


Thanks for taking a look into this, the summary is excellent!

Cursor boundaries

Confluence adds a zero-width no-break space on the boundaries of @mentions to add another cursor position at the same location, to disambiguate the semantics. Here’s an example where I step through the characters at a constant rate using the arrow keys.

The HTML for this is:

<p>I am &#65279;<a userkey="ff808081560b912f01560c40d04b000b" href="/wiki/display/~bayers" class="confluence-link">Bradley</a>&#65279; from Australia.</p>

Another example is a date:

<p>Today is &#65279;<time datetime="2016-11-15" contenteditable="false" class="non-editable" onselectstart="return false;">15 Nov 2016</time>&#65279;.<br></p>

I was thinking about this yesterday, and I suspect this might be satisfactory. In all the use cases I have in mind, the inline node is plain text, but has some special presentation and semantics applied to it. If it was possible to extend Text in order to restrict the marks that can be applied (in my cases so far I would disable any marks), and hook into selection events to present hover mentions, that would get me a long way.

The situation today seems to be that I could make do with marks, but wouldn’t get any help from the schema to achieve it. I’d need to intercept all commands and prevent any that would violate my constraints. This means that plugins that add support for adding other marks would have no way to tell (by looking at the schema) that they should prohibit it.


That’s an interesting trick. It’s not ideal, since it requires a kludge in the rendering of such nodes, and another kludge when parsing them to make sure we don’t treat those characters as actual content, but it might be an option. (I tried doing something similar with :after/:before styles, which would be less invasive, but that didn’t work.)
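The two kludges could look roughly like this. It is only a sketch in plain JavaScript with hypothetical helper names (`wrapWithBoundaries`, `stripBoundaries`), not code from ProseMirror or Confluence.

```javascript
// U+FEFF, the zero-width no-break space (&#65279; in the HTML above).
const ZWNBSP = "\uFEFF";

// Rendering kludge: surround the serialized inline node with boundary
// characters so the browser has an extra cursor position on each side.
function wrapWithBoundaries(html) {
  return ZWNBSP + html + ZWNBSP;
}

// Parsing kludge: drop boundary characters from text read back from the
// DOM so they are not treated as document content.
function stripBoundaries(text) {
  return text.split(ZWNBSP).join("");
}
```

Note that `stripBoundaries` removes every U+FEFF, including any that a user pasted in deliberately; a real implementation would probably need to be more selective.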

Thanks for the feedback so far. I’m putting this aside for a bit as I work on other things, but further comments are definitely welcome.


Have there been any new thoughts on this issue?

I’ve been revisiting content storage recently, and we’re evaluating whether just storing the ProseMirror structure (probably just via Node.toJSON) is viable. On the whole it seems okay, but there are some issues that make it inconvenient.

For example when expressing hyperlinks as marks, the text may be split (e.g. if half the text is bold). This means that the “unflatten” logic (i.e. the order of the marks in the schema) needs to be encoded into every tool that operates on the stored content.

Using a tool like jq to “find the text for each hyperlink” isn’t going to go well, since it doesn’t know that it should be merging together sibling text nodes that have a “link” mark with the same values in attrs. For these types of operations, having a true tree structure makes it much simpler.
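As a rough illustration of the merging that every such tool would have to do, here is a sketch that collects the text of each hyperlink from the `content` array of one textblock in Node.toJSON-style output. The helper names are made up for this example, and comparing attrs with JSON.stringify is a shortcut that assumes attrs are always serialized with the same key order.

```javascript
// Find the link mark on a text node, if any.
function linkMark(node) {
  return (node.marks || []).find((m) => m.type === "link") || null;
}

// Collect the full text of each hyperlink in one textblock's `content`
// array, merging adjacent text nodes whose link marks have equal attrs.
function linkTexts(content) {
  const links = [];
  let prevKey = null;
  for (const node of content) {
    const mark = node.type === "text" ? linkMark(node) : null;
    const key = mark ? JSON.stringify(mark.attrs || {}) : null;
    if (key !== null && key === prevKey) {
      // Same link as the previous sibling: extend it.
      links[links.length - 1].text += node.text;
    } else if (key !== null) {
      links.push({ attrs: mark.attrs || {}, text: node.text });
    }
    prevKey = key;
  }
  return links;
}

// A link whose second half is also bold, so it is stored as two text nodes.
const href = "https://example.com";
const paragraphContent = [
  { type: "text", text: "See " },
  { type: "text", text: "the ", marks: [{ type: "link", attrs: { href } }] },
  { type: "text", text: "docs", marks: [{ type: "strong" }, { type: "link", attrs: { href } }] },
  { type: "text", text: " for more." },
];
```

Calling `linkTexts(paragraphContent)` yields a single entry with the text "the docs", which is what a plain jq query over the text nodes would not give you.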

A similar challenge will occur for server-side rendering code that needs to turn a document into another format (e.g. HTML or PDF), as it’ll need to understand how to merge text nodes together, based on the ordering of marks and equality of their attributes.

Being able to render and search quickly makes representing a document as a tree very attractive. But it would be nice to avoid an additional layer of translation and the introduction of distinct concepts of “editor format” and “storage format”.

I’m interested in anyone’s thoughts on this topic. :)


Sorry, I hadn’t seen this post before.

From the perspective of Fidus Writer: We were the ones needing the footnotes. The idea of having them be inline came mainly from experience with LaTeX. Having footnotes be defined inline solves some issues (for example, copying them in a copy-paste operation works automatically) and creates others (such as the ones you mention).

Back then I followed Marijn’s advice and created a second editor for the footnotes and added a lot of extra logic to make sure they are always placed in relation to their reference while not overlapping, etc. This code is working, but the overall complexity of it all (two editors, collaboration provisions when steps don’t arrive, etc.) means that it is quite a lot of work to maintain it. We are still on a slightly hardened 0.10, because it’s simply too much work to update it all.

So in this sense I would vote for either keeping it all as it is in order to not create even more breaking changes, or to switch to a radically simpler model that will be able to handle editable inline nodes out-of-the-box and won’t require individual editors to maintain a lot of code on top. What I would be against would be to switch to another model that is theoretically cleaner, but ends up requiring the same amount or more code in the editor and adds a lot of new breaking changes.

From the perspective of the W3C editing taskforce: The normalization of caret positions/selections is a problem we have been looking at for a while. The argument by several browser manufacturers has often been that it’s difficult to change any of this, because various JS editors likely depend on the current, broken state. There have been several proposals to resolve this dilemma, but the most recent one is to create a new element that is the successor of contenteditable and that will be versioned. So basically you can do something like:

 <editable version="1">...</editable>

There should be a new version fairly frequently, with new fixes in every version. If someone relies on the bugs still present in version 657, they simply set that version. If one doesn’t set a version, one always gets the newest.

If you think that sounds like a good or bad idea, please let us know!


Thanks for the insight into the W3C editing taskforce direction. I’d like to follow along with that more closely.

At face value I’m skeptical that versioning an element is going to end up solving more problems than it creates, but I’d need to read through the exact proposal before having a strong opinion. I’m wary of the typical scenario where a version number is more an aspiration of functionality than a guarantee of true conformance. (Put plainly: I don’t think a version number will mean the same thing on all browsers.)

The whole issue reminds me a lot of CSS and the proposal of Houdini, and I wonder whether an approach inspired by that wouldn’t work better here. An <editable> or contenteditable element doesn’t seem like the direction we really want to go, but it’s just something we exploit to get access to rich user-input APIs that feed into a JavaScript model that is then able to update the DOM. Having document integrity enforced by a model in JavaScript seems like the direction everyone wants to go. I wonder whether looking at what problems contenteditable solves regarding user input (e.g. pasting, cursor navigation, …) and building first-class APIs to address each specific problem wouldn’t be a better approach.


I haven’t published any drafts yet. So your ideas are very much appreciated both now and once there is a draft.

The general problem is that contenteditable is very, very broken and has been for a long time. All attempts at solving it “once and for all” seem doomed because every time we seem to find a solution to one aspect, we discover that there are a thousand more issues. So instead of saying we cannot fix anything before everything is fixed, we came up with this, which does seem reasonable. The editing taskforce has been operating now for about 3 years and we have only barely scratched the surface of some of the main pain points that have to be solved on the browser side. So if we do everything in one go, we may have to wait another 30-40 years before we can ship. That doesn’t seem like a reasonable timetable.

The wish for a special element came from the side of browser developers. This came up in connection with us talking about the need for a way to disable certain operations on a specific contenteditable container. Having a way to disable certain edit operations is needed mainly because the Safari team only wants to disable items in their native browser UI for editable if the edit action is disabled entirely in that contenteditable element.

You may say that this makes no sense, as no JS editor cares about whether or not the contenteditable element has an attribute that disables bold and italic as the JS will just do whatever it wants anyway. Possibly yes, but this is not how everyone sees the world, and having this global way of disabling features so that they will disappear from the browser native menus is the lowest common denominator.

Why does this trigger a switch from an attribute to an element? Apparently it’s complicated for browsers to treat an element completely differently due to an attribute. This seems to be especially true if you have to look at two attributes to determine what is going on. Think of this:

<div contenteditable="true" disablededitingactions="underline fontcolorchange lists">...</div>

I was there at one of the first meetings in Sydney. I agree that it would be much better to just expose primitives to be engaged with from JS. But this is not how all the browser people involved with this look at the situation. In the view of some of them, the browser-provided UI should be the principal way of interacting with a text editor on the web, and JS editors that implement their own logic are destroying the user experience…

Anyway, drop in to the email discussions or to the next phone call we will have if you feel like getting involved in years of discussions that may or may not lead to tangible improvements for JS developers some day. :)


Yes, though no definite decision yet. My current thinking is that I will

  • Add some way for marks to have more control over which marks they can coexist with (to allow multiple instances of a given mark type at a single spot, or forbid certain marks from occurring together) to address some simple use cases.

  • Require that inline nodes with content are rendered with a custom node view, as a way to put the responsibility of hacking something together that happens to work for a given use case on the client code. So if you want a nested structured node in your inline content, sure, but you have to kludge together the editing interface.

  • Make it easier to use nested ProseMirror instances in custom node views (primarily by providing a way to map steps in a given local node that is the top level node of the nested editor to steps in a wider document)

This isn’t going to solve everything, but I think solving everything may be out of reach, and it’d already be great to have a working approach for these kinds of situations.

Doesn’t a tree-based representation suffer from similar issues? Unless your link mark is the lowest-precedence mark in your schema, it risks being split by other, lower-precedence marks.


We would have some more use cases:

  • Keywords: Each keyword is displayed in an uneditable “pill”, but after all the keywords in one line there needs to be the possibility to add a new keyword by typing text after the last such pill. Ideally, when the caret comes close to it, the pill turns into editable text. Think of a field such as:

Keywords: Home, Kindergarten, Fish

This could just be a simple text node where the keywords are extracted by splitting at the commas, but if one does that, one can almost be certain that some users will use spaces or semicolons instead. Some will write “and” in-between the words, etc. So the pills seem a safer bet. But as far as I can tell, there is no way of making a schema declaration for this currently, right?

  • A line with an arbitrary number of First/last name “pills”. Think of an article that has a title and an authors line:


By Hans Bild (University of Washington), Line Weingaard (US State Department), Dr. George Kufy (Vietnam-US Friendship Society)

The editor needs to make sure that the byline follows the same format in all documents, and the program needs to be able to extract all the names and institutions from that line. There may be several ways of doing it. The one that comes to my mind is to have an arbitrary number of inline “pills” where each pill represents one person and placeholder texts are used to guide the user in typing the names correctly. Alternatively, the entire line is non-editable and a separate UI is used to insert structured data.
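For the keywords case, the weakness of the plain-text approach is easy to demonstrate. The function below is a hypothetical naive extractor, shown only to illustrate why splitting at commas is fragile.

```javascript
// Naive extraction: split the field at commas and trim each piece.
function splitKeywords(line) {
  return line
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
}

const tidy = splitKeywords("Home, Kindergarten, Fish");
const messy = splitKeywords("Home; Kindergarten and Fish");
```

Here `tidy` contains three keywords, while `messy` comes back as one single "keyword" because the user chose a semicolon and “and” as separators. With one pill per keyword, the boundaries would be part of the document structure instead of being guessed from punctuation.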


I had come to the conclusion that being able to have inline nodes with content would be sufficient to solve this (assuming that the inline node content had the same content spec expressiveness as text blocks). However I do have a question about how you see inheritance of marks in the content specs working.


  nodes: {
    paragraph: { content: 'inline<italic>' },
    text: { group: 'inline' },
    link: { content: 'text<strong>', group: 'inline' }
  }

I’d expect the following:

  • Does a text within a paragraph allow the italic mark? — yes
  • Does a text within a paragraph allow the strong mark? — no
  • Does a link within a paragraph allow the italic mark? — no
  • Does a link within a paragraph allow the strong mark? — yes

Is there a scenario worth supporting where different sets of marks are needed without employing an inline node with content?

This sounds very reasonable to me.

I’ve been working under the principle that nesting ProseMirrors should be avoided in favour of a single ProseMirror. Can you point me to the background for why nesting ProseMirrors is an approach that should be embraced?


Having the same mark on the same text twice with different attribute values would be essential to us. We are currently waiting for this from upstream. But should you decide not to do this after all, let us know; then I’ll have to hack something on top of the existing code so that we can put several reference IDs into the same mark. Then we just need to figure out a way of removing all references to one particular ID in all the marks within a certain range, or of adding a new reference ID to all marks within a specific range. This will be a bit messy, so having direct support for multiple marks would be preferable.
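A sketch of what that workaround might involve, in plain JavaScript over JSON-like text nodes. The `annotation` mark type, its `ids` attribute, and both helpers are hypothetical, and the sketch assumes the text nodes passed in have already been split at the range boundaries, which is a good part of the messiness.

```javascript
// Hypothetical mark shape: { type: "annotation", attrs: { ids: ["c1", "c2"] } }.

// Add `id` to the annotation mark of every text node in `nodes`,
// creating the mark where it is missing.
function addReferenceId(nodes, id) {
  return nodes.map((node) => {
    const marks = node.marks || [];
    const existing = marks.find((m) => m.type === "annotation");
    const ids = existing ? existing.attrs.ids : [];
    if (ids.includes(id)) return node;
    const updated = { type: "annotation", attrs: { ids: ids.concat(id) } };
    return { ...node, marks: marks.filter((m) => m !== existing).concat(updated) };
  });
}

// Remove `id` from every annotation mark in `nodes`, dropping marks that
// are left without any IDs. Other marks are kept untouched.
function removeReferenceId(nodes, id) {
  return nodes.map((node) => {
    const marks = (node.marks || []).flatMap((m) => {
      if (m.type !== "annotation") return [m];
      const ids = m.attrs.ids.filter((x) => x !== id);
      return ids.length ? [{ type: "annotation", attrs: { ids } }] : [];
    });
    return { ...node, marks };
  });
}
```

With direct support for several marks of the same type on one piece of text, each reference would be its own mark and neither helper would be needed.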


I don’t know yet – I haven’t gotten around to working on that.

When the editor for a given node doesn’t look like inline editable content (say, a tooltip that allows one to edit a footnote after clicking on a placeholder that indicates its position), using a separate ProseMirror instance is preferable. If transactions from that instance can be mapped onto the outer editor, that makes propagating changes and supporting proper collaborative editing inside such content really easy.
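The core of such a mapping is a position offset. The sketch below assumes ProseMirror's integer position convention, in which stepping into a node costs one position, and assumes that the nested editor's document corresponds exactly to the content of the inline node. The step objects are simplified stand-ins (real steps carry slices of content, not strings), and the function names are made up for this illustration.

```javascript
// Map a position inside the nested editor to a position in the outer
// document. `nodePos` is the outer position directly before the inline
// node; the extra 1 steps over the node's opening token.
function toOuterPos(innerPos, nodePos) {
  return nodePos + 1 + innerPos;
}

// Shift a simplified replace step from inner to outer coordinates.
function mapStepToOuter(step, nodePos) {
  return {
    from: toOuterPos(step.from, nodePos),
    to: toOuterPos(step.to, nodePos),
    text: step.text,
  };
}
```

For example, typing at the very start of a footnote that sits at outer position 20 would turn an inner step at position 0 into an outer step at position 21.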


Interesting! Could you share what your use-case is for this?


We need to reference comments by several different people on the same piece of text. Marked ranges/decorations don’t work for us for several reasons and the comments really are part of the document – we even need to convert them when converting the document to other formats.


I’m curious about the current feasibility of this, since this is exactly what I am trying to do!

My use case is WYSIWYG math editing. I’m trying to hook up http://mathquill.com/ as a custom node view, on top of an inline node whose content is a single text node. The content node in this case contains the LaTeX representation of the math.

I’ve run into a number of issues while doing this, and I’m wondering if I should report the issues I am seeing and try to get it working, or whether it’s not reasonable to attempt at this time. In that case, I’ll probably convert it into a leaf node and use some other mechanism (maybe attrs?), rather than a child text node, to store the LaTeX.
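For reference, the two storage options would serialize roughly as follows. The node type name `math_inline` and the `latex` attribute are placeholders I made up for this sketch.

```javascript
// Variant 1: LaTeX stored as a child text node (inline node with content).
const mathWithContent = {
  type: "math_inline",
  content: [{ type: "text", text: "x^2 + y^2" }],
};

// Variant 2: LaTeX stored in an attribute of a leaf node.
const mathAsLeaf = {
  type: "math_inline",
  attrs: { latex: "x^2 + y^2" },
};

// Read the LaTeX source from either variant.
function getLatex(node) {
  if (node.attrs && typeof node.attrs.latex === "string") return node.attrs.latex;
  return (node.content || []).map((child) => child.text || "").join("");
}
```

The leaf variant sidesteps the cursor and selection questions inside the node, at the cost of every edit replacing the attribute value as a whole.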


This is something that’s still shaky. I wrote an example of editing inline footnotes by showing an editor in a tooltip when they are selected (which only works with the current master branches, not the latest release), but your use case is different again (the editor is always visible, and I guess you want to move the cursor into the inner editor when ‘arrowing’ through). I’ll try to allocate some time to create an editor like that soon, and see how it goes.


I have the same use case and thought about the same approach. Unfortunately I didn’t have the time to try this yet, but it would be really nice to be able to handle such a use case using a custom node view like you describe.

That’s great to hear, I’m very interested in seeing how this works!


Cool! Are you thinking about targeting the math editing use case specifically? Or something similar?

I’d like to be of assistance in any way I can! For now I’ll just keep reporting issues as I find them. :)


I have a use case for supporting @mentions like the following:

Does anybody have an example of that use case that I could look at? Or maybe some direction on how I could accomplish this?

Thanks for any and all help!

// Nick