Data structure with ids

We have a data structure where blocks need to remain associated with ids (and metadata). Is there a way to keep the edited html tied to a specific id, so we can convert back to the original data structure?

Now we’re using a data-grid-id attribute on each top-level child of our editable. We have a function to make a new id if new blocks are made by the editor.

That’s an interesting use case. What happens if you create a new block? Does it get a new ID immediately, or can that wait until saving? There are very concrete plans to add the ability to modularly define new attributes for node types, but that doesn’t exist yet. So you could kludge something up that attaches ids to nodes after parsing (node.attrs.id = 100), and those will survive editing and stay present in the document. You could even (though again, the API sucks) extend serializers to somehow include these IDs in a given document output format. But to be able to do this in a clean and modular way, you’ll have to wait for the schema work to materialize. I’ll keep this specific use case in mind as I work on that.

It gets a new id immediately (using uuid.v4()). That way, if I paste a URL in that empty p I can convert it to a card with preview of image/video/map/etc.

How does the new schema stuff fit with the ids use case? I see that fromDOM is stripping things not in the schema (that’s good). Where could I allow all top-level blocks to have data-x-id?

I’ve pushed a commit that tries to make attributes powerful enough to express something like ids. Note that the editor itself is unaware of their meaning or their intended uniqueness, and there might still be corner cases where it screws up. See test/test-id-schema.js for an example of how to create a schema that includes such attributes.

Thanks for that example. It seems that with idSchema, things work differently than defaultSchema with blockquote and lists. This test fails:

dom("block_no_p",
    doc(blockquote("thus spake")),
    '<blockquote block-id="1"><p>thus spake</p></blockquote>')

<blockquote>thus spake</blockquote> (no id, no automatic p wrap)


Also this ol test fails:

dom("ordered_list",
    doc(ol(li('one'), li('two'))),
    '<ol block-id="1"><li>one</li><li>two</li></ol>')

<ol><li>one</li><li>two</li></ol> (no id)

let attrPred = (_, data) => data.type.prototype.isTextblock || data.type.prototype.isBlock

here helps… I suppose I’ll have to make a smarter schema to only have ids on the top-level blocks. For now, the extra ids don’t hurt.

Before I get stuck on “how to put ids on editable blocks” are there other ideas for “how to keep arbitrary metadata synchronized with editable blocks” ?

There is no easy way to attach attributes only to top-level nodes, since they are associated with node types, and node types tend to be reused at different levels. Though it would actually be possible to create a schema where you use different node types for top-level versions of nodes and nested versions (i.e. top_paragraph can be a child of doc, and paragraph can only be a child of list_item, blockquote, etc).

You could try putting your data directly into attributes, but you’ll have many of the same issues: what happens when the node is split? what happens when it is copied? You no longer have to worry about uniqueness though, which might be helpful.

The tests you show are broken because they don’t match the schema defined. You can’t have text directly in a list item in the default schema, and my example idSchema only attaches ids to textblocks, which, in ProseMirror jargon, are nodes like paragraphs, that directly contain text.

I’ve thought about this some more, and am becoming increasingly convinced that such id tracking is just going to be a poor match for ProseMirror’s data model. I will probably end up rolling back the changes I made for this. Could you elaborate a bit on what you were hoping to achieve? Maybe we can figure out a better approach.

Our data model is an array of objects, like:

[
  { id: 'uuid-0001', type: 'h1', html: '<h1>hello world</h1>', metadata: {} },
  { id: 'uuid-0002', type: 'image', html: '<img ...>', 
    metadata: {
      starred: true,
      isBasedOnUrl: 'http...permalink',
      author: [{name: 'Photog Name', url: 'http...', avatar: '...jpg'}],
      publisher: {name: 'Flickr', url: 'http...'}
    },
    details: {
      src: 'http....jpg', 
      faces: [{x,y,w,h}, ...],
      colors: ['#AA0000', '#00AA00', ...]
    }
  },
  { id: 'uuid-0003', type: 'p', html: '<p>...</p>', metadata: {starred: true} },
]

There is too much data to meaningfully put into the DOM, so I’m trying to just use the id attribute to keep them synchronized. I got pretty far yesterday with the stuff you posted, and think that making a schema with top_paragraph etc. would help us in more ways than just these ids.
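A minimal sketch of the "convert back" step described here, assuming the editor side can hand back one {id, type, html} entry per top-level element. The function name mergeEdited and the shape of the edited array are hypothetical, not part of ProseMirror or of our library:

```javascript
// Hypothetical helper: `original` is the array of blocks shown above,
// `edited` is one {id, type, html} entry per top-level element of the editable.
function mergeEdited(original, edited) {
  const byId = new Map(original.map(block => [block.id, block]))
  return edited.map(({id, type, html}) => {
    const previous = byId.get(id)
    // Known id: keep everything that never went into the DOM (metadata, details).
    // Unknown id (a block created in the editor): start with empty metadata.
    // Object.assign skips `previous` when it is undefined.
    return Object.assign({metadata: {}}, previous, {id, type, html})
  })
}

const original = [
  {id: "uuid-0001", type: "h1", html: "<h1>hello world</h1>", metadata: {}},
  {id: "uuid-0003", type: "p", html: "<p>old</p>", metadata: {starred: true}}
]
const edited = [
  {id: "uuid-0003", type: "p", html: "<p>new text</p>"},
  {id: "uuid-0004", type: "p", html: "<p>added</p>"}
]
console.log(mergeEdited(original, edited))
```

Blocks that no longer appear in the edited list are simply dropped, which matches treating the editable as the source of truth for order and existence.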

What is in the metadata, though? The JSON you showed doesn’t tell us much.

And what should happen when top level blocks are split, joined, deleted, or copied?

All of the image analysis stuff in the image block's details, for example, shouldn't be in the DOM: prominent colors in the image, positions of detected faces. The same goes for the paragraph's starred flag. metadata and details are used more for images and other media, but text blocks can have the same metadata.

Splitting a text node, the top keeps the metadata and the bottom gets a new block and id. Joining, the joined block has the top block’s id, and the bottom block’s meta is gone. Copied, maybe metadata copies but with a new id. Cut and paste, should keep original meta and id.
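Written out against the plain block objects, those rules might look roughly like this. It is only a sketch: splitBlock, joinBlocks, copyBlock and makeId are hypothetical names, and the html for each half is assumed to be supplied by whatever performs the edit.

```javascript
// Deep copy, good enough for plain JSON-like blocks.
const cloneJson = value => JSON.parse(JSON.stringify(value))

// Split: the top keeps id and metadata, the bottom is a new block with a new id.
function splitBlock(block, topHtml, bottomHtml, makeId) {
  const top = Object.assign({}, block, {html: topHtml})
  const bottom = {id: makeId(), type: block.type, html: bottomHtml, metadata: {}}
  return [top, bottom]
}

// Join: the result keeps the top block's id and metadata;
// the bottom block's metadata is intentionally dropped.
function joinBlocks(top, joinedHtml) {
  return Object.assign({}, top, {html: joinedHtml})
}

// Copy: metadata is copied, but the copy gets a new id.
function copyBlock(block, makeId) {
  return Object.assign(cloneJson(block), {id: makeId()})
}

// Cut and paste needs no helper: the block moves with its id and metadata intact.
```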

I reverted the changes I made in this context last week (they were too fragile). I think a viable approach to this would be to just crudely attach ids to the nodes you are interested in, and periodically (either after each change or when saving), have dedicated code run over the document to detect nodes that need a new id and nodes whose id got duplicated, and fix them up. Such code would have global knowledge about the document, which the callbacks I was trying to use for this simply don’t have, and thus can perform the task in an easier to understand and more robust way.
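As a rough illustration of such a fix-up pass, here is a sketch that works on a plain array of top-level blocks rather than on ProseMirror nodes. The name fixBlockIds and the makeId parameter are made up for this example; uuid.v4 or any other generator of fresh ids could be passed in.

```javascript
// Hypothetical helper: returns a new array in which every block has a unique id.
// The first block carrying a given id keeps it; later duplicates and blocks
// without an id get a fresh one from `makeId`.
function fixBlockIds(blocks, makeId) {
  const hasId = block => block.id != null && block.id !== ""
  // All ids currently present, so a fresh id never collides with a later block.
  const taken = new Set(blocks.filter(hasId).map(block => block.id))
  const kept = new Set()
  return blocks.map(block => {
    if (hasId(block) && !kept.has(block.id)) {
      kept.add(block.id)
      return block
    }
    let id
    do { id = makeId() } while (taken.has(id))
    taken.add(id)
    return Object.assign({}, block, {id})
  })
}

// Usage with a simple counter-based generator.
let counter = 0
const fixed = fixBlockIds(
  [{id: "a"}, {id: "a"}, {}, {id: "b"}],
  () => "new-" + (++counter)
)
console.log(fixed.map(block => block.id)) // [ 'a', 'new-1', 'new-2', 'b' ]
```

Letting the first occurrence keep the id means that after a split or a paste of copied content, the block nearer the top of the document stays attached to the existing metadata.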

Working on this in our homegrown library today, and landed on something like that. When calling our getDocument serializer, I check each top-level element and fix missing and duplicated ids.

Will try the same with my ProseMirror experiment.

Where would be a good place for that hook?

Argh, looks like I reverted a lot more than I wanted. I’ve restored the ability to have parsing and serialization methods on attributes in this commit. You can now do something like…

let nextID = 0
const idAttribute = new Attribute({compute: () => ++nextID})

idAttribute.parseDOM = (dom, options) => options.source != "paste" && dom.getAttribute("block-id") || ++nextID
idAttribute.serializeDOM = (dom, id) => dom.setAttribute("block-id", id)

And if you create a schema like this…

let attrPred = (_, data) => data.type.prototype.isTextblock
const idSchema = new Schema(defaultSchema.spec.addAttribute(attrPred, "id", idAttribute))

… that will attach that attribute to all textblock nodes (paragraphs and such – you could also pass a different predicate). When such nodes get parsed or serialized, the attribute methods we defined before will be called to read from or add to the DOM node attribute block-id.

This looks good and also comes in handy when trying to uniquely identify footnotes. However, it seems that the attribute parsing doesn’t take place when copy/pasting directly (when the pasted content is equal to lastCopied), so ID numbers that are added to elements will be there more than once if the user copies and pastes contents from one part of the document to another.

Yes, that is exactly what we discussed in this thread earlier. See this reply especially.

I see. I had not understood when that duplication would take place. So it’s directly after pasting on the pasted contents? Any other times?

For footnotes, I think it is OK to use two types of IDs: the visual ones, which can be done with css counters, which always have to be in order and start at 1 with no numbers missing. The browser can take care of that by itself. And then a second identifier which just needs to be unique and available to JavaScript so that footnotes can be attached correctly.

So it’s OK if the footnotes are numbered 10, 11, 12, 2, 3, 9 (in that order). There just cannot ever be two 9 in there.

But I now think this may make more sense for our use case:

As these points:

don’t really apply to us. The footnote marker will be non-editable and therefore cannot split, and when the footnote is copied, it’s fine if the contents are just copied along with it.