contentDOM when returning a dom node from toDOM()

What do you mean by pre-existing caption node?

If I were to return a DOMOutputSpec instead, I would somehow have to translate the pre-existing caption node into a DOMOutputSpec.

One way might be to make a wrapper table node whose content expression is “caption tbody”.
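Here is a minimal sketch of that wrapper-node idea. It assumes ProseMirror's usual NodeSpec fields (`content`, `group`, `toDOM`, `parseDOM`). So that it runs without dependencies, the specs are plain objects checked against local stand-in types. In a real editor you would pass them to `new Schema({ nodes })` from prosemirror-model. The `table_row` and `table_cell` nodes are hypothetical additions that ensure every name used in a content expression is defined.

```typescript
// Local stand-in for the array form of prosemirror-model's DOMOutputSpec:
// [tagName, ...children], where the number 0 marks the "hole" that the
// node's ProseMirror content is rendered into.
type OutputSpec = readonly [string, ...unknown[]];

// Local stand-in for the subset of NodeSpec fields used below.
interface NodeSpecLike {
  content?: string;
  group?: string;
  toDOM?: () => OutputSpec;
  parseDOM?: { tag: string }[];
}

const nodes: Record<string, NodeSpecLike> = {
  doc: { content: "table+" },
  text: { group: "inline" },
  // Wrapper node: exactly one caption followed by exactly one tbody.
  table: {
    content: "caption tbody",
    toDOM: () => ["table", 0],
    parseDOM: [{ tag: "table" }],
  },
  // The caption is an ordinary editable child node, not a pre-built DOM node.
  caption: {
    content: "text*",
    toDOM: () => ["caption", 0],
    parseDOM: [{ tag: "caption" }],
  },
  tbody: {
    content: "table_row+",
    toDOM: () => ["tbody", 0],
    parseDOM: [{ tag: "tbody" }],
  },
  // Hypothetical row and cell nodes, included only so the schema is complete.
  table_row: {
    content: "table_cell+",
    toDOM: () => ["tr", 0],
    parseDOM: [{ tag: "tr" }],
  },
  table_cell: {
    content: "text*",
    toDOM: () => ["td", 0],
    parseDOM: [{ tag: "td" }],
  },
};
```

Because caption and tbody are regular child nodes in this design, every `toDOM` can return a plain array spec with a `0` hole. ProseMirror then renders the caption inside the table itself, so there is no pre-existing DOM node to translate and no hand-built `contentDOM` to manage.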