RFC: Splitting the library into packages

marijn · June 13, 2016, 2:44pm

The ProseMirror repository does too much, and it has been my intention, at least since half a year ago, to split it up into smaller repositories and modules. Two pieces that are definitely going to be split off are the markdown parsing/serializing and the menu code.

But now that the work for 0.8.0 (which I plan to release before doing any splitting) has greatly reduced the interconnectedness of the existing code, I am tempted to split more aggressively. Specifically, I am considering creating one package per ‘thing you’d import’, rather than distributing, as I do now, a big module from which you include ‘sub modules’ (dist/model, dist/transform, dist/edit, etc). The advantages of that would be:

It feels cleaner to have the library core entirely detached from further incidental complexity
The boundary between the stuff I don’t want to support (for example the default menus) and the actual project core becomes clearer
You’d only be installing the code you need (though in terms of wasted disk space, the additional README/LICENSE/etc files might undo the gains)
Providing pre-built client-side files in NPM packages becomes easier
Only having ‘top-level’ modules without sub-modules removes one piece of complexity (importing submodules).
By moving the default schema into a separate package, possibly multiple packages, it provides a good example of how people who want to provide additional schema elements should structure and distribute them
A lot of tangential stuff could be moved out of the core docs and into package READMEs

There are also downsides:

It might become harder to find the documentation you’re looking for
It might become harder to figure out which module you need
On NPM2 and lower, which doesn’t deduplicate installed packages by default, you’re likely to end up with an ugly fractal tree of node_modules subdirectories
More work during publishing (though tools and scripts might mitigate most of that)
Setting up a trivial editor becomes harder, if you need to gather schemas, menus, etc from various modules. For this, I plan to set up a prosemirror-basic package, which gathers a default schema and menu into a convenient wrapper.

Before I commit to anything, I’m looking for feedback from the community. I have a vague sense that tiny-module projects are often a bit harder to see through (Babel moving to that style was the point where I stopped understanding how it worked and started just googling for examples), but I’m not sure if there’s any problem that isn’t solved by proper documentation.

In the extreme case, the list of modules would look something like this:

prosemirror: The editor component
prosemirror-model: The document model + DOM parsing/serialization logic
prosemirror-transform: The transform abstraction + the basic step types + the primitive transformation functions
prosemirror-ui: The tooltip and prompt helpers
prosemirror-menu: The menu element abstractions, menu bar, and tooltip menu
prosemirror-inputrules: Functionality for defining transforms triggered by typing patterns of text.
prosemirror-markdown: A schema that implements all of standard Markdown, along with a parser and serializer for it, and functionality to help defining the same for extended dialects.
prosemirror-schema-list: Node class for lists, along with commands to manipulate it.
prosemirror-schema-table: An implementation of basic tables, along with commands to manipulate them.
prosemirror-schema-misc: The trivial nodes and marks that make up the current default schema.
prosemirror-basic: A helper that defines a schema similar to the current default schema, along with the menu items, key bindings, and input rules to produce a serviceable editor.

(I could have put every node and mark type in the current default schema into its own module, for consistency, but then we’re at the three-line-module level and that seems needlessly burdensome.)

Some of these (markdown, ui, menu) would end up in their own separate repository. The stuff more or less directly related to the core would be a single repository, versioned in lockstep, using Lerna or maybe custom scripts to automate publishing modules.

How does this sound? Any concerns, or ideas for alternate approaches?

johanneswilm · June 13, 2016, 3:34pm

Splitting of menu and markdown code seems like a plus for us, given that we don’t use these at all.

For the other items the advantage may not be quite as discernible. While the documentation of PM is really good, in many cases I have found myself studying the source code of ProseMirror to get extra clarity on how to use a particular feature, and having it all in various repositories may make things slightly more difficult to find (not entirely sure though).

However, while for now we were really all fine with a very general system, starting with tables that may change. I could see some just wanting simple tables and others adding all kinds of extra logic, so splitting seems sensible and would probably work well, at least for our usecase.

frederik · June 13, 2016, 5:17pm

I like the idea of moving menus etc. out of the main repo. Refactoring this in our code seems straightforward.

From a library user’s perspective I like the change, especially for the server side where usually only little of ProseMirror is needed to track transformations.

p.s.: Have you looked into scopes? I think the direct benefit would be to see more easily which are official packages. Install would become

npm install --save @prosemirror/{core,menu,..}

plus it would all end up in one @prosemirror directory in node_modules. ES6 imports then are import .. from "@prosemirror/core"
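
For illustration, the naming scheme suggested above can be written as a small mapping from the unscoped names in the proposal to scoped ones. This is only a sketch: the helper `toScopedName` is made up for this example, and mapping the bare `prosemirror` package to `@prosemirror/core` is an assumption based on the install line above.

```javascript
// Hypothetical helper, not part of any ProseMirror or npm API:
// map an unscoped package name from the proposal to a scoped one.
function toScopedName(name) {
  // Assumption: the editor package itself would become "core".
  if (name === "prosemirror") return "@prosemirror/core";
  const prefix = "prosemirror-";
  if (!name.startsWith(prefix)) {
    throw new Error(`Not a ProseMirror package name: ${name}`);
  }
  return "@prosemirror/" + name.slice(prefix.length);
}

console.log(toScopedName("prosemirror-menu"));         // @prosemirror/menu
console.log(toScopedName("prosemirror-schema-table")); // @prosemirror/schema-table
```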

hzoo · June 13, 2016, 5:37pm

Regarding scopes, https://twitter.com/marijnjh/status/738022062590197760 and https://github.com/pouchdb/pouchdb/issues/5162.

And Lerna’s readme has more info than the current website - I linked to https://github.com/lerna/lerna#how-it-works in the tweet.

kiejo · June 13, 2016, 5:46pm

I like this proposal. I think that moving the ui, menu, inputrules, markdown, and schema code into separate packages will make it more obvious what a good code structure for customizations looks like (this is something I struggled with in the beginning). Additionally, copying and pasting lots of code from the core library in order to customize it, often felt wrong to me. Separating these components from the core library right from the start, will hopefully encourage users to mix and match the existing code from different repositories (without having to wonder if this is actually the recommended way of doing these customizations).

Regarding the editor component, model, and transform code I don’t think the same benefits apply as these components will probably not be customized as often. But for these I see the potential benefit of having more flexibility in which versions of the different packages to use. I could imagine that we would want to use the latest editor component in the future while temporarily staying with an older model/transform version in case of backwards incompatible changes. Having these in different repositories should make this easier. Of course this would only work if the editor component does not depend on the latest version of the other two packages (not sure how probable that is).

I suppose the collab plugin would also be moved into its own repository? In our case this would make sense as we are planning to customize it.

I agree with the downsides you listed, but am optimistic that they can be solved by proper documentation.

frederik · June 13, 2016, 7:40pm

Makes sense. Scopes don’t seem to be where they’d need to be to be a plus for ProseMirror.

lessless · June 14, 2016, 3:52pm

The idea is nice but might be too aggressive. Wouldn’t it be nice to extract only the “pluggable” parts as a first step? I do agree with @kiejo that “ui, menu, inputrules, markdown, and schema” sound like really nice candidates.

bradleyayers · June 16, 2016, 6:51am

Sounds good to me — and something that makes sense to try before a 1.x commitment. Using lerna is interesting, we’re beginning to use it at Atlassian in an internal push to decompose some of our monolith libraries.

In regards to the documentation becoming unwieldy, I think it’s definitely something that can be solved — e.g. building a monolith docs from the combined output of all the components.

I really like the proposal of a prosemirror-basic to address the problem of “glueing together pieces is friction”. I much prefer this to glueing everything together in the core.

The issues that I’ve generally found to be burdensome when decomposing into multiple libraries are:

Version compatibility between each library: when you decompose you need to think about (and document) which versions of sibling libraries are compatible (assuming they’re not all in lock-step).
Integration testing between components: it’s one thing to declare your sibling library version compatibilities, but it’s another to actually test them. You can end up with a combinatorially large set of scenarios to test.
Project management (issue tracking): if they’re separate repos, you’ll have separate bug trackers for each. It can be extra effort for contributors to navigate around issues and find if their issue has already been raised.
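
To make the integration-testing point concrete, here is a rough sketch of how quickly the number of version combinations grows when packages are versioned independently. The version lists are made up for illustration and do not correspond to any real releases.

```javascript
// Made-up published versions per package, for illustration only.
const publishedVersions = {
  "prosemirror-model": ["1.0.0", "1.1.0", "2.0.0"],
  "prosemirror-transform": ["1.0.0", "1.1.0"],
  "prosemirror-menu": ["1.0.0", "2.0.0", "3.0.0", "3.1.0"],
};

// Build every combination that picks exactly one version of each package.
function versionCombinations(versionsByPackage) {
  let combos = [{}];
  for (const [name, versions] of Object.entries(versionsByPackage)) {
    const next = [];
    for (const combo of combos) {
      for (const version of versions) {
        next.push({ ...combo, [name]: version });
      }
    }
    combos = next;
  }
  return combos;
}

const combos = versionCombinations(publishedVersions);
console.log(combos.length); // 3 * 2 * 4 = 24
```

In practice, declared dependency ranges would rule out many of these combinations, so the real test matrix is usually smaller than the full product.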

marijn · June 16, 2016, 10:10am

I had an interesting exchange with @hzoo on twitter, where he pointed out that, with lockstep versioning, you cannot make an incompatible change to a minor component without also bumping the version of your core libraries, which seems like a really serious problem. So I don’t think I’m going to do lockstep versioning, even though it does simplify things.
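
The lockstep problem can be sketched in a few lines. This is an illustrative model only, with made-up version numbers and helper names. It is not how Lerna or npm implement versioning.

```javascript
// Bump a "major.minor.patch" version string by the given release type.
function bump(version, type) {
  const [major, minor, patch] = version.split(".").map(Number);
  if (type === "major") return `${major + 1}.0.0`;
  if (type === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// Lockstep: all packages share one version, so a change in any package
// bumps every package. `changedPackage` is ignored on purpose.
function releaseLockstep(versions, changedPackage, type) {
  const result = {};
  for (const name of Object.keys(versions)) {
    result[name] = bump(versions[name], type);
  }
  return result;
}

// Independent: only the package that changed gets a new version.
function releaseIndependent(versions, changedPackage, type) {
  return { ...versions, [changedPackage]: bump(versions[changedPackage], type) };
}

// Made-up starting point: a core package and a minor component.
const current = { "prosemirror-model": "1.4.0", "prosemirror-menu": "1.4.0" };

// A breaking change in the menu forces the model to 2.0.0 as well.
console.log(releaseLockstep(current, "prosemirror-menu", "major"));
// With independent versions, the model stays at 1.4.0.
console.log(releaseIndependent(current, "prosemirror-menu", "major"));
```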

“which versions of sibling libraries are compatible”

If we stick to strict semver and carefully specify dependencies in package.json, this should be easy to infer (and npm can usually take care of it for you).
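
As a rough sketch of what “easy to infer” means here: a dependency declared with a caret range such as `^1.2.3` in package.json accepts any later version that does not change the left-most non-zero number. The helper below is a simplified stand-in written for this example. It handles only plain `major.minor.patch` versions and is not the implementation npm uses.

```javascript
// Parse "major.minor.patch" into numbers. Pre-release tags are not handled.
function parse(version) {
  const parts = version.split(".").map(Number);
  if (parts.length !== 3 || parts.some(Number.isNaN)) {
    throw new Error(`Unsupported version: ${version}`);
  }
  return parts;
}

// Compare two parsed versions: negative, zero, or positive.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Does `version` satisfy the caret range `^base`?
function satisfiesCaret(version, base) {
  const v = parse(version);
  const b = parse(base);
  // The upper bound bumps the left-most non-zero component of the base.
  let upper;
  if (b[0] > 0) upper = [b[0] + 1, 0, 0];
  else if (b[1] > 0) upper = [0, b[1] + 1, 0];
  else upper = [0, 0, b[2] + 1];
  return compare(v, b) >= 0 && compare(v, upper) < 0;
}

console.log(satisfiesCaret("1.3.0", "1.2.3")); // true
console.log(satisfiesCaret("2.0.0", "1.2.3")); // false
console.log(satisfiesCaret("0.9.0", "0.8.0")); // false
```

Note that for 0.x versions a caret range is much narrower, which matters for packages that have not yet reached 1.0.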

“Integration testing between components”
“if they’re separate repos, you’ll have separate bug trackers for each”

My idea is to have all the core libraries in a single repository and have integration tests for the lot of them in that same repository. They won’t be published with the packages, but will be run all the time during development.

For some stuff that I consider less important, such as the UI code, I want to isolate the noise they generate from the main repository. But I agree the core libraries need a single repository and bug-tracker, if only to make it easy for people to find.

adrianheine · June 16, 2016, 7:57pm

I’m very much in favor of splitting stuff.

Obvious candidates are modules with heavy dependencies. For ProseMirror, only markdown-it somewhat qualifies.

Another approach does apply, though: If you provide extension points and have your own implementations of them in the core package, move them out. There are a lot of advantages:

You prove that your extension point is sufficient for at least your use-case
You prove that your extension point’s stability follows the claimed stability by having correspondingly versioned dependencies
You provide good examples for using that extension point

I would not do lock-step versioning. I would not put multiple npm modules in one git repository. I would not split up along interfaces that you don’t consider stable.

What I thought about recently is not splitting ›at the bottom‹, but rather ›at the top‹: Moving most of the code into an ›engine‹ module/repository and only leaving the wiring in the main module/repository. Like having a CLI script that does nothing but take the CLI args and forward them to the actual worker.

marijn · June 20, 2016, 11:22am

@adrianheine What are your concerns about putting multiple npm packages into a single repository?

As for a minimal “engine” module, that, it seems, would only work for systems that have relatively little knowledge about their client modules, such as a CLI wrapper. ProseMirror’s core is a system in itself, with rather strong opinions on what is going to go on inside of it (and mechanisms that depend on those opinions being respected). For now, I don’t think splitting the editor (edit/ directory) is likely to bear much fruit beyond additional indirection and more interfaces to maintain.

marijn · June 20, 2016, 11:25am

I also noticed last week that once you say that every NPM package only exports a single module, that means some things that would sound like a single package, such as table node types + commands to manipulate them, do have to be split to satisfy dependency requirements. A table schema must be usable outside of the browser, so it can’t import the editor module. A command, on the other hand, will need to interact with selections and other editor-specific things, and thus must import the editor module. Do prosemirror-schema-table and prosemirror-commands-table as separate packages sound excessive?
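
The constraint can be pictured as a check on the dependency graph. The graph below is an assumption made up for this sketch (the real dependency edges were not settled in this thread). It treats the `prosemirror` editor package as the only one that requires a browser.

```javascript
// Assumed dependency edges between the proposed packages (illustrative only).
const dependencies = {
  "prosemirror-model": [],
  "prosemirror-transform": ["prosemirror-model"],
  "prosemirror": ["prosemirror-model", "prosemirror-transform"],
  "prosemirror-schema-table": ["prosemirror-model"],
  "prosemirror-commands-table": ["prosemirror", "prosemirror-schema-table"],
};

// Collect everything a package depends on, directly or indirectly.
function transitiveDependencies(name, graph) {
  const seen = new Set();
  const stack = [...(graph[name] || [])];
  while (stack.length > 0) {
    const dep = stack.pop();
    if (seen.has(dep)) continue;
    seen.add(dep);
    stack.push(...(graph[dep] || []));
  }
  return seen;
}

// A package is usable outside the browser if it is not the editor
// and never pulls the editor in through its dependencies.
function usableOutsideBrowser(name, graph) {
  return name !== "prosemirror" &&
    !transitiveDependencies(name, graph).has("prosemirror");
}

console.log(usableOutsideBrowser("prosemirror-schema-table", dependencies));   // true
console.log(usableOutsideBrowser("prosemirror-commands-table", dependencies)); // false
```

Under these assumed edges, keeping the commands in the same package as the schema would pull the editor into every server-side consumer of the table schema.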

adrianheine · July 1, 2016, 8:27am

I think the reasons for (and against) splitting something into different packages or different repositories are pretty similar:

If you have multiple npm packages, then it is because you think the different parts can be used and versioned independently. That means they have to have a stable interface, so that they actually can be used from the outside and provide the stability their versioning claims.

If you put something into multiple repositories, one of the main disadvantages is that – if the interface between the two repositories is not very good – you have to do a lot of changes in both repos or even move code between them.

Put together, both types of splitting require good, stable interfaces. I don’t think it makes sense to claim that your interfaces are stable enough to be used by the outside, but not stable enough to split your repository along them.

marijn · July 1, 2016, 8:48am

That’s a good point, cross-package commits should be rare when you’re doing versioning right.

My main concern is having stuff in one place, being able to easily run grep (or git grep) on all the sources in the core, and not having to worry in which repository an issue was opened. I think different issue trackers for the various core components are an unnecessary mental burden – both for people reporting and for people working on the issues. Also, tooling like automatically running tests for all the core packages on each commit is much easier to do when they are in a single repository.

adrianheine · July 1, 2016, 8:55am

In general, you could break dependencies from the command on the editor module by injection or with cheap interfaces. So, command-table has two dependencies on editor: It uses Selection.findFrom, and it takes arguments of type ProseMirror. The former dependency can be turned into one similar to the latter by passing Selection.findFrom as first argument into moveCell. That way, you conceptually use types from the editor, but have no dependencies on the JS code level.
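
A minimal sketch of the injection idea follows. Everything in it is a stand-in: the real signatures of `moveCell` and `Selection.findFrom` are not shown in this thread, so the argument lists, the array-based “document” and the `{ pos }` selection are all assumptions made for illustration.

```javascript
// Hypothetical command. It never imports the editor module: the caller
// passes in `findFrom` (in a real setup, something like Selection.findFrom).
function moveCell(findFrom, pm, dir) {
  const found = findFrom(pm.doc, pm.selection.pos, dir);
  if (!found) return false; // nothing to move to in that direction
  pm.selection = found;
  return true;
}

// Stand-in for the function the editor module would supply.
// Here a document is just an array of cells and a selection is { pos }.
function fakeFindFrom(doc, pos, dir) {
  const next = pos + dir;
  return next >= 0 && next < doc.length ? { pos: next } : null;
}

// Stand-in for an editor instance.
const pm = { doc: ["a", "b", "c"], selection: { pos: 0 } };

console.log(moveCell(fakeFindFrom, pm, 1), pm.selection.pos);  // true 1
console.log(moveCell(fakeFindFrom, pm, -1), pm.selection.pos); // true 0
console.log(moveCell(fakeFindFrom, pm, -1), pm.selection.pos); // false 0
```

The command package then depends on the editor only through the shape of the values it is handed, which also makes it straightforward to test with stand-ins like the ones above.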

To get rid of these logical dependencies, you could introduce a (smaller) interface wrapping the editor that schema, command and editor can depend on.

These are just some theoretical considerations, I’m not arguing that we should do this.

adrianheine · July 1, 2016, 9:01am

You could disable issues in some repos, right? Integration tests are more difficult, but running them on each commit does not make a lot of sense with versioning, and unit tests are actually better run in isolation (also, faster!). Grepping is an issue. Maybe it makes sense to run some online code browser? Are there any good ones?

kiejo · July 2, 2016, 9:17am

I just read that the D3 JavaScript library has been split into many smaller modules and repositories in the latest release: https://github.com/d3/d3/releases/tag/v4.0.0.

Just thought I’d share this here as a reference, maybe there’s some stuff that ProseMirror can adapt from their approach. I haven’t looked into it in much detail yet.

aslakhellesoy · July 3, 2016, 11:45pm

Have you considered a monorepo with read-only manyrepos approach? I’m managing a fairly large open source project (Cucumber) with many languages and subcomponents, and we’re heading in this direction. Why? Refactoring and versioning becomes a lot easier.

You can learn more about it here:

Aslak

nanzm · December 16, 2022, 9:26am

Monorepos are useful, since they are very effective at managing projects with a lot of individual components.