Are you looking to just port the representation part, or also things like transformations? The first is probably not too hard, and has been relatively stable across releases already (minus the linear position change). I’m (finally) ramping up for a 1.0, which is where the backwards compatibility guarantees start, and the point where we could start thinking about writing a spec.
This question made me think more about how best to store the document using a Python backend today, if one is going to make a release now where users expect to be able to upgrade to future versions.
We are currently storing full HTML of the document every 2 minutes + all the steps that have been sent since the last full update. We use PM 0.7.0 in the frontend and the backend doesn’t understand the data.
This seems problematic: if the structure of the steps changes and the administrator upgrades their version, that will destroy the documents that contain unapplied steps, because those steps no longer work with the new transformation model.
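One way to at least detect this situation before an upgrade is to tag the stored steps with the editor version that produced them. A minimal sketch, assuming a hypothetical `StoredDocument` record and a hypothetical target version `0.8.0` (none of these names come from ProseMirror or from our code base):

```python
from dataclasses import dataclass, field


@dataclass
class StoredDocument:
    """Hypothetical storage record: last full HTML snapshot plus opaque steps."""
    html: str          # full HTML sent by a client at the last snapshot
    pm_version: str    # editor version that produced the pending steps
    pending_steps: list = field(default_factory=list)  # steps the backend cannot read


def blocks_upgrade(doc: StoredDocument, new_pm_version: str) -> bool:
    """True if an upgrade would strand steps written by a different editor version."""
    return bool(doc.pending_steps) and doc.pm_version != new_pm_version


docs = [
    StoredDocument("<p>Hello</p>", "0.7.0"),
    # The step payload is a placeholder; the backend treats it as opaque data.
    StoredDocument("<p>Hi</p>", "0.7.0", [{"opaque": "step"}]),
]
at_risk = [d for d in docs if blocks_upgrade(d, "0.8.0")]
```

This only tells the administrator which documents need attention. It does not apply the steps.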
I am thinking through a few solutions:
A) When the last collab user leaves a document, apply all unapplied steps serverside. Unfortunately this won’t quite work, because even though there is PyV8, it doesn’t have a DOM, so it won’t be able to load the document in HTML format.
B) Create a management page that an administrator can call from his/her browser which loads all the documents with unapplied changes and applies those changes and saves the document. The administrator is asked to do this before updating their server with any newer version.
C) Store both HTML and the PM document format on the server. Use the HTML for now, but for future versions plan on switching to the PM document format. Use PyV8 to apply unapplied steps to the PM document format version only whenever a document is closed.
I’m actually not sure which one makes more sense. It would be preferable though if one could find a solution so that we don’t need to ship older versions of PM with future versions of our software just to be able to read the steps.
@matthieubellon: Is this a problem you guys are facing? If so, what are you doing about it?
@johanneswilm I am not sure I can answer your question properly, because you in fact raised two problems (storage and collaborative editing).
Storage is a problem we are facing, right now. And we have yet to find the best solution.
At the moment we store our content in plain text (Markdown) which, by far, was a mistake.
We are now moving our text editor from CodeMirror to ProseMirror, and, in the process, we try to define THE correct way to store content.
Our users want to decide when they “commit” changes, so we disabled the automatic save every x seconds. I think we won’t have “live” collaborative editing, which (again, in our context) brought more UX and technical issues than user benefits. So I suppose our problems diverge here.
For storage we are studying:
The XML/HTML way: Good because HTML has well defined specs. Kind of bad to express complex markers such as annotations.
The JSON way: Good for expressing complex structures such as annotations that span multiple paragraphs, overlap each other, or carry custom metadata. Bad because it is not specified at the moment (until PM has a document spec written down).
The Plain Text way: We have done that for 2 years with Markdown. What a terrible mistake I made (I only saw the advantages and stupidly swept the disadvantages of an unspecified format under the carpet).
At the moment I am thinking of a custom JSON superset of the PM document format once it is specified. But this idea has yet to be battle tested.
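To make the annotation point concrete, here is a minimal sketch of how overlapping annotations could sit next to a JSON document. The field names (`content`, `annotations`, `from`, `to`) and the offset scheme are assumptions for illustration, not the ProseMirror document format:

```python
document = {
    "content": [
        {"type": "paragraph", "text": "First paragraph."},
        {"type": "paragraph", "text": "Second paragraph."},
    ],
    # Annotations live beside the content and address it by character offset
    # into the concatenated paragraph text, so they can overlap each other
    # and cross paragraph boundaries (the first paragraph is 16 characters).
    "annotations": [
        {"id": "a1", "from": 6, "to": 22, "note": "spans both paragraphs"},
        {"id": "a2", "from": 16, "to": 30, "note": "overlaps a1"},
    ],
}


def annotations_at(doc: dict, offset: int) -> list:
    """Return the ids of all annotations covering the offset (end-exclusive)."""
    return [a["id"] for a in doc["annotations"] if a["from"] <= offset < a["to"]]
```

Expressing the same two overlapping ranges as nested HTML elements would require splitting one of them, which is the weakness of the XML/HTML way noted above.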
Thanks, that may make sense. Unfortunately, I found a post from 2012 where apparently a jsdom developer claimed that jsdom would not run in anything other than nodejs due to requirejs, etc. There is a link to Mozilla’s dom.js there, but that hasn’t been updated for 4 years. There is an updated version called domino, but it is also mainly made for nodejs. Unlike jsdom, it claims to also run on older versions of nodejs, so it should also work with Ubuntu 14.04 servers.
So… I will spend some time figuring out how feasible this is in the short term. Right now I am thinking PyExecJS hardlinked to nodejs + domino may be the way to go.
Yes, indeed. And this also explains why you won’t be looking into translating the transformation code to Python.
Right. However, given that it has to be displayable in a web browser, and that you need to be able to handle paste (?), won’t you have to have some way of serializing it into HTML in an unambiguous way anyway?
Taking your comments together with our previous experience of changing filetypes, I am wondering if we simply should save everything in two ways… one that we use now and one we potentially use in the future. That way we minimize the risk. On the other hand, it would be better to be able to define a migration step on the server once we do change. If we can get access to some kind of DOM on the server without introducing lots of large dependencies, that may be the sanest way of going about this.
Ok, but unless one can run nodejs on the backend, the only choices are porting the transformation code (++) to the server’s language or have a client do the transformations, right? We have been doing the second for the past two years, and it’s working ok, but we’ve had to deal with a lot of edge cases and the code is now so complex, it’s close to impossible to get new programmers to understand all that is going on.
Option A) Right now we send transformations around that the server doesn’t understand, but it can still act as a central authority for distribution, etc. … Every two minutes the clients send in a copy of the full document with transformations applied.
Code already exists
A lot of complexity around having to deal with a server that doesn’t really know the document.
Option B) Port the transformation code to Python.
less traffic (no full documents sent),
server always up to date and a lot of code relatively simpler.
Have to convert and maintain a lot of complex python code.
Option C) Along with each transformation, send a chunk of the document after the transformation has been applied that represents the changed part of the document. Only send full nodes along with information about where to insert them.
Server knows about full current document at any time.
Less code to port and maintain than option B.
Somewhat more network traffic. Some operations like adding a letter will likely not add much extra space, whereas others (make entire document bold) will be as big as the entire document.
Edit: For option C one could use a json diff mechanism available in several languages, such as json-delta, to send this type of diff along with the prosemirror steps. This would add a bit of overhead because the same information is transmitted twice, but it would have the advantage of always having a server with a current version of the document.
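A rough sketch of what the server side of option C could look like. Everything here is an assumption for illustration: the message fields (`version`, `steps`, `diff`), the `DocumentSession` class, and the `apply_diff` callback, which stands in for whatever json diff library ends up being used. The steps stay opaque to the server; only the diff is interpreted.

```python
import json


class DocumentSession:
    """Hypothetical server-side state for one document under option C."""

    def __init__(self, doc: dict):
        self.doc = doc      # the server's current copy of the document
        self.version = 0    # number of steps accepted so far
        self.steps = []     # opaque steps, only relayed to other clients

    def receive(self, raw_message: str, apply_diff) -> bool:
        """Accept a message only if it was based on the current version."""
        msg = json.loads(raw_message)
        if msg["version"] != self.version:
            return False    # sender is behind and has to catch up and resend
        self.doc = apply_diff(self.doc, msg["diff"])
        self.steps.extend(msg["steps"])
        self.version += len(msg["steps"])
        return True


def shallow_merge(doc: dict, diff: dict) -> dict:
    """Stand-in diff format for the example: overwrite top-level keys."""
    return {**doc, **diff}


session = DocumentSession({"title": "Draft"})
message = json.dumps(
    {"version": 0, "steps": [{"opaque": "step"}], "diff": {"title": "Final"}}
)
accepted = session.receive(message, shallow_merge)
rejected = session.receive(message, shallow_merge)  # same message is now stale
```

Rejecting messages based on an outdated version keeps the order of steps unambiguous without the server having to understand them.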
Thanks! That was also an option I was looking at a while ago. I believe the main reason we chose not to do that was that it would cause quite a bit of overhead to run it in addition to the Python server according to our test results. Additionally, the installation process became quite a bit more difficult. But we don’t have current data.
We have now tried it out for some time, and it is indeed possible to use RFC6902-type tools to send patches back to the server and have Python apply them directly. Some negative points about it:
One ends up sending more data, because changes need to be sent in both steps and as RFC6902-compliant patches
In some situations, ProseMirror takes care of adjusting the document structure automatically without there being a step (notably: a new document is created without contents and PM makes sure it starts with the minimal permitted structure of documents). In such cases one needs to make sure to send patches to the server, which otherwise will be unaware of this.
Serverside enforcement of partial editing rights seems difficult: It’s easy enough to allow or prohibit a user entirely from making any changes. But if a specific user, for example, is only allowed to add comments and those comments are part of the document structure, it’s not really feasible. If the user sends a patch that does not correspond with the steps, there is really no way for the server to notice.
Despite these shortcomings, we will release Fidus Writer 3.3 with this patch mechanism, as it removes a lot of complications we didn’t really know how to deal with.
As for running a V8 process on the server: As far as I can tell, that is problematic because the PyV8 bindings have not been updated for some 5 years and other solutions seem to do a lot of translations that result in slow execution times. This is all not too good when working over websockets and having to do everything in a single thread to ensure that the order of steps stays as it is.
If others here have experimented with another solution that works better – I would be very interested in hearing about it.
I’ve been trying exactly that. I tried PyMiniRacer, which died of a stack overflow seg fault during creation of a small schema, and js2py which died while translating the ProseMirror code bundle.
@elgow Did you see my more recent post on https://github.com/fiduswriter/prosemirror-python ? It’s working - it’s just not that fast. I could see how it could make sense under some circumstances though - for example you might have clients send the full document to the server every now and then and only if all clients disconnected abruptly and you end up needing the full document on the server you use this to apply changes that were not in the last full document update.
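The fallback idea could be sketched like this. The names are hypothetical, and `slow_apply` is a placeholder for whatever actually applies steps on the server; it is not the prosemirror-python API. The assumption is that each logged step records the document version it was applied to, and that the snapshot records how many steps it already contains.

```python
def steps_to_replay(snapshot_version: int, step_log: list) -> list:
    """Return the logged steps that are not yet part of the last full snapshot."""
    return [e["step"] for e in step_log if e["version"] >= snapshot_version]


def recover(snapshot, snapshot_version: int, step_log: list, slow_apply):
    """Rebuild the current document, using the slow path only when needed."""
    pending = steps_to_replay(snapshot_version, step_log)
    if not pending:
        return snapshot  # the last full update already contains everything
    return slow_apply(snapshot, pending)
```

In the common case, where a client sent a full document after the last step, the slow path is never entered.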