We have implemented PM with a few customizations and the client loves it.
Then they asked for some fancy sorting rules.
Now we are stuck. I’m looking for recommendations (or even bids) on how to populate a plain text sorting column in the database from the existing PM JSON column. Both are VARCHAR() fields.
Caveat: It is a python backend. If we need node for a one-off conversion I’m all for it - but I will need some hand holding.
Sweetener: We have need of someone to upgrade our PM 0.9.1 system + custom plugins later.
could you give some more details on what you mean by sorting? Sorting documents alphabetically by text content? (I am assuming the PM JSON column contains the serialized ProseMirror documents).
I’m trying to avoid sidetracking this thread with implementation details of the business logic. A short version:
- Some strings have have 'YYYY: ’ text prefixes.
- Some don’t.
- Sort strings into buckets of similar year prefix, plus the odd bucket for strings without string prefixes.
- Sort the contents of each bucket alphabetically, ascending.
- Return contents of the bucket without year prefixes
- Return the now sorted contents of each bucket of year prefixes, starting from most recent year bucket, descending.
But of course - with pagination and database sharding. It can’t be done in the browser. I’ve got the business logic rules done already in python. The problem I have is that many of the string fields are full of serialized prose mirror JSON documents. I need to get as close as possible to plain text for those and going forward I’m willing to have the front end send both the PM JSON blob and an approximation of plain text. The remaining issue is converting the existing fields.
So back to my question: Anyone willing to help a python unserialize the JSON, even if that means helping me set up a node server to get it done?
The JSON has a very regular shape, so run it through a Python JSON parser and then have a recursive function walk over the data, concatenating text if the current object has a
text property, and calling itself on each child if it has a
content propery (an array of child objects).
It’s been awhile and Marijn’s suggestion worked. Thank you!
def content_text(data, text=u''):
"Extract plain text from serialized prosemirror JSON"
if isinstance(data, list):
for child in data:
text = content_text(child, text=text)
if isinstance(data, dict):
if 'content' in data:
text = content_text(data['content'], text=text)
if data.get('type') == 'text':
text += data.get('text', u'')