[PUBLISHING-134] orderSiblings causes the node ordering process to take minutes to finish Created: 07/Mar/22  Updated: 14/Sep/23  Resolved: 14/Sep/23

Status: Closed
Project: Publishing
Component/s: None
Affects Version/s: None
Fix Version/s: 1.3.10

Type: Task Priority: Neutral
Reporter: Minh Nguyen Assignee: Unassigned
Resolution: Fixed Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File extract-2022-03-08T05_04_52.914Z.csv     PNG File image-2022-03-07-22-27-38-193.png     PNG File image-2022-03-08-12-10-06-998.png    
Issue Links:
Relates
relates to MAGNOLIA-9053 Improve scalabity of publishing content Open
documentation
documents MAGNOLIA-8880 Performance problems with too many ch... Open
Template:
Acceptance criteria:
Empty
Task DoR:
Empty
Date of First Response:
Visible to:
Annick Boehler
Epic Link: Performance problems with too many child nodes
Team: Nucleus

 Description   

Steps to reproduce
1. Write a Groovy script that creates ~15k nodes under the root /
Example: in a new workspace "profile", create:
/profile1
/profile2

....

/profile15000
2. Make sure orderSiblings=true on the author and public instances (/.magnolia/admincentral#app:configuration:browser;/modules/publishing-core/config@orderSiblings:treeview).

3. Choose one profile and publish it. Publishing takes minutes to finish.
4. This process can lock the path:
Example logging: info.magnolia.publishing.locking.PathLockManager : Lock for path [internalProfiles] requested by thread [Thread[http-nio-8080-exec-229,5,main]] with id [781].

So publishers who publish another node in the meantime may encounter a locking issue.

Expected results
1. Publishing should finish quickly,
2. or orderSiblings should be configurable (enabled/disabled) per workspace.

Actual results
1. Publishing a single node takes minutes.
2. If we try to publish another node, we may encounter a locking issue.

Development notes:
1. I enabled debug logging for the publishing process on the public instance.
2. The customer's internalProfile workspace currently has ~13k nodes.
3. I published ~3-10 nodes.
4. I got 279k debug messages of the form:
receiver.operation.jcr.AbstractJcrReceiveOperation: Ordering S5YRAK_ before S5ZZ24_
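
For illustration only, a minimal Python sketch of why per-sibling ordering gets expensive on a flat level. It assumes (not confirmed against the Magnolia source) that the receiver restores the author's sibling order by issuing one ordering call per sibling. OrderedChildren, reorder_all and reorder_one are hypothetical names, not Magnolia or JCR APIs.

```python
class OrderedChildren:
    """Toy stand-in for a parent node with orderable children."""

    def __init__(self, names):
        self.names = list(names)
        self.order_calls = 0  # number of simulated ordering operations

    def order_before(self, src, dest):
        """Move src directly before dest; dest=None moves src to the end."""
        self.order_calls += 1
        self.names.remove(src)
        if dest is None:
            self.names.append(src)
        else:
            self.names.insert(self.names.index(dest), src)


def _successor(target_order, i):
    """Name that should follow position i, or None for the last position."""
    return target_order[i + 1] if i + 1 < len(target_order) else None


def reorder_all(parent, target_order):
    """Assumed per-sibling strategy: one ordering call for every sibling."""
    # Walk backwards so each node is placed before an already-placed successor.
    for i in range(len(target_order) - 1, -1, -1):
        parent.order_before(target_order[i], _successor(target_order, i))


def reorder_one(parent, published, target_order):
    """Alternative: position only the node that was just published."""
    i = target_order.index(published)
    parent.order_before(published, _successor(target_order, i))
```

Under this assumption, a level with ~13k siblings needs ~13k ordering calls for every published node with reorder_all, while reorder_one needs a single call when the other siblings are already in the right order.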

Second try:
1. The customer's internalProfile workspace currently has ~13k nodes.
2. I published 1 node. Ordering ran from Mar 8, 3:47 am to Mar 8, 3:53 am, i.e. ~6 minutes.
3.
4. Exported logging: extract-2022-03-08T05_04_52.914Z.csv



 Comments   
Comment by Roman Kovařík [ 21/Mar/22 ]

Discovery:

Document known limitations for big flat structures in JCR trees

https://jira.magnolia-cms.com/browse/SUPPORT-1047?focusedCommentId=35427&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-35427

Comment by Pierre Sandrin [ 21/Mar/22 ]

Hi @Roman

We ran into a similar problem: we have a folder containing 1500+ nodes. Publishing the folder takes a long time and CPU usage keeps rising. After 10-15 minutes the public instances are no longer reachable and the publication process is aborted. (DX Core with 2 publics)

You mentioned limitations of JCR for big flat structures. Could you briefly describe those limitations? I have no access to the support issue you linked to.

We cannot turn off ordering of siblings for all workspaces, since it is crucial for the pages workspace.

Would it be a solution to generate subdirectories in which we put the nodes starting with A, B, C, ...? That would reduce the number of nodes per level to a few hundred.
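
A rough Python sketch of this bucketing idea, for illustration only. first_letter_bucket, bucketed_path and bucket_sizes are hypothetical helpers, not part of Magnolia; they only compute target paths and bucket sizes and do not touch a repository.

```python
from collections import Counter


def first_letter_bucket(name):
    """Bucket key: upper-cased first character, or '_' if it is not alphanumeric."""
    return name[0].upper() if name and name[0].isalnum() else "_"


def bucketed_path(name, parent="/"):
    """Path of the node once it is moved into its first-letter subfolder."""
    return f"{parent.rstrip('/')}/{first_letter_bucket(name)}/{name}"


def bucket_sizes(names):
    """Number of nodes that would end up in each subfolder."""
    return Counter(first_letter_bucket(n) for n in names)
```

Checking bucket_sizes on the real node names first is worthwhile: names that share a prefix (for example profile1 ... profile15000) would all land in the same bucket, so a different key would be needed in that case.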

Thanks a lot!

Comment by Roman Kovařík [ 22/Mar/22 ]

Hi Pierre

You mentioned limitations of JCR for big flat structures. Could you briefly describe those limitations? I have no access to the support issue you linked to.

Here is a copy:

When you actually use the Jackrabbit implementation of JCR, other aspects also come into play.
One of the main benefits of JCR is its hierarchical data structure. It has been found that the more nodes you store flat, the worse the performance gets, for example when a parent news page has a large number of children at the same level. Jackrabbit puts the limit at up to ~10k child nodes per node, but in our experience even a few hundred child nodes at the same level have a clearly negative effect on performance (the growth is not linear but exponential): a lot of nodes have to be rendered within the tree and each one needs to be resolved, which takes time. Would this be your case? Would you mind distributing your news in a more "hierarchical" way (e.g. collecting them per month or so)?
The second thing that can negatively affect performance is many references to a single node, but this is apparently not your case.
Please refer to the official Jackrabbit performance page for details on both.

 

Would it be a solution to generate subdirectories in which we put the nodes starting with A, B, C, ...? That would reduce the number of nodes per level to a few hundred.

That would definitely help, it's worth trying. If that is not enough, we can consider this suggestion: 

2. or orderSiblings should be configurable (enabled/disabled) per workspace.

 

I hope that helps

Roman

Comment by Roman Kovařík [ 03/May/23 ]

For the record, for apps without a workflow, most of the ordering can be avoided by using itemsPerRequest: https://docs.magnolia-cms.com/product-docs/6.2/Modules/List-of-modules/Publishing-module.html#_configuring_itemsperrequest. But again, only if the tree structure respects the Jackrabbit recommendations.

 

Comment by Dominik Maslanka [ 14/Sep/23 ]

This issue is solved by introducing a fast/full ordering configuration that supports ordering on the public instance: https://jira.magnolia-cms.com/browse/PUBLISHING-254

Generated at Mon Feb 12 10:35:44 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.