[MGNLUI-7740] Null check for non-existing property could improve performance Created: 18/Jan/23  Updated: 14/Feb/23  Resolved: 14/Feb/23

Status: Closed
Project: Magnolia UI
Component/s: None
Affects Version/s: 6.2.27
Fix Version/s: 6.3.0, 6.2.29

Type: Improvement Priority: Neutral
Reporter: Antonín Juran Assignee: Adam Siska
Resolution: Done Votes: 0
Labels: None
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: 0.75d Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Attachments: PNG File image-2023-02-07-14-38-14-324.png     PNG File image-2023-02-07-14-38-27-225.png    
Issue Links:
Issue split
split from MAGNOLIA-8347 JcrQueryBuilder performance issues. Closed
Problem/Incident
causality
Sub-Tasks:
Key
Summary
Type
Status
Assignee
MGNLUI-7817 Implementation Sub-task Completed Adam Siska  
MGNLUI-7818 Review Sub-task Closed Antonín Juran  
MGNLUI-7819 Pre-int QA Sub-task Closed Antonín Juran  
MGNLUI-7820 QA Sub-task Completed Sang Ngo Huu  
Template:
Acceptance criteria:
Empty
Task DoD:
[X]* Doc/release notes changes? Comment present?
[X]* Downstream builds green?
[X]* Solution information and context easily available?
[X]* Tests
[X]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Date of First Response:
Epic Link: [JCR] Search Performance issues
Sprint: Nucleus 29
Story Points: 2
Team: Nucleus
Work Started:

 Description   

Steps to reproduce

  1. Go to https://nightly.magnolia-cms.com/.magnolia/admincentral#app:pages-app:browser
  2. Filter via page template dropdown
  3. Inspect/debug the constructed JCR query performance/timing

Original ticket description

A curious one (which we fully blame on JackRabbit) was the the jcrName wildcard query (ie '%nodename%). A comparison with the queries generated by Magnolia 5 revealed that adding a property existence query increased performance significantly. So the difference between one without and one without on our repository is really something:
SELECT * FROM [nt:base] AS t WHERE ISDESCENDANTNODE(t, [/]) AND (t.[jcr:primaryType] = 'mgnl:page' OR t.[jcr:primaryType] = 'mgnl:variants' OR t.[jcr:primaryType] = 'mgnl:variant') AND (LOWER(LOCALNAME(t)) LIKE '%test%' )
546 nodes returned in 19210ms
SELECT * FROM [nt:base] AS t WHERE ISDESCENDANTNODE(t, [/]) AND (t.[jcr:primaryType] = 'mgnl:page' OR t.[jcr:primaryType] = 'mgnl:variants' OR t.[jcr:primaryType] = 'mgnl:variant') AND (LOWER(LOCALNAME(t)) LIKE '%test%' OR t.['xxxx'] is not null)
546 nodes returned in 426ms

Results from discovery:

SELECT * FROM [nt:base] AS t WHERE ISDESCENDANTNODE(t, [/]) AND (t.[jcr:primaryType] = 'mgnl:page') AND LOWER(LOCALNAME(t)) LIKE '%test%'
14572 nodes returned in 6486ms
SELECT * FROM [nt:base] AS t WHERE ISDESCENDANTNODE(t, [/]) AND (t.[jcr:primaryType] = 'mgnl:page') AND (LOWER(LOCALNAME(t)) LIKE '%test%' OR t.['xxxx'] is not null)
14572 nodes returned in 798m


 Comments   
Comment by Roman Kovařík [ 18/Jan/23 ]

Discovery:

"Strange null check" in query from the ticket description (OR t.['xxxx'] is not null) improves performance in some cases:

SELECT * FROM [nt:base] AS t WHERE ISDESCENDANTNODE(t, [/]) AND (t.[jcr:primaryType] = 'mgnl:page') AND LOWER(LOCALNAME(t)) LIKE '%test%'
14572 nodes returned in 6486ms

SELECT * FROM [nt:base] AS t WHERE ISDESCENDANTNODE(t, [/]) AND (t.[jcr:primaryType] = 'mgnl:page') AND (LOWER(LOCALNAME(t)) LIKE '%test%' OR t.['xxxx'] is not null)
14572 nodes returned in 798ms

but in others not:

SELECT * FROM [nt:base] AS t WHERE ISDESCENDANTNODE(t, [/]) AND t.[jcr:primaryType] = 'mgnl:page' AND LOWER(t.[mgnl:template]) LIKE '%mtk2:pages/basic%' AND LOWER(t.title) LIKE '%test%' AND LOWER(LOCALNAME(t)) LIKE '%test%'
14384 nodes returned in 1805ms

SELECT * FROM [nt:base] AS t WHERE ISDESCENDANTNODE(t, [/]) AND t.[jcr:primaryType] = 'mgnl:page' AND LOWER(t.[mgnl:template]) LIKE '%mtk2:pages/basic%' AND LOWER(t.title) LIKE '%test%' AND (LOWER(LOCALNAME(t)) LIKE '%test%' OR t.['xxxx'] is not null)
14384 nodes returned in 2395ms

Comment by Adam Siska [ 07/Feb/23 ]

During my investigation I've had completely different numbers here. On living instance (demo.magnolia-cms.com with travel page multiply duplicated into hierarchy with approx 4500+ nodes) the (repeatable) results were as could be seen here:

without null check:

with null check:

In RepositoryTests environment on different size data sets (always three levels of nodes) the results were like this:

Without additional null check:
Average running time (for 20 runs on 1000 nodes): 22.45
Average running time (for 20 runs on 8000 nodes): 104.2
Average running time (for 20 runs on 15625 nodes): 278.0
Average running time (for 20 runs on 27000 nodes): 587.35
-----
With additional null check:
Average running time (for 20 runs on 1000 nodes): 92.7
Average running time (for 20 runs on 8000 nodes): 417.1
Average running time (for 20 runs on 15625 nodes): 1009.1
Average running time (for 20 runs on 27000 nodes): 3922.65
-----
Slow-down ratios:
92.7/22.45 = 4.12917594654788418708
417.1/104.2 = 4.00287907869481765834
1009.1/278.0 = 3.62985611510791366906
3922.65/587.35 = 6.67855622712181833659 
Comment by Adam Siska [ 08/Feb/23 ]

Another investigation/numbers on PR: https://git.magnolia-cms.com/projects/PLATFORM/repos/ui/pull-requests/2309/overview?commentId=109516

Generated at Mon Feb 12 09:48:55 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.