[MGNLPER-168] JcrSearchResultSupplier generates inefficient queries Created: 14/Mar/22  Updated: 23/May/23  Resolved: 10/May/23

Status: Closed
Project: Periscope
Component/s: None
Affects Version/s: 1.2.4
Fix Version/s: 1.2.8

Type: Bug Priority: High
Reporter: Rico Jansen Assignee: ricardo gonzalez
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: 0d
Time Spent: 0.5h
Original Estimate: Not Specified

Attachments: Java Source File EnhancedJcrSearchResultSupplier.java     Java Source File JcrSearchResultSupplier.java    
Issue Links:
Cloners
is cloned by MGNLPER-192 Port to master - JcrSearchResultSuppl... Closed
is cloned by MGNLPER-195 Recheck response time for MGNLPER-168 Closed
Problem/Incident
Relates
relates to ADMINCTR-226 Make Findbar search starting with an ... Closed
relates to MGNLPER-191 Investigate the query time results fo... Closed
relation
is related to MGNLPER-165 Search queries are not executed concu... Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[X]* Doc/release notes changes? Comment present?
[X]* Downstream builds green?
[X]* Solution information and context easily available?
[X]* Tests
[X]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[ ]* Steps to reproduce, expected, and actual results filled
[X]* Affected version filled
Release notes required:
Yes
Date of First Response:
Epic Link: [JCR] Search Performance issues
Story Points: 5
Team: AuthorX
Work Started:
Approved:
Yes

 Description   

When generating a query JcrSearchResultSupplier will generate empty wildcard like as a clause when the searchterm is empty. This will create a bad performing query.

It also uses wildcard characters on a fulltext query, which is unnecessary.

Measure and record the times before the change and after the change in this ticket using the big data environment. We have a common place for the metrics here: https://www.notion.so/magnoliacms/Monitoring-and-tracking-metrics-d6c3037f7a584aa79911ecaf3d3bd22f



 Comments   
Comment by Rico Jansen [ 01/Apr/22 ]

Attached our version with the performance fixes.

Comment by Thomas Comiotto [ 22/Sep/22 ]

Find attached our improved version and feel free to apply the relevant changes to  JcrSearchResultSupplier:

  • Empty search strings return empty List (Line 113)
  • Direct selection of target node type instead of NT_BASE if possible (most cases) (Line 160, 240).
  • Use CONTAINS instead of LIKE (Line 143, 172)

The last point is somewhat counter-intuitive because the only way to generate a CONTAINS clause with QOMFactory is the fullTextSearch() method. Maybe that's one of the reasons why Adobe has it's own JCR-Query builder, and Magnolia would probably also greatly benefit from something better than the standard QOM classes.

In our case – we have a Magnolia installation with around 200k pages and 500k assets –  the above changes boosted performance by 500x (!!), most of all because CONTAINS is covered by the lucene indexer, and LIKE is not.

Best wishes,
Thomas

 

Comment by Lam Nguyen Bao [ 26/Sep/22 ]

Hi all,

It's true about improvements in customer implementations. It's similar to what I found in MGNLDAM-1016 

Like command is for query LOCALNAME which related to node name query so we can apply same tactical as MGNLDAM-1016 to have it as optional and people can decide by themself.

I'm also seeing that the wild cards are removed in customer code. It's important for FTS, since it's required full phrase of text to look for (e.g I'm looking for someone named "Kerschbaum"  but I can't remember it correctly, so I can start with "Kers" otherwise  "Kerschbaum" have to be entered). Again the configurable as in MGNLDAM-1016 is a good choice

BTW, replacement of QOMFactorywould will be a big improvement. We can take it into account and see if we get benefit from that, as I understand it's something like wrapper builder and helps for building query easier 
cc: tcomiotto 

Comment by Thomas Comiotto [ 26/Sep/22 ]

I'm also seeing that the wild cards are removed in customer code. It's important for FTS, since it's required full phrase of text to look for (e.g I'm looking for someone named "Kerschbaum"  but I can't remember it correctly, so I can start with "Kers" otherwise  "Kerschbaum" have to be entered).

 

As its name says, CONTAINS does a wildcard query by default.

Comment by Alberto Soto [ 21/Mar/23 ]

This issue is partially solved using ADMINCTR-226. An additional analysis has been check with the current code proposal from the client but it seems that the only potential change to introduce is avoiding empty querys or results, and avoiding time with null values as suggested.

Will add it to develop features to check in detail the performance of the changes, although only minor changes are detected, we will measure it and modify accordingly.

Comment by ricardo gonzalez [ 16/May/23 ]

In order to improve the search performance the following refinements were introduced:

  • Preventing from empty string search by trimming and checking the entry string is not null
  • Removing the wildcard character while performing fullText search
  • Selection of target nodes instead of general set, then reducing the size in which the search is about to be performed
  • Implementing a dedicated strategy for the assets

After these changes were implemented the search response time has been reduced. There are also some on-running investigations in order to keep improving the search efficiency, those are related to the way of indexing, caching and other issues.

Generated at Mon Feb 12 10:29:24 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.