[MGNLPER-168] JcrSearchResultSupplier generates inefficient queries Created: 14/Mar/22 Updated: 23/May/23 Resolved: 10/May/23 |
|
| Status: | Closed |
| Project: | Periscope |
| Component/s: | None |
| Affects Version/s: | 1.2.4 |
| Fix Version/s: | 1.2.8 |
| Type: | Bug | Priority: | High |
| Reporter: | Rico Jansen | Assignee: | ricardo gonzalez |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | 0d | ||
| Time Spent: | 0.5h | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||
| Template: |
|
||||||||||||||||||||||||||||||||||||
| Acceptance criteria: |
Empty
|
||||||||||||||||||||||||||||||||||||
| Task DoD: |
[X]*
Doc/release notes changes? Comment present?
[X]*
Downstream builds green?
[X]*
Solution information and context easily available?
[X]*
Tests
[X]*
FixVersion filled and not yet released
[ ] 
Architecture Decision Record (ADR)
|
||||||||||||||||||||||||||||||||||||
| Bug DoR: |
[ ]*
Steps to reproduce, expected, and actual results filled
[X]*
Affected version filled
|
||||||||||||||||||||||||||||||||||||
| Release notes required: |
Yes
|
||||||||||||||||||||||||||||||||||||
| Date of First Response: | |||||||||||||||||||||||||||||||||||||
| Epic Link: | [JCR] Search Performance issues | ||||||||||||||||||||||||||||||||||||
| Story Points: | 5 | ||||||||||||||||||||||||||||||||||||
| Team: | |||||||||||||||||||||||||||||||||||||
| Work Started: | |||||||||||||||||||||||||||||||||||||
| Approved: |
Yes
|
||||||||||||||||||||||||||||||||||||
| Description |
|
When generating a query JcrSearchResultSupplier will generate empty wildcard like as a clause when the searchterm is empty. This will create a bad performing query. It also uses wildcard characters on a fulltext query, which is unnecessary. Measure and record the times before the change and after the change in this ticket using the big data environment. We have a common place for the metrics here: https://www.notion.so/magnoliacms/Monitoring-and-tracking-metrics-d6c3037f7a584aa79911ecaf3d3bd22f |
| Comments |
| Comment by Rico Jansen [ 01/Apr/22 ] |
|
Attached our version with the performance fixes. |
| Comment by Thomas Comiotto [ 22/Sep/22 ] |
|
Find attached our improved version and feel free to apply the relevant changes to JcrSearchResultSupplier:
The last point is somewhat counter-intuitive because the only way to generate a CONTAINS clause with QOMFactory is the fullTextSearch() method. Maybe that's one of the reasons why Adobe has it's own JCR-Query builder, and Magnolia would probably also greatly benefit from something better than the standard QOM classes. In our case – we have a Magnolia installation with around 200k pages and 500k assets – the above changes boosted performance by 500x (!!), most of all because CONTAINS is covered by the lucene indexer, and LIKE is not. Best wishes,
|
| Comment by Lam Nguyen Bao [ 26/Sep/22 ] |
|
Hi all, It's true about improvements in customer implementations. It's similar to what I found in Like command is for query LOCALNAME which related to node name query so we can apply same tactical as I'm also seeing that the wild cards are removed in customer code. It's important for FTS, since it's required full phrase of text to look for (e.g I'm looking for someone named "Kerschbaum" but I can't remember it correctly, so I can start with "Kers" otherwise "Kerschbaum" have to be entered). Again the configurable as in BTW, replacement of QOMFactorywould will be a big improvement. We can take it into account and see if we get benefit from that, as I understand it's something like wrapper builder and helps for building query easier |
| Comment by Thomas Comiotto [ 26/Sep/22 ] |
As its name says, CONTAINS does a wildcard query by default. |
| Comment by Alberto Soto [ 21/Mar/23 ] |
|
This issue is partially solved using Will add it to develop features to check in detail the performance of the changes, although only minor changes are detected, we will measure it and modify accordingly. |
| Comment by ricardo gonzalez [ 16/May/23 ] |
|
In order to improve the search performance the following refinements were introduced:
After these changes were implemented the search response time has been reduced. There are also some on-running investigations in order to keep improving the search efficiency, those are related to the way of indexing, caching and other issues. |