[MAGNOLIA-6188] Provide index aggregate for mgnl:page in JR indexing configuration Created: 30/Apr/15 Updated: 15/Jul/20 Resolved: 03/Jun/15 |
|
| Status: | Closed |
| Project: | Magnolia |
| Component/s: | core |
| Affects Version/s: | None |
| Fix Version/s: | 5.4 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Federico Grilli | Assignee: | Federico Grilli |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Template: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Acceptance criteria: |
Empty
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Task DoD: |
[ ]*
Doc/release notes changes? Comment present?
[ ]*
Downstream builds green?
[ ]*
Solution information and context easily available?
[ ]*
Tests
[ ]*
FixVersion filled and not yet released
[ ] 
Architecture Decision Record (ADR)
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Date of First Response: | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
Contents in a page are found not only in the node whose primaryType is mgnl:page but also in its sub-nodes mgnl:area and mgnl:component. Adding the latter two to the JR index aggregate would allow for simpler, more accurate and possibly faster searches by simply doing e.g.
select * from [mgnl:page] as p where contains(p.*, 'foo')
rather than
select * from [nt:base] as p where contains(p.*, 'foo')
and then post-processing the results in Java by getting for each node the ancestor of type mgnl:page and finally discarding duplicates. |
| Comments |
| Comment by Magnolia International [ 05/May/15 ] |
|
Patch is good, but test needs work. Will commit some examples shortly that show how to use matchers to create less verbose and more useful tests. I would also be tempted to merge this and |
| Comment by Magnolia International [ 05/May/15 ] |
|
Actually, I'm not sure the patch is good after all – when stepping in with the debugger, I see that the line above (where we get rep:excerpt) yields <div><span>Articles dms 41828606-775b-4e60-999f-1cb30809d5bb Section: <strong>Article</strong> This is the banner text for the "<strong>Article</strong>" section. It is inherited by all child pages unless these are configured</span><span>above 1ffeccb6-bbaa-4d07-a89a-55cff9c18b71 Teaser to an <strong>Article</strong> dms 395a677e-5ed4-4cff-86c4-fd0122dd0c5e Another Teaser to Another <strong>Article</strong> d1b4b57d-3b5d-41ae-85f2-8ed2e4c186ea</span></div> It doesn't seem like "dms" and UUIDs should be in a search excerpt ? One more thing |
| Comment by Magnolia International [ 05/May/15 ] |
|
Please see commit 1f21fbe. I modified the previously existing testTitlePropertyIsBoosted() to show how to advantageously use matchers (hopefully I'm not too biased I also amended the new tests and make them fail (i.e by showing some flaws that I think should be fixed (uuids in excerpt) or simply by letting you complete the test |
| Comment by Federico Grilli [ 08/May/15 ] |
|
Pls use
|
| Comment by Federico Grilli [ 10/May/15 ] |
|
commit e17623a920e43 proposes an alternative solution not using indexing configuration for JCR identifiers. Basically it will strip them from the excerpt by default regardless of whether they're configured to be used or not in excerpts. One loses some flexibility but to me it looks like we're not losing much (I guess not many ppl want jcr ids in their search excerpts). This would also allow for an indexing config working sensibly out-of-the-box, rather than having to explain in docu that if you want to get rid of jcr ids you need to specify exactly the property names containing them in config. |
| Comment by Magnolia International [ 18/May/15 ] |
|
Review comments:
|
| Comment by Federico Grilli [ 19/May/15 ] |
|
Thanks for the review.
I actually added an excerpt class to the surrounding div. Agree on removing the span s. If I get it right you would like the excerpt string to be something like? <div class="excerpt">blah blah blah <span class="highlight">foo</span> blah blah...</div> This could be done but I guess a strong tag surrounding the highlighted term would be better, out of the box, cause you wouldn’t need to do anything to do the highlighting and you could still create your css class to modify the appearance of it.
You’re right, no need for that apparently
Will add a comment in the code. Basically the default implementation does XML escaping by encoding illegal XML characters. This is a source of troubles when we want to strip all those unwanted HTML entities and tags. In particular the double quote char (“ escaped as &quot;) which, if you try it removing it before passing the text onto the doHighlight(..) method (as we do for other illegal chars) will mess up the positioning of the highlighting tag, e.g. Ar<strong>ticle</strong> instead of <strong>Article</strong>.
Sure, there’s a linked docu issue for that
I’m afraid this won’t be possible - “This feature is Jackrabbit specific (introduced in version 1.3) and will not work with other JCR implementations.” http://wiki.apache.org/jackrabbit/ExcerptProvider |
| Comment by Magnolia International [ 02/Jun/15 ] |
|
All good; I agree with your span vs strong observation, good point. Let's keep the span (with a class), since the <div#excerpt> can contain several such <span> fragments. Re portability, it's clear this config and excerpt provider etc is Jackrabbit-only, what i meant is, ideally, we have pointers to how to do with other containers. IIRC I had an open issue on the Modeshape issue tracker about it (edit: found it - it also points to ExO's implementation of the same, so that might be useful for docu) |
| Comment by Magnolia International [ 03/Jun/15 ] |
|
QueryUtilTest is broken |
| Comment by Magnolia International [ 03/Jun/15 ] |
|
Fixed and reworked QueryUtilTest |