[MAGNOLIA-6245] Misplaced highlight tags in search excerpt Created: 04/Jun/15  Updated: 16/Feb/16  Resolved: 04/Jun/15

Status: Closed
Project: Magnolia
Component/s: core
Affects Version/s: 5.4
Fix Version/s: 5.4

Type: Bug Priority: Major
Reporter: Federico Grilli Assignee: Federico Grilli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File misplaced-highliting.png     PNG File search-qa-20150607.png    
Issue Links:
causality
caused by MAGNOLIA-6189 Provide highlighting option in search... Closed

 Description   

Our info.magnolia.jackrabbit.lucene.SearchHTMLExcerpt misplaces the highlight markup (a strong tag) around the search term, as illustrated by the attached screenshot. This is probably due to the text cleanup we do before passing the text to the NoXMLEscapeHighlighter.doHighlight method.
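
To illustrate the suspected cause, here is a minimal Python sketch of how a cleanup step can shift highlight positions. It is not the SearchHTMLExcerpt or NoXMLEscapeHighlighter code; the `clean` and `highlight` helpers and the sample text are hypothetical, and the whitespace collapsing only stands in for whatever cleanup is actually applied.

```python
import re


def clean(text: str) -> str:
    """Collapse runs of whitespace (a stand-in for the real cleanup step)."""
    return re.sub(r"\s+", " ", text).strip()


def highlight(text: str, start: int, end: int) -> str:
    """Wrap text[start:end] in a strong tag."""
    return text[:start] + "<strong>" + text[start:end] + "</strong>" + text[end:]


raw = "Travel   to  Kyoto\n\nin spring"
term = "Kyoto"

# Offsets computed against the raw text...
raw_start = raw.index(term)
raw_end = raw_start + len(term)

# ...no longer point at the term once the text has been cleaned,
# so the tag lands in the wrong place.
cleaned = clean(raw)
misplaced = highlight(cleaned, raw_start, raw_end)

# Recomputing the offsets against the cleaned text restores the position.
fixed_start = cleaned.index(term)
fixed = highlight(cleaned, fixed_start, fixed_start + len(term))

print(misplaced)
print(fixed)
```

The point of the sketch is only that offsets and text must come from the same version of the string; whether that is the exact mechanism behind this bug is an assumption.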



 Comments   
Comment by Mikaël Geljić [ 07/Jun/15 ]

Is this really working at all? Not sure why I keep seeing UUIDs and no highlight (see attached screenshots):

Tried with/without this ticket's changes, tried with dev instance, platform's tomcat bundle (on SNAPSHOTS) or CMS's tomcat bundle, all the same. :/

Comment by Federico Grilli [ 07/Jun/15 ]

It is as if the bundle you're testing doesn't have the proper configuration, which is weird because it was committed to master: http://git.magnolia-cms.com/gitweb/?p=ce-bundle.pub.git;a=commit;h=9d3551674051ef3e0147981871db3091da30069a . I actually had the same issue a couple of days ago but thought it was due to an outdated webapp in my local environment, so I added the configuration manually to the website and tours workspaces.

Comment by Mikaël Geljić [ 08/Jun/15 ]

Yeah, I'm not sure what's going wrong. I also copied the bundle repo-conf updates to my overlaid webapp and reinstalled, but no luck.
What do you mean by "adding the configuration manually"? Anywhere I can check for proper setup?
Cheers,

Comment by Mikaël Geljić [ 15/Jun/15 ]

Works nicely; it was indeed to do with my webapp setup.

Comment by Mathias Conradt [ 15/Feb/16 ]

I am having this issue on 5.4.3 CE at my hosting provider. I didn't have it on 5.4.1 CE though. I had reported it on StackOverflow here: http://stackoverflow.com/questions/35392243/magnolia-cms-search-result-item-excerpt-differs-in-quality-between-5-4-1-and-5 including a screenshot.

Weird, because this ticket was already closed in June 2015, which is before 5.4.3 was released, isn't it?

I checked the repo-config of my 5.4.3:

The lines

<!-- needed to highlight the searched term -->
<param name="supportHighlighting" value="true"/>
<!-- custom provider for getting an HTML excerpt in a query result with rep:excerpt() -->
<param name="excerptProviderClass" value="info.magnolia.jackrabbit.lucene.SearchHTMLExcerpt"/>

were already in the jackrabbit-memory-search.xml, but not in the jackrabbit-bundle-mysql-search.xml, so I added them there as well, restarted. No difference though. Do I need to run a re-index somehow? I already tried to edit pages afterwards in order to make the system re-index them. No success though.

@mgeljic Can you point me to what you did in order to fix it? Thanks!

@fgrilli What do you mean by "I added the configuration manually to website and tours workspaces."? Do you mean the configuration in Magnolia AdminCentral? In the default installation, I did not find any config about highlighting or excerptProviderClass; I only see relevant configurations in the repo-conf XML files.

== Update ==
I installed a fresh 5.4.3 locally, and the issue does not occur there! But it's the same version as at my hosting provider, where I am having the issue. The only difference is that I use Derby locally, but that should not matter, I think?!
Anyway, I did a folder diff of both versions of my repo-conf folder (after I had applied the above snippet to jackrabbit-bundle-mysql-search.xml) and there are no differences in either jackrabbit-memory-search.xml or jackrabbit-bundle-mysql-search.xml. So I am not sure where else to look.

Comment by Mikaël Geljić [ 16/Feb/16 ]

Hi Mathias,

I cannot fully remember how I solved it, besides reimporting my dev project and overlaying the webapp once again... but that is dev-environment specific. In your case, you're talking about a hosted environment. It may be worth checking which type of DB it uses (e.g. through the about-app); there are actually five Jackrabbit config files for different environments:

  • jackrabbit-bundle-derby-search.xml
  • jackrabbit-bundle-ingres-search.xml
  • jackrabbit-bundle-mysql-search.xml
  • jackrabbit-bundle-postgres-search.xml
  • jackrabbit-memory-search.xml

Could it be that it uses another one which doesn't have the excerpt config?
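
One way to answer that question for all files at once is a small script along these lines. It is only a sketch in Python: the function names are hypothetical, the two params are the ones quoted earlier in this thread, and it assumes they are set as `param` children of a `SearchIndex` element. It does not distinguish between several `SearchIndex` elements in the same file (e.g. workspace vs. versioning), so a hit in any of them counts.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# The two params quoted in this thread.
REQUIRED = {
    "supportHighlighting": "true",
    "excerptProviderClass": "info.magnolia.jackrabbit.lucene.SearchHTMLExcerpt",
}


def missing_excerpt_params(root: ET.Element) -> list[str]:
    """Return the required param names that no SearchIndex sets to the expected value."""
    present = set()
    for index in root.iter("SearchIndex"):
        for param in index.findall("param"):
            present.add((param.get("name"), param.get("value")))
    return sorted(name for name, value in REQUIRED.items()
                  if (name, value) not in present)


def check_repo_conf(repo_conf: Path) -> dict[str, list[str]]:
    """Map each jackrabbit-*-search.xml file name to its missing params."""
    return {
        f.name: missing_excerpt_params(ET.parse(f).getroot())
        for f in sorted(repo_conf.glob("jackrabbit-*-search.xml"))
    }
```

Calling `check_repo_conf(Path(...))` with the webapp's repo-conf folder would then list, per file, which of the two params are absent; an empty list means both were found.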

Cheers,

Comment by Mathias Conradt [ 16/Feb/16 ]

Hi Mikaël,

thanks for your reply. The hosted environment is using MySQL; locally I am using Derby. I checked all config files, and they are the same. In fact, I diffed the entire repo-conf folder, and there is no difference (after I added the lines yesterday):

<!-- needed to highlight the searched term -->
<param name="supportHighlighting" value="true"/>
<!-- custom provider for getting an HTML excerpt in a query result with rep:excerpt() -->
<param name="excerptProviderClass" value="info.magnolia.jackrabbit.lucene.SearchHTMLExcerpt"/>

to the hosted environment. What I found strange though is that on the hosted environment, all the other configurations (derby, memory-search) initially already had those lines included, just not jackrabbit-bundle-mysql-search.xml, where I had to add them. Maybe I should check with the hosting provider (it's one of the listed Magnolia hosting partners on the Magnolia website) why those lines in particular were missing.

All configs are the same now though after I added the lines to jackrabbit-bundle-mysql-search.xml. I am wondering though if the issue is with adding those lines afterwards and whether there is some additional step that needs to be done, such as reindexing the internal Solr server.

Mathias

Comment by Mikaël Geljić [ 16/Feb/16 ]

You've also restarted the server, haven't you? Those repo-conf xml files are not picked up on the fly, they're just read upon startup.

Comment by Mathias Conradt [ 16/Feb/16 ]

Yes, multiple times already. I also edited the content pages (that I am searching for) again, hoping they get re-indexed then. But still the same...
Sample query to my dev website: http://bit.ly/1WoSkmf
You see, it's the same as in your second screenshot that you posted.
(I sent an email to the hosting provider, maybe they have an idea, and also why above lines were left out in the mysql-config in the first place.)

Comment by Federico Grilli [ 16/Feb/16 ]

Hi Mathias, indeed the fix should be in 5.4.3 too. In fact it's there since 5.4 and I see no relevant changes in file history which might hint at a regression. Also related tests pass. In my comment I was referring to the workspace.xml files created for each workspace typically under ${magnolia.repositories.home}/magnolia/workspaces/<workspace_name>/.
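
For anyone wanting to check those per-workspace files in one go, here is a rough Python sketch. The function names are hypothetical, `repositories_home` is assumed to be the resolved value of ${magnolia.repositories.home}, and the check assumes the two params from this thread sit as `param` children of a `SearchIndex` element in each workspace.xml.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

EXCERPT_PROVIDER = "info.magnolia.jackrabbit.lucene.SearchHTMLExcerpt"


def has_excerpt_config(workspace: ET.Element) -> bool:
    """True if some SearchIndex in this workspace.xml sets both params."""
    for index in workspace.iter("SearchIndex"):
        params = {p.get("name"): p.get("value") for p in index.findall("param")}
        if (params.get("supportHighlighting") == "true"
                and params.get("excerptProviderClass") == EXCERPT_PROVIDER):
            return True
    return False


def workspaces_without_excerpt_config(repositories_home: Path) -> list[str]:
    """Names of workspaces under <home>/magnolia/workspaces lacking the config."""
    workspaces_dir = repositories_home / "magnolia" / "workspaces"
    return sorted(
        f.parent.name
        for f in workspaces_dir.glob("*/workspace.xml")
        if not has_excerpt_config(ET.parse(f).getroot())
    )
```

Any workspace name returned here would be a candidate for adding the lines by hand, followed by a restart, since these files are only read on startup.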

Comment by Mathias Conradt [ 16/Feb/16 ]

Hi Federico, thanks! That's a very helpful hint. So far I had only looked at the repo-conf.
I just checked: I have the lines in website/workspace.xml of my local 5.4.3 version (which works fine), but they seem to be missing on the hosting provider's 5.4.3 instance. I will add them there and give it another try (when I get the chance to restart the Tomcat later tonight).

Comment by Mathias Conradt [ 16/Feb/16 ]

Hi Federico, after I added the lines to workspaces/website/workspace.xml and changed/re-indexed the content pages, it's working fine now. Thanks for the hint.

Comment by Federico Grilli [ 16/Feb/16 ]

Np, glad it helped

Generated at Mon Feb 12 04:12:41 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.