[MAGNOLIA-7696] Uploaded PDF search is not possible for Magnolia 6.1.3 Created: 16/Dec/19  Updated: 02/Jan/20  Resolved: 02/Jan/20

Status: Closed
Project: Magnolia
Component/s: None
Affects Version/s: 6.1.3
Fix Version/s: None

Type: Bug Priority: Neutral
Reporter: Viet Nguyen Assignee: Federico Grilli
Resolution: Not an issue Votes: 0
Labels: regression, support
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
causality
relation
is related to MGNLDAM-442 Allow full-text search for common/pop... Closed
is related to MAGNOLIA-7231 Website specific rules for indexing a... Closed
Sprint: UI Framework 12, UI Framework 13
Story Points: 3

 Description   

Full-text search does not find uploaded PDF files in Magnolia 6.1.3. Steps to reproduce:
1. Upload a PDF file using the Assets app
2. Wait a moment for indexing to complete
3. Copy any excerpt from the file and search for it


Expected result:
The uploaded file/asset is found.


Current actual result:
No results are shown.



 Comments   
Comment by Federico Grilli [ 19/Dec/19 ]

The issue can't be reproduced on any plain dx-core bundle. The libraries involved in the indexing process (e.g. Tika, commons-compress) seem to be fine.

The main lead currently points to a flawed custom Jackrabbit (JR) repository configuration. This seems to be the case for Magnolia's own demo, where the issue is actually reproducible.
It turns out the webapp there uses a custom Jackrabbit repository configuration, which reads:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- SearchIndex will get the indexing configuration from the classpath, if not found in the workspace home -->
  <param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration.xml"/>
...

whereas it should be

<SearchIndex class="info.magnolia.jackrabbit.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index" />
  <!-- SearchIndex will get the indexing configuration from the classpath, if not found in the workspace home -->
  <param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration_${wsp.name}.xml"/>
...

As a result, the DAM indexing configuration (https://git.magnolia-cms.com/projects/MODULES/repos/dam/browse/magnolia-dam-jcr/src/main/resources/info/magnolia/jackrabbit/indexing_configuration_dam.xml) may be skipped altogether, so uploaded documents are not indexed properly.
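
To check a custom configuration for the two differences described above, something along these lines could be used. This is only a rough sketch: the function name check_search_index and the two sample fragments are hypothetical and not part of Magnolia or Jackrabbit, and the samples close the SearchIndex element early purely so that they parse (the real configurations contain further parameters). It assumes the expected class name and the ${wsp.name} placeholder are exactly as quoted in the comment above.

```python
import xml.etree.ElementTree as ET

# Values taken from the "should be" fragment quoted in the comment above.
EXPECTED_CLASS = "info.magnolia.jackrabbit.lucene.SearchIndex"
WORKSPACE_PLACEHOLDER = "${wsp.name}"


def check_search_index(xml_text):
    """Return a list of problems found in any SearchIndex element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    problems = []
    for node in root.iter("SearchIndex"):
        cls = node.get("class", "")
        if cls != EXPECTED_CLASS:
            problems.append("unexpected SearchIndex class: " + cls)
        for param in node.findall("param"):
            if param.get("name") != "indexingConfiguration":
                continue
            value = param.get("value", "")
            # Without the workspace placeholder, a per-workspace file such as
            # indexing_configuration_dam.xml cannot be resolved from this value.
            if WORKSPACE_PLACEHOLDER not in value:
                problems.append("indexingConfiguration is not workspace-specific: " + value)
    return problems


# Illustrative fragments only, shortened so that they are well-formed.
FLAWED = """
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration.xml"/>
</SearchIndex>
"""

FIXED = """
<SearchIndex class="info.magnolia.jackrabbit.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration_${wsp.name}.xml"/>
</SearchIndex>
"""

if __name__ == "__main__":
    for label, fragment in (("flawed", FLAWED), ("fixed", FIXED)):
        print(label, check_search_index(fragment))
```

The check only inspects the configuration text; it does not verify that the referenced indexing configuration file actually exists on the classpath, and an existing index would presumably still need to be rebuilt after the configuration is corrected.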

Generated at Mon Feb 12 04:26:02 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.