[MGNLDAM-442] Allow full-text search for common/popular file types Created: 13/May/14  Updated: 16/Dec/19  Resolved: 17/Jul/14

Status: Closed
Project: Magnolia DAM Module
Component/s: None
Affects Version/s: None
Fix Version/s: 1.2.6

Type: Task Priority: Major
Reporter: Zdenek Skodik Assignee: Federico Grilli
Resolution: Fixed Votes: 0
Labels: support
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
dependency
depends upon MGNLUI-3069 Upgrade to commons-compress 1.8.1(bac... Closed
relation
is related to MAGNOLIA-7123 Full text search in documents (pdf, d... Closed
is related to MAGNOLIA-7398 Full text search in documents (pdf, ... Closed
is related to MAGNOLIA-7696 Uploaded PDF search is not possible f... Closed
is related to MGNLDAM-488 Provide unit test to ensure search in... Closed
Template:
Acceptance criteria:
Empty
Task DoR:
Empty
Release notes required:
Yes
Date of First Response:

 Description   

Since we upgraded to Tika 1.2+, full-text queries such as select * from nt:base where contains(*, 'search_string') return no results for file types like pdf, txt or doc. Only newer file types like docx return some results.



 Comments   
Comment by Jaroslav Simak [ 17/Jun/14 ]

I've done some investigation and it showed that this has nothing to do with the Tika library itself, because reading all files by code works without any problem with Tika 1.2+. It looks like it is somehow related to Jackrabbit (I found some threads where people complain that full-text search doesn't work with Jackrabbit 2.6.x).
Search on latest 5.3 bundle works fine.

Comment by Federico Grilli [ 14/Jul/14 ]

As mentioned in the comment section and in the linked support issue, the problem seems to be related to Jackrabbit 2.6.x, which is bundled with Magnolia 5.2.x. As of Magnolia 5.3, the JR version has been updated to 2.8 and the problem is gone. Since we cannot update JR to a major version in a Magnolia maintenance release, the alternative is to update to Magnolia 5.3.

Comment by Federico Grilli [ 15/Jul/14 ]

I tried to replace JR 2.6.x with 2.8.0 on a 5.2.5 instance and the problem is still there. Below is the output of the diff between the artefacts found in WEB-INF/lib for 5.2.5 (left-hand side) and 5.3.0 (right-hand side). Apart from the obvious differences in Magnolia module versions and the JR version, no library involved in search, afaik, has changed, e.g. Tika and Lucene. This leaves me doubtful as to why full-text search works in 5.3 but not in 5.2.x.

WEB-INF/lib/FastInfoset-1.2.12.jar				WEB-INF/lib/FastInfoset-1.2.12.jar
WEB-INF/lib/aceeditor-0.8.5.jar					WEB-INF/lib/aceeditor-0.8.5.jar
WEB-INF/lib/activation-1.1.jar					WEB-INF/lib/activation-1.1.jar
WEB-INF/lib/aopalliance-1.0.jar					WEB-INF/lib/aopalliance-1.0.jar
							      >	WEB-INF/lib/asm-3.1.jar
WEB-INF/lib/aspectjrt-1.6.11.jar				WEB-INF/lib/aspectjrt-1.6.11.jar
WEB-INF/lib/backport-util-concurrent-3.1.jar			WEB-INF/lib/backport-util-concurrent-3.1.jar
WEB-INF/lib/bcmail-jdk16-1.46.jar				WEB-INF/lib/bcmail-jdk16-1.46.jar
WEB-INF/lib/bcpg-jdk16-1.46.jar					WEB-INF/lib/bcpg-jdk16-1.46.jar
WEB-INF/lib/bcprov-jdk16-1.46.jar				WEB-INF/lib/bcprov-jdk16-1.46.jar
							      >	WEB-INF/lib/cglib-2.2.jar
WEB-INF/lib/cglib-nodep-2.1_3.jar				WEB-INF/lib/cglib-nodep-2.1_3.jar
WEB-INF/lib/ckeditor-wrapper-for-vaadin-7.8.8.jar		WEB-INF/lib/ckeditor-wrapper-for-vaadin-7.8.8.jar
WEB-INF/lib/cloning-1.8.2.jar					WEB-INF/lib/cloning-1.8.2.jar
WEB-INF/lib/commons-beanutils-1.8.3.jar				WEB-INF/lib/commons-beanutils-1.8.3.jar
WEB-INF/lib/commons-betwixt-0.8.jar				WEB-INF/lib/commons-betwixt-0.8.jar
WEB-INF/lib/commons-cli-1.2.jar					WEB-INF/lib/commons-cli-1.2.jar
WEB-INF/lib/commons-codec-1.4.jar				WEB-INF/lib/commons-codec-1.4.jar
WEB-INF/lib/commons-collections-3.2.1.jar			WEB-INF/lib/commons-collections-3.2.1.jar
WEB-INF/lib/commons-compress-1.0.jar			      |	WEB-INF/lib/commons-compress-1.8.1.jar
WEB-INF/lib/commons-dbcp-1.4.jar				WEB-INF/lib/commons-dbcp-1.4.jar
WEB-INF/lib/commons-digester-1.8.1.jar				WEB-INF/lib/commons-digester-1.8.1.jar
WEB-INF/lib/commons-fileupload-1.2.1.jar			WEB-INF/lib/commons-fileupload-1.2.1.jar
WEB-INF/lib/commons-httpclient-3.1.jar				WEB-INF/lib/commons-httpclient-3.1.jar
WEB-INF/lib/commons-io-1.4.jar				      |	WEB-INF/lib/commons-io-2.4.jar
WEB-INF/lib/commons-jexl-2.1.1.jar				WEB-INF/lib/commons-jexl-2.1.1.jar
WEB-INF/lib/commons-lang-2.4.jar				WEB-INF/lib/commons-lang-2.4.jar
							      >	WEB-INF/lib/commons-lang3-3.1.jar
WEB-INF/lib/commons-pool-1.4.jar				WEB-INF/lib/commons-pool-1.4.jar
WEB-INF/lib/commons-proxy-1.0.jar				WEB-INF/lib/commons-proxy-1.0.jar
WEB-INF/lib/concurrent-1.3.4.jar				WEB-INF/lib/concurrent-1.3.4.jar
WEB-INF/lib/cos-05Nov2002.jar					WEB-INF/lib/cos-05Nov2002.jar
WEB-INF/lib/cssinject-2.0.3.jar					WEB-INF/lib/cssinject-2.0.3.jar
WEB-INF/lib/cssparser-0.9.5.jar					WEB-INF/lib/cssparser-0.9.5.jar
WEB-INF/lib/customfield-1.0.0.jar			      <
WEB-INF/lib/derby-10.5.3.0_1.jar				WEB-INF/lib/derby-10.5.3.0_1.jar
WEB-INF/lib/dom4j-1.6.1.jar					WEB-INF/lib/dom4j-1.6.1.jar
WEB-INF/lib/easyuploads-7.0.0.jar				WEB-INF/lib/easyuploads-7.0.0.jar
WEB-INF/lib/ehcache-1.5.0.jar					WEB-INF/lib/ehcache-1.5.0.jar
WEB-INF/lib/ezmorph-1.0.6.jar					WEB-INF/lib/ezmorph-1.0.6.jar
WEB-INF/lib/filters-2.0.235.jar					WEB-INF/lib/filters-2.0.235.jar
WEB-INF/lib/fontbox-1.8.1.jar					WEB-INF/lib/fontbox-1.8.1.jar
WEB-INF/lib/freemarker-2.3.18.jar				WEB-INF/lib/freemarker-2.3.18.jar
WEB-INF/lib/geronimo-stax-api_1.0_spec-1.0.1.jar		WEB-INF/lib/geronimo-stax-api_1.0_spec-1.0.1.jar
WEB-INF/lib/groovy-all-2.1.8.jar			      |	WEB-INF/lib/groovy-all-2.2.2.jar
WEB-INF/lib/gson-2.2.2.jar					WEB-INF/lib/gson-2.2.2.jar
WEB-INF/lib/guava-10.0.1.jar				      |	WEB-INF/lib/guava-16.0.1.jar
WEB-INF/lib/guice-3.0.jar					WEB-INF/lib/guice-3.0.jar
WEB-INF/lib/gwt-graphics-1.0.0.jar				WEB-INF/lib/gwt-graphics-1.0.0.jar
WEB-INF/lib/httpclient-4.2.1.jar				WEB-INF/lib/httpclient-4.2.1.jar
WEB-INF/lib/httpcore-4.2.1.jar					WEB-INF/lib/httpcore-4.2.1.jar
WEB-INF/lib/imagefilter-0.5.3.jar			      <
WEB-INF/lib/imageinfo-1.7.jar					WEB-INF/lib/imageinfo-1.7.jar
WEB-INF/lib/isoparser-1.0-RC-1.jar				WEB-INF/lib/isoparser-1.0-RC-1.jar
WEB-INF/lib/istack-commons-runtime-2.16.jar			WEB-INF/lib/istack-commons-runtime-2.16.jar
WEB-INF/lib/ivy-2.1.0.jar					WEB-INF/lib/ivy-2.1.0.jar
WEB-INF/lib/jackrabbit-api-2.6.4.jar			      |	WEB-INF/lib/jackrabbit-api-2.8.0.jar
WEB-INF/lib/jackrabbit-core-2.6.4.jar			      |	WEB-INF/lib/jackrabbit-core-2.8.0.jar
WEB-INF/lib/jackrabbit-jcr-commons-2.6.4.jar		      |	WEB-INF/lib/jackrabbit-data-2.8.0-tests.jar
WEB-INF/lib/jackrabbit-spi-2.6.4.jar			      |	WEB-INF/lib/jackrabbit-data-2.8.0.jar
WEB-INF/lib/jackrabbit-spi-commons-2.6.4.jar		      |	WEB-INF/lib/jackrabbit-jcr-commons-2.8.0.jar
							      >	WEB-INF/lib/jackrabbit-ocm-2.0.0.jar
							      >	WEB-INF/lib/jackrabbit-spi-2.8.0.jar
							      >	WEB-INF/lib/jackrabbit-spi-commons-2.8.0.jar
WEB-INF/lib/jackson-core-asl-1.9.12.jar				WEB-INF/lib/jackson-core-asl-1.9.12.jar
WEB-INF/lib/jackson-jaxrs-1.9.12.jar				WEB-INF/lib/jackson-jaxrs-1.9.12.jar
WEB-INF/lib/jackson-mapper-asl-1.9.12.jar			WEB-INF/lib/jackson-mapper-asl-1.9.12.jar
WEB-INF/lib/jackson-xc-1.9.12.jar				WEB-INF/lib/jackson-xc-1.9.12.jar
WEB-INF/lib/javassist-3.16.1-GA.jar				WEB-INF/lib/javassist-3.16.1-GA.jar
WEB-INF/lib/javax.inject-1.jar					WEB-INF/lib/javax.inject-1.jar
WEB-INF/lib/jaxb-api-2.2.jar					WEB-INF/lib/jaxb-api-2.2.jar
WEB-INF/lib/jaxb-core-2.2.7.jar					WEB-INF/lib/jaxb-core-2.2.7.jar
WEB-INF/lib/jaxb-impl-2.2.7.jar					WEB-INF/lib/jaxb-impl-2.2.7.jar
WEB-INF/lib/jaxen-1.1.1.jar					WEB-INF/lib/jaxen-1.1.1.jar
WEB-INF/lib/jaxrs-api-3.0.4.Final.jar				WEB-INF/lib/jaxrs-api-3.0.4.Final.jar
WEB-INF/lib/jbcrypt-0.3m.jar					WEB-INF/lib/jbcrypt-0.3m.jar
WEB-INF/lib/jboss-annotations-api_1.1_spec-1.0.1.Final.jar	WEB-INF/lib/jboss-annotations-api_1.1_spec-1.0.1.Final.jar
WEB-INF/lib/jcip-annotations-1.0.jar				WEB-INF/lib/jcip-annotations-1.0.jar
WEB-INF/lib/jcl-over-slf4j-1.7.5.jar				WEB-INF/lib/jcl-over-slf4j-1.7.5.jar
WEB-INF/lib/jcr-2.0.jar						WEB-INF/lib/jcr-2.0.jar
WEB-INF/lib/jdom-1.1.jar					WEB-INF/lib/jdom-1.1.jar
WEB-INF/lib/jempbox-1.8.1.jar					WEB-INF/lib/jempbox-1.8.1.jar
WEB-INF/lib/json-lib-2.3-jdk15.jar				WEB-INF/lib/json-lib-2.3-jdk15.jar
WEB-INF/lib/jsoup-1.7.2.jar					WEB-INF/lib/jsoup-1.7.2.jar
WEB-INF/lib/jsr107cache-1.0.jar					WEB-INF/lib/jsr107cache-1.0.jar
WEB-INF/lib/jsr173_api-1.0.jar					WEB-INF/lib/jsr173_api-1.0.jar
WEB-INF/lib/jsr305-1.3.9.jar				      <
WEB-INF/lib/jstl-1.2.jar					WEB-INF/lib/jstl-1.2.jar
WEB-INF/lib/jtidy-r938.jar					WEB-INF/lib/jtidy-r938.jar
WEB-INF/lib/jug-2.0.0-asl.jar					WEB-INF/lib/jug-2.0.0-asl.jar
WEB-INF/lib/juniversalchardet-1.0.3.jar				WEB-INF/lib/juniversalchardet-1.0.3.jar
WEB-INF/lib/log4j-1.2.17.jar					WEB-INF/lib/log4j-1.2.17.jar
WEB-INF/lib/lucene-core-3.6.0.jar				WEB-INF/lib/lucene-core-3.6.0.jar
WEB-INF/lib/magnolia-about-app-5.2.6.jar		      |	WEB-INF/lib/magnolia-about-app-5.3.jar
WEB-INF/lib/magnolia-contacts-1.2.1.jar			      |	WEB-INF/lib/magnolia-contacts-1.3.jar
WEB-INF/lib/magnolia-core-5.2.5.jar			      |	WEB-INF/lib/magnolia-core-5.3.jar
WEB-INF/lib/magnolia-dam-1.2.5.jar			      |	WEB-INF/lib/magnolia-dam-api-2.0.jar
WEB-INF/lib/magnolia-demo-project-2.7.5.jar		      |	WEB-INF/lib/magnolia-dam-app-2.0.jar
WEB-INF/lib/magnolia-i18n-5.2.5.jar			      |	WEB-INF/lib/magnolia-dam-compatibility-2.0.jar
WEB-INF/lib/magnolia-jaas-5.2.5.jar			      |	WEB-INF/lib/magnolia-dam-core-2.0.jar
WEB-INF/lib/magnolia-messages-app-5.2.6.jar		      |	WEB-INF/lib/magnolia-dam-jcr-2.0.jar
WEB-INF/lib/magnolia-module-activation-5.2.4.jar	      |	WEB-INF/lib/magnolia-dam-templating-2.0.jar
WEB-INF/lib/magnolia-module-cache-5.2.3.jar		      |	WEB-INF/lib/magnolia-demo-project-2.8.jar
WEB-INF/lib/magnolia-module-categorization-2.2.2.jar	      |	WEB-INF/lib/magnolia-i18n-5.3.jar
							      >	WEB-INF/lib/magnolia-jaas-5.3.jar
							      >	WEB-INF/lib/magnolia-messages-app-5.3.jar
							      >	WEB-INF/lib/magnolia-module-activation-5.2.5.jar
							      >	WEB-INF/lib/magnolia-module-cache-5.2.4.jar
							      >	WEB-INF/lib/magnolia-module-categorization-2.3.jar
WEB-INF/lib/magnolia-module-commenting-2.2.1.jar		WEB-INF/lib/magnolia-module-commenting-2.2.1.jar
WEB-INF/lib/magnolia-module-data-2.2.2.jar		      |	WEB-INF/lib/magnolia-module-data-2.3.jar
WEB-INF/lib/magnolia-module-device-detection-1.0.3.jar		WEB-INF/lib/magnolia-module-device-detection-1.0.3.jar
WEB-INF/lib/magnolia-module-form-2.2.4.jar			WEB-INF/lib/magnolia-module-form-2.2.4.jar
WEB-INF/lib/magnolia-module-forum-3.3.1.jar		      |	WEB-INF/lib/magnolia-module-forum-3.4.jar
WEB-INF/lib/magnolia-module-google-sitemap-2.0.3.jar	      |	WEB-INF/lib/magnolia-module-google-sitemap-2.1.jar
WEB-INF/lib/magnolia-module-groovy-2.2.4.jar		      |	WEB-INF/lib/magnolia-module-groovy-2.3.jar
WEB-INF/lib/magnolia-module-imaging-3.0.4.jar		      |	WEB-INF/lib/magnolia-module-imaging-3.1.jar
WEB-INF/lib/magnolia-module-inplace-templating-2.2.3.jar      |	WEB-INF/lib/magnolia-module-inplace-templating-2.3.jar
WEB-INF/lib/magnolia-module-legacy-admininterface-5.2.3.jar	WEB-INF/lib/magnolia-module-legacy-admininterface-5.2.3.jar
WEB-INF/lib/magnolia-module-mail-5.1.3.jar			WEB-INF/lib/magnolia-module-mail-5.1.3.jar
WEB-INF/lib/magnolia-module-public-user-registration-2.2.3.ja |	WEB-INF/lib/magnolia-module-public-user-registration-2.3.jar
WEB-INF/lib/magnolia-module-resources-2.2.4.jar		      |	WEB-INF/lib/magnolia-module-resources-2.3.jar
WEB-INF/lib/magnolia-module-rssaggregator-2.2.3.jar	      |	WEB-INF/lib/magnolia-module-rssaggregator-2.3.jar
WEB-INF/lib/magnolia-module-scheduler-2.1.1.jar		      |	WEB-INF/lib/magnolia-module-scheduler-2.1.2.jar
WEB-INF/lib/magnolia-module-standard-templating-kit-2.7.5.jar |	WEB-INF/lib/magnolia-module-standard-templating-kit-2.8.jar
WEB-INF/lib/magnolia-pages-5.2.6.jar			      |	WEB-INF/lib/magnolia-pages-5.3.jar
WEB-INF/lib/magnolia-rendering-5.2.5.jar		      |	WEB-INF/lib/magnolia-rendering-5.3.jar
WEB-INF/lib/magnolia-rest-integration-1.0.3.jar			WEB-INF/lib/magnolia-rest-integration-1.0.3.jar
WEB-INF/lib/magnolia-rest-services-1.0.3.jar			WEB-INF/lib/magnolia-rest-services-1.0.3.jar
WEB-INF/lib/magnolia-sample-app-5.2.6.jar		      |	WEB-INF/lib/magnolia-sample-app-5.3.jar
WEB-INF/lib/magnolia-security-app-5.2.6.jar		      |	WEB-INF/lib/magnolia-security-app-5.3.jar
WEB-INF/lib/magnolia-templating-5.2.5.jar		      |	WEB-INF/lib/magnolia-task-management-1.0.jar
WEB-INF/lib/magnolia-templating-jsp-5.2.5.jar		      |	WEB-INF/lib/magnolia-templating-5.3.jar
WEB-INF/lib/magnolia-theme-pop-2.7.5.jar		      |	WEB-INF/lib/magnolia-templating-jsp-5.3.jar
WEB-INF/lib/magnolia-ui-actionbar-5.2.6.jar		      |	WEB-INF/lib/magnolia-theme-pop-2.8.jar
WEB-INF/lib/magnolia-ui-admincentral-5.2.6.jar		      |	WEB-INF/lib/magnolia-ui-actionbar-5.3.jar
WEB-INF/lib/magnolia-ui-api-5.2.6.jar			      |	WEB-INF/lib/magnolia-ui-admincentral-5.3.jar
WEB-INF/lib/magnolia-ui-contentapp-5.2.6.jar		      |	WEB-INF/lib/magnolia-ui-api-5.3.jar
WEB-INF/lib/magnolia-ui-dialog-5.2.6.jar		      |	WEB-INF/lib/magnolia-ui-contentapp-5.3.jar
WEB-INF/lib/magnolia-ui-form-5.2.6.jar			      |	WEB-INF/lib/magnolia-ui-dialog-5.3.jar
WEB-INF/lib/magnolia-ui-framework-5.2.6.jar		      |	WEB-INF/lib/magnolia-ui-form-5.3.jar
WEB-INF/lib/magnolia-ui-imageprovider-5.2.6.jar		      |	WEB-INF/lib/magnolia-ui-framework-5.3.jar
WEB-INF/lib/magnolia-ui-mediaeditor-5.2.6.jar		      |	WEB-INF/lib/magnolia-ui-imageprovider-5.3.jar
WEB-INF/lib/magnolia-ui-vaadin-common-widgets-5.2.6.jar	      |	WEB-INF/lib/magnolia-ui-mediaeditor-5.3.jar
WEB-INF/lib/magnolia-ui-vaadin-integration-5.2.6.jar	      |	WEB-INF/lib/magnolia-ui-vaadin-common-widgets-5.3.jar
WEB-INF/lib/magnolia-ui-vaadin-theme-5.2.6.jar		      |	WEB-INF/lib/magnolia-ui-vaadin-integration-5.3.jar
WEB-INF/lib/magnolia-ui-vaadin-widgetset-5.2.6.jar	      |	WEB-INF/lib/magnolia-ui-vaadin-theme-5.3.jar
WEB-INF/lib/magnolia-ui-widget-editor-5.2.6.jar		      |	WEB-INF/lib/magnolia-ui-widget-editor-5.3.jar
WEB-INF/lib/magnolia-ui-workbench-5.2.6.jar		      |	WEB-INF/lib/magnolia-ui-workbench-5.3.jar
							      >	WEB-INF/lib/magnolia-vaadin-widgetset-5.3.jar
WEB-INF/lib/mail-1.4.1.jar					WEB-INF/lib/mail-1.4.1.jar
WEB-INF/lib/mgwt-1.1.2.jar					WEB-INF/lib/mgwt-1.1.2.jar
WEB-INF/lib/mobiledetect-1.0.jar				WEB-INF/lib/mobiledetect-1.0.jar
WEB-INF/lib/mycila-guice-2.10.ga.jar				WEB-INF/lib/mycila-guice-2.10.ga.jar
WEB-INF/lib/objenesis-1.2.jar					WEB-INF/lib/objenesis-1.2.jar
WEB-INF/lib/openutils-log4j-2.0.5.jar				WEB-INF/lib/openutils-log4j-2.0.5.jar
WEB-INF/lib/oro-2.0.8.jar					WEB-INF/lib/oro-2.0.8.jar
WEB-INF/lib/pdfbox-1.8.1.jar					WEB-INF/lib/pdfbox-1.8.1.jar
WEB-INF/lib/poi-3.9.jar						WEB-INF/lib/poi-3.9.jar
WEB-INF/lib/poi-ooxml-3.9.jar					WEB-INF/lib/poi-ooxml-3.9.jar
WEB-INF/lib/poi-ooxml-schemas-3.9.jar				WEB-INF/lib/poi-ooxml-schemas-3.9.jar
WEB-INF/lib/poi-scratchpad-3.9.jar				WEB-INF/lib/poi-scratchpad-3.9.jar
WEB-INF/lib/proxytoys-1.01-MAGNOLIA-5317-patched.jar		WEB-INF/lib/proxytoys-1.01-MAGNOLIA-5317-patched.jar
WEB-INF/lib/quartz-1.8.6.jar					WEB-INF/lib/quartz-1.8.6.jar
WEB-INF/lib/reflections-0.9.9-RC1.jar				WEB-INF/lib/reflections-0.9.9-RC1.jar
WEB-INF/lib/resteasy-client-3.0.4.Final.jar			WEB-INF/lib/resteasy-client-3.0.4.Final.jar
WEB-INF/lib/resteasy-jackson-provider-3.0.4.Final.jar		WEB-INF/lib/resteasy-jackson-provider-3.0.4.Final.jar
WEB-INF/lib/resteasy-jaxb-provider-3.0.4.Final.jar		WEB-INF/lib/resteasy-jaxb-provider-3.0.4.Final.jar
WEB-INF/lib/resteasy-jaxrs-3.0.4.Final.jar			WEB-INF/lib/resteasy-jaxrs-3.0.4.Final.jar
WEB-INF/lib/rome-1.0.jar					WEB-INF/lib/rome-1.0.jar
WEB-INF/lib/rome-fetcher-1.0.jar				WEB-INF/lib/rome-fetcher-1.0.jar
WEB-INF/lib/sac-1.3.jar						WEB-INF/lib/sac-1.3.jar
WEB-INF/lib/scannotation-1.0.3.jar				WEB-INF/lib/scannotation-1.0.3.jar
WEB-INF/lib/slf4j-api-1.7.5.jar					WEB-INF/lib/slf4j-api-1.7.5.jar
WEB-INF/lib/slf4j-log4j12-1.7.5.jar				WEB-INF/lib/slf4j-log4j12-1.7.5.jar
WEB-INF/lib/stax-api-1.0-2.jar					WEB-INF/lib/stax-api-1.0-2.jar
WEB-INF/lib/swagger-annotations_2.10-1.3.0.jar			WEB-INF/lib/swagger-annotations_2.10-1.3.0.jar
WEB-INF/lib/tagsoup-1.2.1.jar					WEB-INF/lib/tagsoup-1.2.1.jar
WEB-INF/lib/tika-core-1.4.jar					WEB-INF/lib/tika-core-1.4.jar
WEB-INF/lib/tika-parsers-1.4.jar				WEB-INF/lib/tika-parsers-1.4.jar
WEB-INF/lib/vaadin-client-compiled-7.1.7.jar			WEB-INF/lib/vaadin-client-compiled-7.1.7.jar
WEB-INF/lib/vaadin-server-7.1.7.jar				WEB-INF/lib/vaadin-server-7.1.7.jar
WEB-INF/lib/vaadin-shared-7.1.7.jar				WEB-INF/lib/vaadin-shared-7.1.7.jar
WEB-INF/lib/vaadin-shared-deps-1.0.2.jar			WEB-INF/lib/vaadin-shared-deps-1.0.2.jar
WEB-INF/lib/vaadin-theme-compiler-7.1.7.jar			WEB-INF/lib/vaadin-theme-compiler-7.1.7.jar
WEB-INF/lib/vaadin-themes-7.1.7.jar				WEB-INF/lib/vaadin-themes-7.1.7.jar
WEB-INF/lib/velocity-1.5.jar					WEB-INF/lib/velocity-1.5.jar
WEB-INF/lib/vorbis-java-core-0.1-tests.jar			WEB-INF/lib/vorbis-java-core-0.1-tests.jar
WEB-INF/lib/vorbis-java-core-0.1.jar				WEB-INF/lib/vorbis-java-core-0.1.jar
WEB-INF/lib/vorbis-java-tika-0.1.jar				WEB-INF/lib/vorbis-java-tika-0.1.jar
WEB-INF/lib/xercesImpl-2.8.1.jar				WEB-INF/lib/xercesImpl-2.8.1.jar
WEB-INF/lib/xmlbeans-2.3.0.jar					WEB-INF/lib/xmlbeans-2.3.0.jar
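
The side-by-side listing above can be filtered mechanically. The following is a minimal Python sketch, not part of the original investigation, that classifies each row by its gutter marker (| for differing rows, < for left-only, > for right-only) and drops rows that mention Magnolia modules. The function names classify and third_party_changes are hypothetical. Note that diff pairs rows positionally, so a "changed" row does not always pair two versions of the same artefact.

```python
def classify(line):
    """Classify one row of side-by-side (`diff -y`) output.

    Returns (kind, left, right), where kind is 'same', 'changed',
    'removed' or 'added'; a missing side is None.
    Assumes file names contain no whitespace.
    """
    tokens = line.split()
    if len(tokens) == 3 and tokens[1] == "|":
        return ("changed", tokens[0], tokens[2])
    if len(tokens) == 2 and tokens[1] == "<":
        return ("removed", tokens[0], None)
    if len(tokens) == 2 and tokens[0] == ">":
        return ("added", None, tokens[1])
    if len(tokens) == 2:
        return ("same", tokens[0], tokens[1])
    raise ValueError(f"unrecognised diff row: {line!r}")


def third_party_changes(lines):
    """Yield differing rows that do not involve a Magnolia module."""
    for line in lines:
        if not line.strip():
            continue  # skip blank rows
        kind, left, right = classify(line)
        name = left or right
        if kind != "same" and "/magnolia-" not in name:
            yield kind, left, right
```

Applied to the listing above, such a filter would surface, among others, the row pairing commons-compress-1.0.jar with commons-compress-1.8.1.jar.
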
Comment by Christopher Zimmermann [ 16/Jul/14 ]

Re-opened based on comment above.

Comment by Federico Grilli [ 16/Jul/14 ]

I tried to follow other paths to solve the issue but have had no luck so far and, tbh, I am at a loss now.

  • My tests run like this:
    • I use /demo-project/downloads/Magnolia_Flyer_4-0 in DAM workspace
    • I ensure the DAM index is re-created before running my query by deleting it physically from the file system and restarting the server
    • I use the JCR Queries app with the following SQL (1.0) statement select * from nt:base where contains(*, 'best choice')
    • In 5.2 this returns 0 nodes.
    • In 5.3 this returns one node /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content
  • I noticed that JR (2.6 and 2.8) pulls in tika-core 1.3 but not tika-parsers. The latter dependency is declared explicitly in our empty-webapp pom as 1.4. So I tried to downgrade it to 1.3 so that both Tika libraries have the same version, but it didn't work.
  • Following a hint I found here http://t152909.apache-jackrabbit-user.apachetalk.us/problem-with-indexing-xml-docs-with-tika-in-jackrabbit-2-6-2-t152909.html I ensured that our DAM app actually sets the mimeType to e.g. application/pdf, because it looks like that is what triggers the Tika parser. If you look at how the PDF doc I'm using for tests is stored in JCR, you will see that mimeType is the same in both 5.2 and 5.3. The only difference is that fileName has the pdf extension in 5.3 and not in 5.2, but even forcing the document to be saved with the pdf extension in 5.2 doesn't solve the issue.
    2.8
    ---
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/size=131699
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/jcr:lastModifiedBy=admin
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/jcr:uuid=bdc8bb48-06ea-4778-bdce-59704b4b3b12
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/extension=pdf
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/jcr:data=<binary>
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/jcr:mimeType=application/pdf
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/nodeDataTemplate=
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/jcr:lastModified=2009-01-29T16:12:19.102+01:00
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/jcr:primaryType=mgnl:resource
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/fileName=Magnolia Flyer 4.0.pdf
    
    2.6
    ---
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/size=131699
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/jcr:lastModifiedBy=admin
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/jcr:uuid=bdc8bb48-06ea-4778-bdce-59704b4b3b12
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/jcr:data=<binary>
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/extension=pdf
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/jcr:mimeType=application/pdf
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/jcr:lastModified=2009-01-29T16:12:19.102+01:00
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/nodeDataTemplate=
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/jcr:primaryType=mgnl:resource
    /demo-project/downloads/Magnolia_Flyer_4-0/jcr:content/fileName=Magnolia Flyer 4.0
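
The two property listings above can also be compared mechanically rather than by eye. A minimal Python sketch, assuming the path=value dump format shown here; parse_dump and diff_props are hypothetical helper names:

```python
def parse_dump(text):
    """Parse 'path=value' lines into a dict, ignoring other lines.

    Lines without '=' (version labels, separators, the bare node
    path) are skipped. Empty values are kept as empty strings.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        path, _, value = line.partition("=")
        props[path] = value
    return props


def diff_props(a, b):
    """Return {path: (value_in_a, value_in_b)} for differing properties."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }
```

For the two dumps in this comment, the only differing property would be fileName; the order in which the properties are listed does not matter.
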
    
Comment by Jan Haderka [ 16/Jul/14 ]

BTW the problem also doesn't exist in JR 2.4.x; it's only on the 2.6 branch.

Comment by Federico Grilli [ 16/Jul/14 ]

After some debugging I was finally able to pinpoint the culprit. It's none other than commons-compress 1.0, which is pulled in by ui-framework 5.2.x, whereas in ui-framework 5.3 this has been updated to 1.8.1.
It goes like this.
While creating a Lucene index, org.apache.tika.parser.AutoDetectParser.parse(..) tries to detect the Content-Type of a file stored in DAM by using several Detector(s), one of which is org.apache.tika.parser.pkg.ZipContainerDetector. The latter uses CompressorStreamFactory from commons-compress, which fails silently at line #95. This error interrupts the indexing process without issuing any warning. Updating to commons-compress 1.8.1 solves the issue (that is also why search works in Magnolia 5.3). Tika-parsers doesn't pull in its commons-compress dependency transitively because it is excluded in magnolia-project's pom (we copied there the same exclusions found in JR's pom). Now the question is: can we update a dependency to a major version in a maintenance release? If the answer is no, as I guess, then the workaround is simply to replace the old commons-compress with version 1.8.1 in a custom project.
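
To check whether a given webapp still ships the problematic library, the jar names in WEB-INF/lib can be scanned. A minimal Python sketch; the helper names and the choice to flag everything below 1.8.1 are assumptions, since this ticket only confirms that 1.0 fails and 1.8.1 works:

```python
import re
from pathlib import Path

# Matches e.g. "commons-compress-1.0.jar" and captures the dotted version.
_JAR = re.compile(r"^commons-compress-(\d+(?:\.\d+)*)\.jar$")


def parse_version(text):
    """Turn '1.8.1' into (1, 8, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def outdated_commons_compress(jar_names, minimum="1.8.1"):
    """Return the commons-compress jars older than `minimum`.

    Assumption: anything below 1.8.1 is treated as suspect.
    """
    floor = parse_version(minimum)
    return [
        name
        for name in jar_names
        if (match := _JAR.match(name)) and parse_version(match.group(1)) < floor
    ]


# Example use against a deployed webapp (the path is illustrative):
# jars = (p.name for p in Path("WEB-INF/lib").glob("*.jar"))
# print(outdated_commons_compress(jars))
```
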

Comment by Federico Grilli [ 17/Jul/14 ]

Upgraded commons-compress to 1.8.1. It was decided to make an exception to the rule according to which we don't upgrade a dependency to a major version in a maintenance release.

Comment by Christopher Zimmermann [ 17/Jul/14 ]

Please add a test case.
