[MAGNOLIA-2152] QueryManager and JCR browser seem to have different view on repository Created: 26/May/08 Updated: 23/Jan/13 Resolved: 05/Aug/08 |
|
| Status: | Closed |
| Project: | Magnolia |
| Component/s: | core |
| Affects Version/s: | 3.5.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Philippe Marschall | Assignee: | Jan Haderka |
| Resolution: | Workaround exists | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
We have a very strange issue with querying custom page attributes. We have a custom page dialog that sets some custom page properties. These attributes are ad hoc and not defined in any schema. We then have observation code that queries the JCR for pages with these attributes and certain values. Our queries look like this:

//element(*, mgnl:content) [@someLongAttribute and @someLongAttribute != 0]
SELECT * FROM mgnl:content WHERE someLongAttribute IS NOT NULL AND someLongAttribute != 0

//element(*, mgnl:content) [@someBooleanAttribute and @someBooleanAttribute = 'true']
SELECT * FROM mgnl:content WHERE someBooleanAttribute IS NOT NULL AND someBooleanAttribute = 'true'

The behavior we are observing is that for certain pages the check for an attribute value is always false, even if it should match according to the JCR browser. However, the check for the attribute's presence works as expected. As soon as we change a page attribute in the JCR browser, the value checks work, even for values subsequently changed with the page properties dialog. This issue is very rare and we have not yet found a way to reproduce or trigger it. |
| Comments |
| Comment by Philippe Marschall [ 15/Jul/08 ] |
|
More descriptive title |
| Comment by Philippe Marschall [ 15/Jul/08 ] |
|
Some more information: The behaviour we are observing is that the QueryManager sometimes seems to have a different view on the JCR than the JCR browser until "something" happens and they are in synch again. To bring them in synch again we have worked out the following procedure:
Simply exporting, deleting and then importing did not fix the issue. We don't observe this behavior in the development environment with Derby but on test / production with Oracle. We observed this behavior both in the website and data workspace. |
| Comment by Magnolia International [ 15/Jul/08 ] |
|
It is likely to be related to indexing. Indexing happens (by default) asynchronously - and queries are based on those indexes. |
| Comment by Philippe Marschall [ 16/Jul/08 ] |
|
We use Oracle 9 and the bundle persistence manager.

jackrabbit-oracle-search.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 1.2//EN" "http://jackrabbit.apache.org/dtd/repository-1.2.dtd">
<Repository>
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${rep.home}/repository" />
  </FileSystem>
  <Security appName="Jackrabbit">
    <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager"></AccessManager>
    <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
      <param name="anonymousId" value="anonymous" />
    </LoginModule>
  </Security>
  <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default" />
  <Workspace name="default">
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
      <param name="path" value="${wsp.home}/default" />
    </FileSystem>
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.Oracle9PersistenceManager">
      <param name="driver" value="oracle.jdbc.OracleDriver"/>
      <param name="url" value="jdbc:oracle:thin:@localhost:521:localhost"/>
      <param name="user" value="sysdba"/>
      <param name="password" value="secret"/>
      <param name="schemaObjectPrefix" value="${wsp.name}_"/>
      <param name="externalBLOBs" value="false"/>
    </PersistenceManager>
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
      <param name="path" value="${wsp.home}/index" />
      <param name="useCompoundFile" value="true" />
      <param name="minMergeDocs" value="100" />
      <param name="volatileIdleTime" value="3" />
      <param name="maxMergeDocs" value="100000" />
      <param name="mergeFactor" value="10" />
      <param name="maxFieldLength" value="10000" />
      <param name="bufferSize" value="10" />
      <param name="cacheSize" value="1000" />
      <param name="forceConsistencyCheck" value="false" />
      <param name="autoRepair" value="true" />
      <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer" />
      <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl" />
      <param name="respectDocumentOrder" value="true" />
      <param name="resultFetchSize" value="2147483647" />
      <param name="extractorPoolSize" value="3" />
      <param name="extractorTimeout" value="100" />
      <param name="extractorBackLogSize" value="100" />
      <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.MsWordTextExtractor, org.apache.jackrabbit.extractor.MsExcelTextExtractor, org.apache.jackrabbit.extractor.MsPowerPointTextExtractor, org.apache.jackrabbit.extractor.PdfTextExtractor, org.apache.jackrabbit.extractor.OpenOfficeTextExtractor, org.apache.jackrabbit.extractor.RTFTextExtractor, org.apache.jackrabbit.extractor.HTMLTextExtractor, org.apache.jackrabbit.extractor.PlainTextExtractor, org.apache.jackrabbit.extractor.XMLTextExtractor" />
    </SearchIndex>
  </Workspace>
  <Versioning rootPath="${rep.home}/version">
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
      <param name="path" value="${rep.home}/workspaces/version" />
    </FileSystem>
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.Oracle9PersistenceManager">
      <param name="driver" value="oracle.jdbc.OracleDriver"/>
      <param name="url" value="jdbc:oracle:thin:@localhost:1521:localhost"/>
      <param name="user" value="sysdba"/>
      <param name="password" value="secret"/>
      <param name="schemaObjectPrefix" value="version_"/>
      <param name="externalBLOBs" value="false"/>
    </PersistenceManager>
  </Versioning>
</Repository>

repository.xml

<!DOCTYPE JCR [
<!ELEMENT Map (#PCDATA)>
<!ATTLIST Map
name CDATA #REQUIRED
repositoryName CDATA #REQUIRED
workspaceName CDATA #REQUIRED>
<!ELEMENT JCR (RepositoryMapping|Repository)*>
<!ELEMENT param (#PCDATA)>
<!ATTLIST param
name CDATA #REQUIRED
value CDATA #REQUIRED>
<!ELEMENT Repository (param|workspace)*>
<!ATTLIST Repository
loadOnStartup CDATA #REQUIRED
name CDATA #REQUIRED
provider CDATA #REQUIRED>
<!ELEMENT workspace (#PCDATA)>
<!ATTLIST workspace
name CDATA #REQUIRED>
<!ELEMENT RepositoryMapping (Map)*>
]><JCR>
<RepositoryMapping>
<Map name="website" repositoryName="magnolia" workspaceName="website" />
<Map name="config" repositoryName="magnolia" workspaceName="config" />
<Map name="users" repositoryName="magnolia" workspaceName="users" />
<Map name="userroles" repositoryName="magnolia" workspaceName="userroles" />
<Map name="usergroups" repositoryName="magnolia" workspaceName="usergroups" />
<Map name="mgnlSystem" repositoryName="magnolia" workspaceName="mgnlSystem" />
<Map name="mgnlVersion" repositoryName="magnolia" workspaceName="mgnlVersion" />
<Map name="dms" repositoryName="magnolia" workspaceName="dms" />
<Map name="packages" repositoryName="magnolia" workspaceName="packages" />
<Map name="Store" repositoryName="magnolia" workspaceName="Store" />
<Map name="Expressions" repositoryName="magnolia" workspaceName="Expressions" />
</RepositoryMapping>
<Repository name="magnolia" provider="info.magnolia.jackrabbit.ProviderImpl" loadOnStartup="true">
<param name="configFile" value="${magnolia.repositories.jackrabbit.config}" />
<param name="repositoryHome" value="${magnolia.repositories.home}/magnolia" />
<param name="contextFactoryClass" value="org.apache.jackrabbit.core.jndi.provider.DummyInitialContextFactory" />
<param name="providerURL" value="localhost" />
<param name="bindName" value="${magnolia.webapp}" />
<workspace name="website" />
<workspace name="config" />
<workspace name="users" />
<workspace name="userroles" />
<workspace name="usergroups" />
<workspace name="mgnlSystem" />
<workspace name="mgnlVersion" />
<workspace name="dms" />
<workspace name="packages" />
<workspace name="Store" />
<workspace name="Expressions" />
</Repository>
</JCR>
|
| Comment by Jan Haderka [ 16/Jul/08 ] |
Seems to me like something kills the indexer. In your configuration <param name="extractorPoolSize" value="3" /> <param name="extractorTimeout" value="100" /> <param name="extractorBackLogSize" value="100" /> you are using 3 indexers running asynchronously in the background. If the issue were simply a failure to index the data, you should see the exception in the logs. If the indexing slows down, it will be cancelled after the timeout (the value is in milliseconds).
The question is why indexing should slow down permanently (so that once it happens, a document will always time out before it can be indexed) and why it should get back up to speed after a restart ... you might be hitting some underlying issue with either Jackrabbit or Lucene. |
| Comment by Jan Haderka [ 16/Jul/08 ] |
|
Actually, if the timeout is the issue, setting the logging level to INFO for org.apache.jackrabbit.core.query.lucene.TextExtractorJob should make it dump messages about timeouts to the log file. |
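As a minimal sketch of the logging suggestion above, assuming the webapp is configured through a log4j 1.2 XML file (the thread does not say which logging setup is in use, so the file and its surrounding appenders are assumptions), the logger could be raised to INFO like this:

```xml
<!-- Fragment for a log4j 1.2 XML configuration; place it inside the
     existing <log4j:configuration> element. The log4j file itself is an
     assumption, only the logger name comes from the comment above. -->
<category name="org.apache.jackrabbit.core.query.lucene.TextExtractorJob">
  <!-- INFO as suggested in the comment above -->
  <priority value="INFO" />
</category>
```

Whether timeout messages actually appear at this level depends on the Jackrabbit version in use.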
| Comment by Philippe Marschall [ 16/Jul/08 ] |
|
Our log level for org.apache.jackrabbit.core.query.lucene.TextExtractorJob was WARN. |
| Comment by Philippe Marschall [ 04/Aug/08 ] |
|
We had a look at our logs and found the described log entry only when we bootstrapped with samples. At first glance it looks like the timeout happened for a PDF file. However, we still deployed the configuration change (disabling async indexing), first to our test and now to our production environment. All the inconsistencies we could previously observe are now gone. I don't know whether this has any influence, but we run inside an OpenVZ virtual machine. As far as we are concerned the bug can be closed. If we observe anomalies again we'll file a new one. Thanks again for all the help. |
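The thread does not record the exact configuration change that disabled asynchronous indexing. One plausible form, assuming the Jackrabbit 1.x SearchIndex behaviour in which an extractorPoolSize of 0 allocates no background extractor threads so that text extraction runs in the calling thread, would be:

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index" />
  <!-- Assumption: 0 means no background extractor threads, so text
       extraction is synchronous. With no pool, extractorTimeout and
       extractorBackLogSize are expected to have no effect. -->
  <param name="extractorPoolSize" value="0" />
  <!-- all other parameters unchanged from the configuration posted earlier -->
</SearchIndex>
```

This is a sketch, not the reporter's actual change. Synchronous extraction trades slower saves of large binary content for an index that does not lag behind the stored data.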