[MAGNOLIA-7870] Lucene IndexMerger Error while merging indexes Created: 07/Sep/20  Updated: 30/Sep/21  Resolved: 02/Jul/21

Status: Closed
Project: Magnolia
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Neutral
Reporter: Viet Nguyen Assignee: Unassigned
Resolution: Workaround exists Votes: 5
Labels: maintenance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
is related to MGNLBACKUP-139 Unable to commit volatile index Open
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[ ]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
Date of First Response:
Epic Link: Support

 Description   

Steps to reproduce

  • We don't know how to reproduce this kind of issue, but it has been happening widely across different customers recently. Please refer to the linked issues for more information.

Expected results

The bug is fixed, an improvement is implemented, a preventive action is found, etc.

Actual results

Customers periodically face a Lucene IndexMerger "Error while merging indexes: no segments* file found" failure due to a leftover NativeFSLockFactory write.lock...
The following pattern floods the log:

2020-09-07 09:43:53,087 ERROR org.apache.jackrabbit.core.query.lucene.IndexMerger: Error while merging indexes:
org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.jackrabbit.core.query.lucene.directory.FSDirectoryManager$FSDir@org.apache.lucene.store.SimpleFSDirectory@/magnolia/repositories2/magnoliaAuthor/magnolia/workspaces/mgnlVersion/index/_4d lockFactory=org.apache.lucene.store.NativeFSLockFactory@4d68bae9: files: [write.lock]
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:667) ~[lucene-core-3.6.0.jar:3.6.0 1310449 - rmuir - 2012-04-06 11:31:16]
	at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:72) ~[lucene-core-3.6.0.jar:3.6.0 1310449 - rmuir - 2012-04-06 11:31:16]
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:454) ~[lucene-core-3.6.0.jar:3.6.0 1310449 - rmuir - 2012-04-06 11:31:16]
	at org.apache.jackrabbit.core.query.lucene.AbstractIndex.getReadOnlyIndexReader(AbstractIndex.java:312) ~[jackrabbit-core-2.18.1.jar:2.18.1]
	at org.apache.jackrabbit.core.query.lucene.AbstractIndex.getReadOnlyIndexReader(AbstractIndex.java:334) ~[jackrabbit-core-2.18.1.jar:2.18.1]
	at org.apache.jackrabbit.core.query.lucene.PersistentIndex.getReadOnlyIndexReader(PersistentIndex.java:168) ~[jackrabbit-core-2.18.1.jar:2.18.1]
	at org.apache.jackrabbit.core.query.lucene.MultiIndex.getIndexReaders(MultiIndex.java:552) ~[jackrabbit-core-2.18.1.jar:2.18.1]
	at org.apache.jackrabbit.core.query.lucene.IndexMerger$Worker.run(IndexMerger.java:522) [jackrabbit-core-2.18.1.jar:2.18.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_211]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_211]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_211]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_211]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_211]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_211]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]

This flood of log entries eventually fills the disk.

This kind of issue is happening more frequently now, and more and more customers are facing similar problems; we need to investigate more deeply and find preventive actions.
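
The failure state the exception describes is a Lucene segment directory that contains only a write.lock file and no segments_* file. Below is a minimal diagnostic sketch (a hypothetical helper, not part of Jackrabbit or Magnolia) that checks a given index directory, e.g. the .../workspaces/mgnlVersion/index/_4d path named in the stack trace, for exactly that state:

import java.io.File;
import java.util.Arrays;

public class IndexDirCheck {

    public static void main(String[] args) {
        // Pass the segment directory the error names, e.g. .../index/_4d
        File indexDir = new File(args.length > 0 ? args[0] : "index");
        String[] files = indexDir.list();
        if (files == null) {
            System.out.println("Not a readable directory: " + indexDir);
            return;
        }
        boolean hasLock = false;
        boolean hasSegments = false;
        for (String name : files) {
            if (name.equals("write.lock")) {
                hasLock = true;
            }
            if (name.startsWith("segments")) {
                hasSegments = true;
            }
        }
        if (hasLock && !hasSegments) {
            // Exactly the state from the log: a stale lock and no segments file,
            // so Lucene's SegmentInfos.FindSegmentsFile throws IndexNotFoundException.
            System.out.println("Broken: stale write.lock and no segments_* file: " + Arrays.toString(files));
        } else {
            System.out.println("Looks normal: " + Arrays.toString(files));
        }
    }
}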

Workaround

  • Stop the instance (in practice it is usually already effectively dead).
  • Install the patch
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>3.6.0-LUCENE-4738</version>
    </dependency>
    
  • Remove the oversized log file and all repository index folders (a hedged deletion sketch follows this list).
  • Start the instance again so that the indexes are recreated.
  • It would also be good if the customer follows our Consistency checks and fixes guideline.
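
For the index-folder removal step, here is a minimal Java sketch (not an official Magnolia tool; the default path is a placeholder you must adjust) that, with the instance stopped, deletes every directory named "index" under a repository home so Jackrabbit rebuilds the Lucene indexes on the next start:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DeleteWorkspaceIndexes {

    public static void main(String[] args) throws IOException {
        // Placeholder path; point this at your repository home while Magnolia is stopped.
        Path repoHome = Paths.get(args.length > 0 ? args[0] : "/path/to/repositories/magnolia");

        // Collect the index directories first, then delete, so we never
        // remove entries out from under the directory walker.
        List<Path> indexDirs;
        try (Stream<Path> walk = Files.walk(repoHome)) {
            indexDirs = walk.filter(Files::isDirectory)
                    .filter(p -> "index".equals(p.getFileName().toString()))
                    .collect(Collectors.toList());
        }
        for (Path dir : indexDirs) {
            deleteRecursively(dir);
            System.out.println("Deleted " + dir);
        }
    }

    private static void deleteRecursively(Path dir) throws IOException {
        // Delete children before parents (reverse order of the walk).
        try (Stream<Path> walk = Files.walk(dir)) {
            for (Path p : walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                Files.delete(p);
            }
        }
    }
}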

Development notes

Notes to/from developers
Please consider my comment below about setting some flags (enableConsistencyCheck, forceConsistencyCheck, autoRepair) to true by default when bundling the releases.



 Comments   
Comment by David Caviedes Marquez [ 01/Oct/20 ]

Hello,

We have been suffering from this problem for a long time across multiple different Magnolia projects, and it happens randomly in any workspace. We had to schedule a periodic index-deletion task as a workaround, but we need Magnolia to solve this problem, because the workaround implies a loss of service (Magnolia has to be stopped) while the indexes are deleted.

I hope you can solve it!

Thanks in advance

Comment by Viet Nguyen [ 02/Oct/20 ]

Most customers running into this issue didn't follow our recommendation in Repository inconsistency - Search index to set the flags below to true in their "workspace.xml" files:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="enableConsistencyCheck" value="true"/>
    <param name="forceConsistencyCheck" value="true"/>
    <param name="autoRepair" value="true"/>
</SearchIndex>

If we set these flags to true in "jackrabbit-bundle-xxxx-search.xml" from the start, customers might not have to do this manually later in case of repository inconsistency. This might slow down the startup process, but it increases system stability. So we will consider updating our default bundle to set all these flags to true out of the box.

PS: please note that both enableConsistencyCheck and forceConsistencyCheck need to be set to true for the system to run the check on startup; enabling only one of them might not produce the expected behaviour.

Comment by Viet Nguyen [ 14/Oct/20 ]

One more thing to consider: how customers start Magnolia might be related to this issue, as a "Too many open files" condition could be the cause. Please fix the limit at the OS level rather than starting Magnolia with the "--ignore-open-files-limit" option.
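
To see whether an instance is actually approaching the descriptor limit, here is a small sketch (an illustration, not a Magnolia utility) using the standard HotSpot management API on Unix-like JVMs:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

public class OpenFilesCheck {

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            // HotSpot/OpenJDK on Unix exposes the process's descriptor counts here.
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            System.out.printf("open file descriptors: %d of max %d%n",
                    unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount());
        } else {
            System.out.println("Not a Unix JVM; check the limit with OS tools instead.");
        }
    }
}

Raising the limit itself is done at the OS level, typically via ulimit -n or /etc/security/limits.conf.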

Comment by Mercedes Iruela [ 04/Mar/21 ]

Closed as related to MGNLBACKUP-139; support tickets have been linked to the new one.
