[MGNLBACKUP-139] Unable to commit volatile index Created: 03/Mar/21  Updated: 11/Sep/23

Status: Open
Project: Backup
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Richard Gange Assignee: Dominik Maslanka
Resolution: Unresolved Votes: 1
Labels: VN-Analysis, maintenance
Remaining Estimate: 0d
Time Spent: 3d 5h
Original Estimate: Not Specified

Attachments: PNG File backup-diagram.png    
Issue Links:
Relates
causality
relation
is related to MAGNOLIA-7870 Lucene IndexMerger Error while mergin... Closed
is related to MGNLBACKUP-129 Backup on running instance can corrup... Closed
is related to MAGNOLIA-8019 Add a new scope for logged in users t... Open
is related to MGNLUI-6776 DOC: Better define scope of Backup mo... Closed
is related to MGNLBACKUP-140 Avoid indexing while backing up Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[X]* Tests
[X]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[X]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
Date of First Response:
Epic Link: Maintenance of Backup module
Team: Nucleus

 Description   

Problem
Several customers and partners have reported indexing issues after running the backup module. We need to clarify the best practices and/or change the module to help prevent these issues.

Documentation
Currently we have a note in the documentation which reads:

Although it is possible to backup from a running Magnolia instance, it is not a recommended best practice. To ensure stability and prevent potential inconsistencies caused by nodes being published (versioned) during the backup process, avoid this option if possible.

However, this is not enough. We need some mechanism within the module itself to prevent this issue from occurring.

Error

ERROR org.apache.jackrabbit.core.query.lucene.MultiIndex 22.02.2021 13:23:39 – Unable to commit volatile index
org.apache.lucene.index.CorruptIndexException: failed to locate current segments_N file
    at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:233)

In the log files supplied to support, we can see these errors originating in the tasks, workflow, mgnl-system and mgnl-version workspaces.
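
The exception above is raised when Lucene cannot find the segments_N file of an index directory. As a rough diagnostic sketch (not part of the backup module), the following Java helper lists index subdirectories that contain no segments_N file. The class name SegmentsCheck and the assumed layout (one Lucene index per subdirectory of a workspace's index folder) are illustrative assumptions and should be checked against the actual repository on disk.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; the directory layout is an assumption.
final class SegmentsCheck {

    /** Returns true if the directory holds at least one regular file named segments_<generation>. */
    static boolean hasSegmentsFile(Path indexDir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "segments_*")) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Lists the direct subdirectories of indexRoot that contain no segments_N file. */
    static List<Path> findSuspectIndexDirs(Path indexRoot) throws IOException {
        List<Path> suspects = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(indexRoot)) {
            for (Path child : children) {
                if (Files.isDirectory(child) && !hasSegmentsFile(child)) {
                    suspects.add(child);
                }
            }
        }
        return suspects;
    }
}
```

A missing segments_N file only indicates a suspect directory. It does not show whether the backup or a concurrent index merge caused the problem.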

Notes

  • Can we freeze publication during backup?
  • If someone schedules a publication during a backup window can we warn against it?
  • What are best practices for running the backup module on a public instance? Does it make sense? What are the alternatives?
  • What are the recommendations for running backup on author and public simultaneously? Can we actually be sure they are in sync? Perhaps the module should be disabled on the public instance.
  • When a backup begins, should we warn the logged-in users? "Backup will begin in x mins. Save your work and log out."
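
To make the first two and the last of these questions more concrete, here is a minimal Java sketch of a backup window check. It is not an existing Magnolia API: the BackupWindow class, its method names, and the idea of an expected backup duration are assumptions for illustration only.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.Optional;

// Hypothetical helper; none of these names exist in the backup module.
final class BackupWindow {
    private final LocalDateTime start;
    private final LocalDateTime end;

    BackupWindow(LocalDateTime start, Duration expectedDuration) {
        this.start = start;
        this.end = start.plus(expectedDuration);
    }

    /** True if a publication scheduled at this time falls inside the window (start inclusive, end exclusive). */
    boolean conflictsWith(LocalDateTime scheduledPublication) {
        return !scheduledPublication.isBefore(start) && scheduledPublication.isBefore(end);
    }

    /** Warning text for logged-in users, or empty once the backup has started. */
    Optional<String> warningFor(LocalDateTime now) {
        if (now.isAfter(start)) {
            return Optional.empty();
        }
        long minutes = Duration.between(now, start).toMinutes();
        return Optional.of("Backup will begin in " + minutes + " mins. Save your work and log out.");
    }
}
```

A scheduler could call conflictsWith before accepting a publication time and either reject it or show a warning. Whether publication should actually be frozen remains an open product decision.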

Patched Lucene

<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>3.6.0-LUCENE-4738</version>
</dependency>


 Comments   
Comment by Richard Gange [ 29/Jul/21 ]

Today I released it https://wiki.magnolia-cms.com/display/SERVICES/Backup+Extended

Comment by Quach Hao Thien [ 14/Mar/22 ]

Initial Discovery

The solution in this PR is to configure a dedicated repository.xml for the backup run and place it in the backup repository home (the backup target directory). This repository.xml suppresses the SearchIndex configuration, which prevents the problem when the index is not ready in situations such as the content publishing activity mentioned in the description.

The solution works, but more testing is needed to simulate a backup running while indexing is in progress.
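
As an illustration of the idea (not the code from the PR), the following Java sketch derives a backup-only configuration by removing every SearchIndex element from a repository.xml string and leaving the rest untouched. The class name SearchIndexStripper is hypothetical. Whether the repository-level SearchIndex should also be dropped, and whether Jackrabbit accepts the result without the DOCTYPE (the JDK transformer does not re-emit it), are assumptions to verify.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, shown only to illustrate suppressing SearchIndex.
final class SearchIndexStripper {

    /** Returns the given repository.xml content with all SearchIndex elements removed. */
    static String stripSearchIndex(String repositoryXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Avoid fetching an external DTD while parsing (feature of the JDK's built-in parser).
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(repositoryXml)));

        // Copy the live NodeList first so removals do not shift the indices.
        NodeList live = doc.getElementsByTagName("SearchIndex");
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            toRemove.add(live.item(i));
        }
        for (Node node : toRemove) {
            node.getParentNode().removeChild(node);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

For restore, the original repository.xml with SearchIndex intact would be used instead, so the index can be rebuilt on startup.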

 

Comment by Rabie Hayoun [ 14/Mar/22 ]

Please check what work is left to be done for the above PR.

Comment by Quach Hao Thien [ 15/Mar/22 ]

rgange, adopting a module from the incubator may need agreement from the PM or PO (cc dduehning). In this ticket we investigate the problem and try to find a solution within the scope of bug fixing.

Comment by Quach Hao Thien [ 18/Mar/22 ]

Update discovery:

Things that need to be done:

  • Revise the Backup executor to turn off indexing for the copy (ref to the PR)
    • configure repository.xml properly to turn off indexing (SearchIndex); other configurations should be the same as the running instance's configuration
  • Revise the Restore executor to turn on indexing while doing the restoration
    • restore the repo with the same configuration as the instance, and indexing (SearchIndex) should be turned on properly
  • Make sure both ce and dx-core can be started up successfully after the restore

Thanks rgange, your input is really valuable.

Comment by Roman Kovařík [ 16/Aug/22 ]

What's the affected version? Is M5.7.x affected as well?

Comment by Richard Gange [ 16/Aug/22 ]

Yes, there are some support tickets reporting the issue in 5.7. It all started when we switched the backup module to use RepositoryCopier.

Comment by David Lopez [ 24/Aug/22 ]

Backlog check session 24/08/2022

  • Notification in the notifications area (for editors) that a backup is about to start
  • Provide a notification for admins that the backup would affect editors (same message as in the documentation)

ToDo

dmaslanka / dlopez to check with avongunten what the solution could be

Comment by Dominik Maslanka [ 31/Jan/23 ]

We need to redefine the best practices and change the module to help prevent the above issues, as customers and partners have been reporting indexing issues after running the backup module. Defining these best practices will also require cooperation with other departments.

Things that need to be done:

  • Revise the Backup executor to turn off indexing for the copy (ref to the PR)
    • configure repository.xml properly to turn off indexing (SearchIndex); other configurations should be the same as the running instance's configuration
  • Revise the Restore executor to turn on indexing while doing the restoration
    • restore the repo with the same configuration as the instance, and indexing (SearchIndex) should be turned on properly
  • Make sure both ce and dx-core can be started up successfully after the restore
  • Define better practices with abrooks, fcherchi and rkovarik. The topic will be addressed in February. Thanks rgange for the support on this topic.
Generated at Sun Feb 11 23:26:00 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.