[MGNLBACKUP-140] Avoid indexing while backing up Created: 09/Apr/21  Updated: 20/Jul/22  Resolved: 25/Aug/21

Status: Closed
Project: Backup
Component/s: None
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Improvement Priority: Neutral
Reporter: Richard Gange Assignee: Unassigned
Resolution: Workaround exists Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: backup-index-report-v2.4.1.txt, backup-index-report-v2.4.2-SNAPSHOT.txt
Issue Links:
relation
is related to MGNLBACKUP-139 Unable to commit volatile index Open
is related to MGNLBACKUP-129 Backup on running instance can corrup... Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Date of First Response:
Epic Link: Nucleus Quality Maintenance
Team: Nucleus

 Description   

The current version of the backup module uses the underlying RepositoryCopier provided by Jackrabbit (JR). As a result, the module cannot specify the configuration of the target repository. By default, JR uses the repository configuration file shipped in the JR Core module, in the package org.apache.jackrabbit.core. That default configuration declares a SearchIndex, which triggers indexing while the target repository is being created for the backup. After the target repository is created, JR deletes the index from each workspace.

Excerpt from RepositoryCopier#copy()

    /**
     * Copies the contents of the given source repository to a target
     * repository with the given configuration.
     * <p>
     * The source repository <strong>must not be modified</strong> while
     * the copy operation is running to avoid an inconsistent copy.
     *
     * @param source source repository directory
     * @param target target repository directory
     * @throws RepositoryException if the copy operation fails
     */
    public static void copy(RepositoryImpl source, RepositoryConfig target)
            throws RepositoryException {
        RepositoryImpl repository = RepositoryImpl.create(target);
        try {
            new RepositoryCopier(source, repository).copy();
        } finally {
            repository.shutdown();
        }

        // Remove index directories to force re-indexing on next startup
        // TODO: There should be a cleaner way to do this
        File targetDir = new File(target.getHomeDir());
        File repoDir = new File(targetDir, "repository");
        FileUtils.deleteQuietly(new File(repoDir, "index"));
        File[] workspaces = new File(targetDir, "workspaces").listFiles();
        if (workspaces != null) {
            for (File workspace : workspaces) {
                FileUtils.deleteQuietly(new File(workspace, "index"));
            }
        }
    }

The TODO in the method acknowledges that traversing the workspaces and deleting each index folder after the copy is not ideal.

Notes:

  • The repository manager should be used for the source repo so that copying the data creates no additional indexing overhead. See backup-index-report-v2.4.1.txt.
  • The backup module should provide a repository configuration that excludes the SearchIndex configuration. That way the index is never created on the target repo and does not need to be deleted during cleanup.
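
One possible way to obtain such a configuration is sketched below: take an existing repository configuration XML and drop every SearchIndex element before handing it to the target repository. This is only an illustration using the JDK's XML APIs. The class SearchIndexStripper and its strip method are hypothetical names, not part of the backup module or Jackrabbit, and the sketch assumes JR accepts a configuration without SearchIndex elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper (not part of the backup module or Jackrabbit). */
final class SearchIndexStripper {

    private SearchIndexStripper() {
    }

    /** Returns the given repository configuration XML without any SearchIndex elements. */
    static String strip(String repositoryXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Real repository.xml files usually declare a DOCTYPE. This feature
            // (understood by the JDK's built-in parser) avoids fetching the external DTD.
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(repositoryXml)));

            // The NodeList is live, so walk it backwards while removing nodes.
            NodeList indexes = doc.getElementsByTagName("SearchIndex");
            for (int i = indexes.getLength() - 1; i >= 0; i--) {
                Node index = indexes.item(i);
                index.getParentNode().removeChild(index);
            }

            // Serialize the modified DOM back to a string. A DOCTYPE declaration
            // in the input is not necessarily reproduced in the output.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process repository configuration", e);
        }
    }
}
```

The resulting XML could then be turned into the RepositoryConfig that is passed as the target to RepositoryCopier#copy(RepositoryImpl, RepositoryConfig) shown above. Whether the backup module should derive the configuration this way or ship a dedicated file is left open here.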

Workaround
Use https://wiki.magnolia-cms.com/display/SERVICES/Backup+Extended which injects the repository directly into the copier.

