[MAGNOLIA-7807] Imaging workspace indexing is slowing down recovery and replication of instances
Created: 20/May/20  Updated: 15/Mar/21  Resolved: 15/Mar/21

Status: Closed
Project: Magnolia
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Neutral
Reporter: Jan Haderka
Assignee: Unassigned
Resolution: Workaround exists
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
Relates
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)

 Description   

By default, the imaging workspace is indexed for search, even though its content is never retrieved through search queries. The workspace contains many binaries, which are inherently slow to index, or even just to check for indexing: to decide whether a binary should be indexed, the detectors/parsers must look ahead to determine its real content type, which requires opening the stream and reading the content from the DB, a relatively expensive operation.
This can push reindexing times above 60 minutes for moderately sized workspaces.
A possible solution is to create this workspace without a search index or, for existing installations, to remove the <SearchIndex/> configuration from the imaging workspace's workspace.xml.



 Comments   
Comment by Jan Haderka [ 15/Mar/21 ]

The issue lies deep in Jackrabbit; short of significant changes to Jackrabbit itself, rewriting imaging so that it does not rely on JCR is the only proper solution.

As a workaround, one can (and should) manually switch off indexing for the imaging workspace.

Generated at Mon Feb 12 04:27:00 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.