[MAGNOLIA-7762] Redirects (Virtual URI mapping) not working Created: 02/Apr/20  Updated: 22/Sep/21  Resolved: 13/May/20

Status: Closed
Project: Magnolia
Component/s: Virtual URI mappings
Affects Version/s: None
Fix Version/s: 5.7.7, 6.1.6, 6.2.1

Type: Bug Priority: Neutral
Reporter: David Lopez Assignee: Evzen Fochr
Resolution: Fixed Votes: 0
Labels: cs-bk, maintenance
Remaining Estimate: 0d
Time Spent: 2d 1h
Original Estimate: Not Specified

Issue Links:
Cloners
is cloned by MAGNOLIA-7798 DirectoryWatcher overflow leads to sk... Open
Relates
relates to DOCU-1993 Filesystem property that can block di... Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[ ]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
Release notes required:
Yes
Documentation update required:
Yes
Date of First Response:
Epic Link: Support
Sprint: Maintenance 3, Maintenance 4, Maintenance 5, Maintenance 6
Story Points: 5

 Description   

It seems that with a large number of virtual URI mappings (500+) in light modules, loading and processing of the mappings gets randomly interrupted with a misleading error:

WARN  info.magnolia.config.source.yaml.YamlConfigurationSource 19.03.2020 15:04:09 -- 
Severity type:   MAJOR 
Title:           /: [Definition resolution] - Source data processing problem 
Problem details: Failed to instantiate an object of type [interface info.magnolia.virtualuri.VirtualUriMapping] due to [No suitable constructor found for class [interface info.magnolia.virtualuri.VirtualUriMapping]], null is returned

The error is posted in the logs even though the same YAML file loads on a different instance without any issues, and the class used for the mapping is a default mapping included in the Magnolia installation out of the box.
The error also appears at random when a reload of the definitions is forced by adding a new one.
And at random it might get fixed (or not) by a restart of the instance.
No errors are reported for any of the other existing mappings, which makes it impossible to figure out whether another file might be the cause. There is also no known debug setting that would help to triage the issue, nor any means to verify all existing mappings to figure out whether there is a conflict elsewhere.

Details of existing investigation in linked tickets:



 Comments   
Comment by Evzen Fochr [ 21/Apr/20 ]

For documentation:
We now log a warning for the overflow event in DirectoryWatcher. This happens when the system's maximum number of user watches is exceeded, which causes some events to be lost. Those lost events (e.g. an update to a YAML file) are then not processed.
On Windows, the buffer size in the WindowsWatchService implementation is hardcoded and cannot be configured.
For linux/unix based systems see https://unix.stackexchange.com/questions/13751/kernel-inotify-watch-limit-reached
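
For illustration, a minimal stand-alone sketch (not the actual Magnolia DirectoryWatcher code) of where the OVERFLOW kind shows up in the plain java.nio.file WatchService API and where such a warning can be logged; the class name and the log wording are invented for this example:

import java.io.IOException;
import java.nio.file.*;

import static java.nio.file.StandardWatchEventKinds.*;

// Minimal stand-alone example, not Magnolia code: watches a single directory and
// logs a warning when the OS reports that events were dropped (OVERFLOW).
public class OverflowLoggingExample {

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
            while (true) {
                WatchKey key = watcher.take();
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == OVERFLOW) {
                        // Some native events were dropped; changes made in that window were never reported.
                        System.err.println("WARN: watch event overflow on " + dir
                                + " - some file system events were lost");
                        continue;
                    }
                    System.out.println(event.kind() + ": " + event.context());
                }
                if (!key.reset()) {
                    break; // the watched directory is no longer accessible
                }
            }
        }
    }
}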

Comment by Evzen Fochr [ 23/Apr/20 ]

Ok, did some more research. I used this brute-force test for 1,000 files in DirectoryWatcherTest:
@Test
public void properName() throws Exception {
    // GIVEN -- watchedDir, callback and FILE_OPERATION_TIMEOUT are existing fixtures of DirectoryWatcherTest
    int numberOfTestFiles = 1000;
    File parentDir = watchedDir.newFolder("parentDir");
    verify(callback, timeout(FILE_OPERATION_TIMEOUT)).added(parentDir.toPath());

    File directory1 = new File(parentDir, "directory1");
    directory1.mkdir();
    verify(callback, timeout(FILE_OPERATION_TIMEOUT)).added(directory1.toPath());

    reportExecutionTime();

    // WHEN -- create 1000 files in a single watched directory
    File[] files = new File[numberOfTestFiles];
    for (int i = 0; i < numberOfTestFiles; i++) {
        File file = new File(directory1, "file" + i);
        file.createNewFile();
        files[i] = file;
    }

    // THEN -- every single creation event must have been delivered to the callback
    for (int i = 0; i < numberOfTestFiles; i++) {
        verify(callback, timeout(FILE_OPERATION_TIMEOUT)).added(files[i].toPath());
    }
}

Findings:
There are three relevant system properties:
fs.inotify.max_user_instances
fs.inotify.max_user_watches
fs.inotify.max_queued_events

If one of them is too low, I get an IOException with a description like "limit reached".
But if fs.inotify.max_user_instances and fs.inotify.max_user_watches are fine by themselves while their combination (exact function unknown) is too low, an OVERFLOW WatchEvent occurs without any IOException.

The default values on my system were:
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 524288
fs.inotify.max_queued_events = 16384

To handle the events from 1,000 new files, fs.inotify.max_user_instances had to be raised to 1000.
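
As a side note, the current limits can be read from procfs on Linux; a small stand-alone sketch (assumes a Linux host; the paths are the standard procfs locations of the sysctls listed above):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Stand-alone helper (not Magnolia code): prints the current inotify limits on Linux.
public class InotifyLimits {

    private static final String[] SYSCTLS = {
            "fs/inotify/max_user_instances",
            "fs/inotify/max_user_watches",
            "fs/inotify/max_queued_events"
    };

    public static void main(String[] args) throws IOException {
        for (String sysctl : SYSCTLS) {
            Path path = Paths.get("/proc/sys", sysctl);
            String value = Files.exists(path) ? Files.readAllLines(path).get(0).trim() : "n/a (not Linux?)";
            System.out.println(sysctl.replace('/', '.') + " = " + value);
        }
    }
}

Raising a value is done outside the JVM, e.g. sysctl -w fs.inotify.max_user_instances=1000 (root required; persist it via /etc/sysctl.conf or /etc/sysctl.d/).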

Comment by Evzen Fochr [ 24/Apr/20 ]

There is a 3rd threshold, this time in LinuxWatchKey :doublefacepalm: it occurs when our event handling is too slow and 512 events stack up under one directory key. And this time it seems I need to do some coding: http://mail.openjdk.java.net/pipermail/nio-dev/2009-December/000811.html
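
For context, a sketch of one way to keep that per-key buffer from filling up: drain pollEvents() and reset the key immediately, and hand the events to a queue so that slow processing does not block the watch loop. This is only an illustration of the idea, not the Magnolia implementation; the class and field names are invented:

import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustration only (not Magnolia code): drain the native event buffer quickly and
// defer the actual processing to another thread reading from the queue.
public class FastDrainingWatcher {

    private final BlockingQueue<WatchEvent<?>> pending = new LinkedBlockingQueue<>();

    void drainLoop(WatchService watcher) throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            WatchKey key = watcher.take();
            List<WatchEvent<?>> events = key.pollEvents(); // empties the per-key buffer (512 entries on Linux)
            key.reset();                                   // re-arm the key before doing any slow work
            pending.addAll(events);                        // a separate consumer thread processes these
        }
    }

    BlockingQueue<WatchEvent<?>> pendingEvents() {
        return pending;
    }
}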

Comment by Mikaël Geljić [ 13/May/20 ]

Closing with a partial solution: adding a log statement when a DirectoryWatcher overflow occurs.
Proposed mitigation: reduce the number of files in a single directory (use more nesting).
Following up with additional work to reconcile the state of observed directories and registered files automatically when it becomes inconsistent (say, because of an overflow). => MAGNOLIA-7798.
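
A sketch of what such a reconciliation could look like, assuming a plain set of already-registered paths and callbacks for missed additions/removals; the names knownFiles, onAdded and onRemoved are hypothetical, not the actual MAGNOLIA-7798 API:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Illustration only (not Magnolia code): rescan a watched directory after an OVERFLOW
// and replay the add/remove events that the native watcher dropped.
public class DirectoryReconciler {

    static void reconcile(Path watchedDir, Set<Path> knownFiles,
                          Consumer<Path> onAdded, Consumer<Path> onRemoved) throws IOException {
        Set<Path> current = new HashSet<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(watchedDir)) {
            for (Path entry : stream) {
                current.add(entry);
            }
        }
        // Files present on disk but never reported -> missed creation events
        for (Path path : current) {
            if (knownFiles.add(path)) {
                onAdded.accept(path);
            }
        }
        // Files registered earlier but no longer on disk -> missed deletion events
        knownFiles.removeIf(path -> {
            if (!current.contains(path)) {
                onRemoved.accept(path);
                return true;
            }
            return false;
        });
    }
}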
