[MGNLRES-319] Improve ResourcesTemplatingFunctions#generate performance Created: 24/Dec/18  Updated: 29/Mar/22  Resolved: 03/Feb/20

Status: Closed
Project: Magnolia Resources Module
Component/s: management, resourceLoaders
Affects Version/s: 2.6.3
Fix Version/s: 2.6.4, 2.7.1, 3.0

Type: Improvement Priority: Neutral
Reporter: Viet Nguyen Assignee: Aleksandr Pchelintcev
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
causality
relation
is related to MGNLRES-344 Also improve regex performance in Res... Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Release notes required:
Yes
Date of First Response:
Epic Link: Support
Sprint: UI Framework & 6.2 Ramp up 15
Story Points: 2
Team: Nucleus

 Description   

As described in SUPPORT-9441,

The method info.magnolia.modules.resources.templating.ResourcesTemplatingFunctions#generate, used by resfn, scans every file and directory in the resources directory plus every file found on the classpath. It matches each path against the provided js or css path, which is always treated as a potential regular expression.

The recursive scanning of the resource directory is very expensive when you have a lot of light modules (especially in Java 8 due to https://bugs.openjdk.java.net/browse/JDK-8153414).
Matching every file path against the provided path by default is also very expensive in terms of CPU usage.

In our context, removing the calls to resfn tripled the throughput on non-cached pages.

There should be an extra function that lets us provide the complete path of a resource, so that scanning and pattern matching are not forced.

A cache for already scanned resources would also be expected to reduce system load significantly.



 Comments   
Comment by Joerg von Frantzius [ 17/Jan/20 ]

Pull-request https://git.magnolia-cms.com/projects/MODULES/repos/resources/pull-requests/85/overview provides support for glob-style patterns and optimizes for them.

Following https://docs.oracle.com/javase/7/docs/api/java/nio/file/FileSystem.html#getPathMatcher(java.lang.String) , when a pattern is prefixed with "glob:", it is interpreted as a glob-style pattern, e.g. "glob:/foobar-module/webresources/css/*.css". The optimization is that matching will start in the directory before the first "*" (star) encountered, so in the example, matching will start in "/foobar-module/webresources/css" and only compare with the files found in there, usually just a handful. In the reported case, a full path to a file can be given as glob pattern as well.
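
To illustrate the idea described above, here is a minimal sketch of how a start directory could be derived from a "glob:" pattern and how candidates could then be filtered with the JDK's PathMatcher. The class and method names (GlobPrefixSketch, startDirectory, match) are hypothetical and are not taken from the pull request; only FileSystem#getPathMatcher is the real JDK API referenced above.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: names are hypothetical, not the resources module's API. */
final class GlobPrefixSketch {

    private static final String GLOB_PREFIX = "glob:";

    /** Returns the directory that precedes the first '*' of a "glob:"-prefixed pattern. */
    static String startDirectory(String pattern) {
        String glob = pattern.substring(GLOB_PREFIX.length());
        int star = glob.indexOf('*');
        // Without a wildcard the pattern is a full file path; start in its parent directory.
        String literal = star < 0 ? glob : glob.substring(0, star);
        int lastSlash = literal.lastIndexOf('/');
        return lastSlash <= 0 ? "/" : literal.substring(0, lastSlash);
    }

    /** Keeps only the candidate paths accepted by the JDK glob matcher. */
    static List<String> match(String pattern, List<String> candidates) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(pattern);
        List<String> hits = new ArrayList<>();
        for (String candidate : candidates) {
            if (matcher.matches(Paths.get(candidate))) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

In this sketch, only the files listed under the returned start directory would need to be passed to match, instead of every known resource path. It only looks for '*' and ignores other glob metacharacters such as '?', '[' or '{', which a real implementation would have to consider.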

A typical Magnolia project contains > 20,000 files, so comparisons are reduced by multiple orders of magnitude. This happens on every call to resfn, so on a typical web page with ~5 calls to resfn, the status quo amounts to about 100,000 regular expression matches for a single page request!

Additionally, when matching regular expressions, compiled regular expressions are now cached, thereby roughly doubling performance without any further interventions.
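
As a rough sketch of what caching compiled regular expressions can look like, assuming a simple unbounded map keyed by the pattern string (the class PatternCacheSketch and its methods are hypothetical, not the code from the pull request):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical cache, not the resources module's implementation. */
final class PatternCacheSketch {

    // Compiled patterns keyed by their source string; unbounded in this sketch.
    private static final ConcurrentMap<String, Pattern> CACHE = new ConcurrentHashMap<>();

    /** Compiles the regex on first use and reuses the compiled Pattern afterwards. */
    static Pattern compiled(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    /** Matches a resource path against a (cached) regular expression. */
    static boolean matches(String regex, String path) {
        return compiled(regex).matcher(path).matches();
    }
}
```

Pattern instances are immutable and safe to share between threads, so reusing them across requests is fine. An unbounded map is only reasonable if the set of distinct patterns is small, as it would be when they come from templates.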

Comment by Joerg von Frantzius [ 04/Feb/20 ]

Hi @Aleksandr @apchelintcev

In the meantime I realized that we could apply this optimization to regular expressions as well, so existing projects would profit from it without having to switch to glob expressions. I was thinking of this:

    /**
     * Find out where in a String things start looking like a regex. Do this by matching
     * anything that doesn't seem to be a "normal" file name character: slash, hyphen,
     * alphanumeric, horizontal whitespace or underscore characters.
     * This will result in funky but perfectly legal Unix paths being regarded as regex,
     * e.g. if they contain characters like squiggly brackets or square brackets.
     * When used to determine a path prefix, the prefix will then be shorter than possible.
     * This only means our heuristic optimization will not apply to such
     * funky path names, while in 99.9% of cases it will.
     */
    private static final Pattern LIKELY_REGEX_EXPR = Pattern.compile("([^/\\-\\p{Alnum}\\h_])");
 

Then we could detect a path prefix to start matching from, as is done with glob patterns now?
(The regex isn't correct yet, but it shows the idea.)
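
A minimal sketch of how such a prefix could be derived, assuming the heuristic pattern quoted above is used as-is (the class RegexPrefixSketch and the method startDirectory are hypothetical names for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: hypothetical helper built around the heuristic from the comment. */
final class RegexPrefixSketch {

    // Same expression as proposed in the comment; its author notes it is not final.
    private static final Pattern LIKELY_REGEX_EXPR =
            Pattern.compile("([^/\\-\\p{Alnum}\\h_])");

    /** Returns the directory preceding the first character that looks like regex syntax. */
    static String startDirectory(String pattern) {
        Matcher m = LIKELY_REGEX_EXPR.matcher(pattern);
        // Everything before the first "non file name" character is treated as literal.
        String literal = m.find() ? pattern.substring(0, m.start()) : pattern;
        int lastSlash = literal.lastIndexOf('/');
        return lastSlash <= 0 ? "/" : literal.substring(0, lastSlash);
    }
}
```

For "/foobar-module/webresources/css/.*\.css" this yields "/foobar-module/webresources/css", and a pattern that starts with regex syntax falls back to "/". The sketch does not handle top-level alternation such as "/a/x|/b/y", where a literal prefix is not valid for every branch, so a real implementation would need to detect that case and fall back to a full scan.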

Comment by Joerg von Frantzius [ 04/Feb/20 ]

Please see https://git.magnolia-cms.com/projects/MODULES/repos/resources/pull-requests/91/overview

Generated at Mon Feb 12 06:49:32 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.