[MGNLETK-62] multisite detection: implement a rule based algorithm to make the outcome more predictable. Created: 23/Aug/11  Updated: 05/Mar/12  Resolved: 23/Aug/11

Status: Closed
Project: Extended Templating Kit (closed)
Component/s: None
Affects Version/s: None
Fix Version/s: 1.4.3

Type: Improvement Priority: Neutral
Reporter: Philipp Bärfuss Assignee: Philipp Bärfuss
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File ETKModule.diff     File ETKSiteManager.diff     PNG File local-stacktrace.png     PNG File profile-data.png     PNG File tree-openend.png    
Issue Links:
causality
is causing MGNLETK-72 Admincentral website trees take a lon... Closed
supersession
supersedes MGNLETK-57 ETKSiteManager in use with domain map... Closed
supersedes MGNLETK-59 site mapping: don't ignore the uri pr... Closed
is superseded by MGNLETK-51 AdminCentral generates incorrect link... Closed
Template:
Acceptance criteria:
Empty
Date of First Response:

 Description   

While trying to fix some multi site related issues, I decided to rewrite the code to make it more readable and predictable.

  • we have a set of rules which use matchers to match the sites
  • the processing can be analyzed by setting the logging to debug level
    • plain text descriptions, ...
  • the rule set can be overwritten


 Comments   
Comment by Stefan Baur [ 21/Sep/11 ]

Hi Philipp

We just tried to find the place, where those matchers /rule stes are located but didnt find them. Can you give us a hint, how we can make use of this improvement?

Comment by Rico Jansen [ 01/Mar/12 ]

After we upgraded to ETK 1.4.4 we have performance issues, mainly in Admincentral website tree.
We traced it back to the ETKSiteManager and the findBestMatchingSites method. Our profiling shows it
spends a lot of time there when opening trees. Interestingly enough the JCR tree is still responsive.

We do have a lot of site definitions (40+), so it has to iterate a lot of data.

Comment by Jan Haderka [ 01/Mar/12 ]

Hi Rico,

thx for the info, is the method invocation the issue or is single invocation ok, but it gets called multiple times? Would it be possible to attach profiling data? Or at least average execution time for the method, number of executions during tree rendering and stacktrace showing where the invocations come from?

Thx,
Jan

Comment by Rico Jansen [ 01/Mar/12 ]

It looks like it gets called quite a lot, at least in the debugger it looks like it.
We had a hell of time finding the problem as it doesn't really manifest in our development environment.
Only with production data we see the issue clearly.
I have attached an example of a profile on our testserver. Opening the wetenschap tree with 10 items.
I also attached a stacktrace of a local development instance openening a similar tree.
I can try to provide more information but I haven't been able to get stacktraces out of our test environment showing
the problem.
Note that we have a lot of sites:
mgnl> info.magnolia.module.extendedtemplatingkit.ETKModule.getInstance().getSites().size();
===>
91

Comment by Rico Jansen [ 01/Mar/12 ]

It seems it is caused some where else because getAssignedSite is called insanely often:
WARN info.magnolia.module.extendedtemplatingkit.sites.ETKSiteManager 01.03.2012 17:24:54 – getAssignedSite (52853 times) (returning default) took: 0 ms
This is clean startup up with me logging in, it is already called over 50.000 times.
Opening the wetenschap tree with 10 entries:
WARN info.magnolia.module.extendedtemplatingkit.sites.ETKSiteManager 01.03.2012 17:26:43 – getAssignedSite (99239 times) (returning default) took: 0 ms
So close to 50.000 calls for opening a tree with 10 entries. I have the feeling something is very wrong.

Calls are mainly :
WARN info.magnolia.module.extendedtemplatingkit.sites.ETKSiteManager 01.03.2012 17:26:19 – Try to match domain [redactie-live.test.vpro.nl] and URI [/]

A short analysis:
cat magnolia.log | grep 'Try to match' | cut d -f2,3,4,5 | sort | uniq -c
1 - Try to match domain [null] and URI [/]
45421 - Try to match domain [redactie-live.test.vpro.nl] and URI [/]
1 - Try to match domain [redactie-live.test.vpro.nl] and URI [/.resources/controls/navigation/mgnlNavigationBgHover.gif]

Comment by Rico Jansen [ 02/Mar/12 ]

I have traced the issue back to SimpleUrlPattern in the Magnolia core, this has been changed
so it calls getAssignedSite on match operations. This apparently happens a lot in the FilterChain.

I have implemented a simple LRU cache on the call, it keys on domain+uri. This has given us a big
performance boost in Admincentral.

Installed on a public node, this also shows quite a lot of calls winding up in the cache, not as much
as a author node though.
Author:
Try to match domain [redactie-test.vpro.omroep.nl] and URI [/.imaging/stk/hollanddoc/interne-teaser-small/media/hollanddoc/2010/July/armoede/original/armoede.jpg] : Hit/Miss : 13008113/1200
Public:
Try to match domain [geschiedenis-test.vpro.omroep.nl] and URI [/quiz/stripquiz.html] : Hit/Miss : 395592/45

Attached are the diffs to the code of 1.4.4

Comment by Jan Haderka [ 02/Mar/12 ]

Excellent. Thanks a lot Rico.

Comment by Rico Jansen [ 02/Mar/12 ]

You're welcome,
It still bothers me a bit that it gets called that often, but finding out who calls it how often is quite the excercise. It comes from the filters in the FilterChain that much I have found out, SimpleUrlPattern gets used a lot there I do know that as well.

Comment by Boris Kraft [ 02/Mar/12 ]

shouldn't this issue be reopened given the new evidence? Or new issue be created?

Comment by Rico Jansen [ 02/Mar/12 ]

Well technically this issue has been resolved. It is more that the performance suffers from the combination of this issue and issue: http://jira.magnolia-cms.com/browse/MAGNOLIA-3915
The changes in this one means that the call takes a bit longer, but the changes in SimpleUrlMatcher means it gets called lots more often then in the past.

I have made issue : http://jira.magnolia-cms.com/browse/MGNLETK-72 which references those two.

Comment by Rico Jansen [ 05/Mar/12 ]

It seems there is a small bug in the diff I posted, in the setSize call I refrain to wrap the LRUMap
in a synchronized collection. This is necessary as LRUMap is not threadsafe.

BTW The code is working nicely in production here.

Generated at Mon Feb 12 01:48:07 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.