[MGNLEE-631] Consolidate past investigations related to recurring memory issues relative to the ranker Created: 26/Nov/20  Updated: 04/May/22  Resolved: 04/May/22

Status: Closed
Project: Magnolia DX Core
Component/s: build / bundling
Affects Version/s: 6.2.4
Fix Version/s: None

Type: Task Priority: Major
Reporter: Mikaël Geljić Assignee: Unassigned
Resolution: Obsolete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screenshot 2020-11-26 at 10.43.50.png     PNG File Screenshot 2020-11-26 at 10.44.57.png    
Issue Links:
Relates
relates to MGNLPER-82 Consider non-AI alternatives for sear... Closed
relates to MGNLPER-152 Result Ranking Tech Issues Closed
relates to MGNLPER-154 Remove ranking from bundle Closed
Template:
Acceptance criteria:
Empty
Task DoR:
Empty
Release notes required:
Yes
Date of First Response:
Epic Link: Cloud product confidence

 Description   
  • Consolidate past/recent investigations
  • Conduct proper benchmarking to reach a final, well-founded statement about memory issues linked to the use of periscope-ranker.

To move forward, I propose benchmarking this properly, in the same spirit as Duy's recent benchmarks on JVM/GC options or Maxime's past efforts (see the wiki page "Running Performance Load Tests"):

  1. Define a clear activity scenario, likely triggering few to many find-bar searches, and document the target setup specifications (e.g. local Docker with low-memory setup, cloud environment).
  2. Define which setups we put under test. Proposed: (a) enabled-no-flags, (b) enabled-with-flags, (c) disabled, (d) excluded.
  3. Define which metrics we're interested in: likely JVM heap + non-heap or overall committed memory, others?
  4. Produce charts for each setup for the activity scenario, be it via Datadog (cloud), or ad-hoc tooling / local Grafana setup.
  5. Consider automation, whether with Test Framework, or crafting a mini-API in front of periscope if that helps for load-testing tools (instead of scripting the infamous Vaadin UIDL calls).
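
One way to capture the heap and non-heap figures named in step 3 is the JVM's standard java.lang.management API. The following is only a minimal sketch under stated assumptions: the class names MemorySample and MemorySampler, the CSV layout, and the one-second sampling interval are illustrative and do not come from this ticket or from any existing Magnolia tooling.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** One reading of JVM memory, all sizes in bytes (hypothetical helper type). */
final class MemorySample {
    final long timestampMillis;
    final long heapUsed;
    final long heapCommitted;
    final long nonHeapUsed;
    final long nonHeapCommitted;

    MemorySample(long timestampMillis, long heapUsed, long heapCommitted,
                 long nonHeapUsed, long nonHeapCommitted) {
        this.timestampMillis = timestampMillis;
        this.heapUsed = heapUsed;
        this.heapCommitted = heapCommitted;
        this.nonHeapUsed = nonHeapUsed;
        this.nonHeapCommitted = nonHeapCommitted;
    }

    /** Heap plus non-heap committed memory, as seen by the JVM. */
    long totalCommitted() {
        return heapCommitted + nonHeapCommitted;
    }

    /** One CSV row, matching the header printed by MemorySampler.main. */
    String toCsv() {
        return timestampMillis + "," + heapUsed + "," + heapCommitted
                + "," + nonHeapUsed + "," + nonHeapCommitted;
    }
}

/** Reads the current JVM memory figures through the platform MemoryMXBean. */
final class MemorySampler {
    static MemorySample sample() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        MemoryUsage nonHeap = bean.getNonHeapMemoryUsage();
        return new MemorySample(System.currentTimeMillis(),
                heap.getUsed(), heap.getCommitted(),
                nonHeap.getUsed(), nonHeap.getCommitted());
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("timestamp_ms,heap_used,heap_committed,nonheap_used,nonheap_committed");
        // Five samples, one second apart; both values are arbitrary choices.
        for (int i = 0; i < 5; i++) {
            System.out.println(sample().toCsv());
            Thread.sleep(1000);
        }
    }
}
```

Note that MemoryMXBean only reports memory pools managed by the JVM. Native allocations made outside those pools are not included, so process-level resident memory would need a separate measurement, for example from the container or the monitoring agent.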

Let's use this as a guinea-pig case to establish a proper benchmarking methodology, so that the effort pays off beyond this ticket. We need such tests in place rather than discovering the impact of changes (whether from core, JVM setup, or elsewhere) in production.

Expected outcome: a clear statement from Product Development (Keep/Patch/Disable/Remove) and an aligned decision across Core and Cloud.



 Comments   
Comment by Espen Jervidalo [ 26/Nov/20 ]

I was asked to gather all the information we have collected so far. There have been many small ad-hoc meetings and discussions, leading to things like this:

https://git.magnolia-cms.com/projects/ODC/repos/swissre-webpresence/browse/light-modules/swissre-web/decorations/periscope-core/config.yaml

There, we simply went into the swissre Slack and told them what to apply, and they did.

Commit a01a658a3f1, authored by Minh Nguyen on 05 Nov 2020.

Here is the effect of that change on the memory metrics (orange marks a warning on free memory):

And in the broader context:

Sept 28th

Thanh Pham 05:46
@here: the upgrading Live to 6.2.3 has been finished successfully. I will install 1.41 FE(v707) to Live

Comment by Maxime Michel [ 26/Nov/20 ]

To me it is obvious that running DL4J without any flag to keep its memory under control will lead to instance crashes; this can be reproduced locally in a Docker container with low memory settings. The situation with the currently available flags, and the change I contributed to javacpp, is described here: https://documentation.magnolia-cms.com/display/INCL/_dl4j+performance+issues

When I last tried, using -Dorg.bytedeco.javacpp.maxbytes=10% with either DL4J beta6 plus javacpp pinned to 1.5.3 (recommended at the time by Samuel Audet, DL4J maintainer) or DL4J beta7 let the library manage its own memory consumption and prevented crashes, at least in my local testing environment.
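
Since the flag only helps when it actually reaches the JVM, a small startup check can make its presence visible in logs. The sketch below is an assumption-laden illustration: the class JavacppMaxBytesCheck and its messages are hypothetical, it only reads the system property named above, and it does not depend on or reproduce how JavaCPP itself interprets the value.

```java
/**
 * Startup check for the JavaCPP memory cap discussed above.
 * Hypothetical helper: it only inspects the system property and
 * makes no claim about how JavaCPP resolves a value such as "10%".
 */
final class JavacppMaxBytesCheck {
    static final String PROPERTY = "org.bytedeco.javacpp.maxbytes";

    /** Returns true when a non-blank value has been supplied for the cap. */
    static boolean isCapConfigured(String value) {
        return value != null && !value.trim().isEmpty();
    }

    /** Builds a one-line message suitable for a startup log. */
    static String describe(String value) {
        if (!isCapConfigured(value)) {
            return "WARNING: " + PROPERTY + " is not set";
        }
        return PROPERTY + "=" + value.trim();
    }

    public static void main(String[] args) {
        System.out.println(describe(System.getProperty(PROPERTY)));
        // Printed for context only; the relation between the cap and the
        // JVM max heap is not asserted here.
        System.out.println("JVM max heap (bytes): " + Runtime.getRuntime().maxMemory());
    }
}
```

Running it as `java -Dorg.bytedeco.javacpp.maxbytes=10% JavacppMaxBytesCheck` should report the configured value, while running it without the flag should print the warning line.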

Comment by Mikaël Geljić [ 20/Jan/21 ]

Alright, good input, let's assume this is all we'll ever collect here.

To move forward, I propose benchmarking this properly, in the same spirit as Duy's recent benchmarks on JVM/GC options:

  1. Define a clear activity scenario, likely triggering few to many find-bar searches, and document the target setup specifications (e.g. local Docker with low-memory setup, cloud environment).
  2. Define which setups we put under test. Proposed: (a) enabled-no-flags, (b) enabled-with-flags, (c) disabled, (d) excluded.
  3. Define which metrics we're interested in: likely JVM heap + non-heap or overall committed memory, others?
  4. Produce charts for each setup for the activity scenario, be it via Datadog (cloud), or ad-hoc tooling / local Grafana setup.
  5. Consider automation, whether with Test Framework, or crafting a mini-API in front of periscope if that helps for load-testing tools (instead of scripting the infamous Vaadin UIDL calls).
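
Once each setup has been run through the same scenario, the readings still need to be compared side by side. A minimal sketch of such a comparison follows, assuming one committed-memory reading per sample; the names RankerSetup and SetupComparison and the choice of peak and mean as summary figures are illustrative assumptions, not an agreed methodology.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** The four setups proposed in the list above (enum is illustrative). */
enum RankerSetup { ENABLED_NO_FLAGS, ENABLED_WITH_FLAGS, DISABLED, EXCLUDED }

/** Collects committed-memory readings per setup and summarises them. */
final class SetupComparison {
    private final Map<RankerSetup, List<Long>> readings = new EnumMap<>(RankerSetup.class);

    /** Stores one committed-memory reading (bytes) for the given setup. */
    void record(RankerSetup setup, long committedBytes) {
        readings.computeIfAbsent(setup, k -> new ArrayList<>()).add(committedBytes);
    }

    private List<Long> values(RankerSetup setup) {
        return readings.getOrDefault(setup, Collections.emptyList());
    }

    /** Highest reading for the setup, or 0 when nothing was recorded. */
    long peak(RankerSetup setup) {
        return values(setup).stream().mapToLong(Long::longValue).max().orElse(0L);
    }

    /** Arithmetic mean of the readings, or 0.0 when nothing was recorded. */
    double mean(RankerSetup setup) {
        return values(setup).stream().mapToLong(Long::longValue).average().orElse(0.0);
    }

    /** CSV summary with one row per setup, in enum declaration order. */
    String report() {
        StringBuilder sb = new StringBuilder("setup,samples,peak_bytes,mean_bytes\n");
        for (RankerSetup setup : RankerSetup.values()) {
            sb.append(setup).append(',')
              .append(values(setup).size()).append(',')
              .append(peak(setup)).append(',')
              .append(Math.round(mean(setup))).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SetupComparison comparison = new SetupComparison();
        // Placeholder numbers to show the output shape; these are NOT measurements.
        comparison.record(RankerSetup.ENABLED_NO_FLAGS, 100L);
        comparison.record(RankerSetup.ENABLED_NO_FLAGS, 300L);
        comparison.record(RankerSetup.DISABLED, 50L);
        System.out.print(comparison.report());
    }
}
```

The CSV output can be fed to whatever charting tool is chosen for step 4; setups with no recorded samples appear with zero counts so that missing runs are visible rather than silently dropped.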

Let's use this as a guinea-pig case to establish a proper benchmarking methodology, so that the effort pays off beyond this ticket. We need such tests in place rather than discovering the impact of changes (whether from core, JVM setup, or elsewhere) in production.
 

Comment by David Lopez [ 26/Feb/21 ]

Here is a use case from a customer who brought some information related to this: https://magnolia-cms.slack.com/archives/CDG4DMWRM/p1614325710017100

Comment by Mikaël Geljić [ 01/Mar/21 ]

Re: benchmarking setup, see mmichel's past efforts at https://git.magnolia-cms.com/projects/INTERNAL/repos/performance-loadtests/browse as well as the high-level wiki page "Running Performance Load Tests", which is not outdated at all. Also featured: a FindBarMassSimulation.

Rephrasing the ticket description to reflect the setup and scenario definition stage we're at now.

Comment by Laura Delnevo [ 04/May/22 ]

Marked as Obsolete, following the unbundling of the Periscope Ranking module. 

Further feedback on the Search functionality, its performance, and UI improvements will be taken into consideration by the Magnolia team as part of a broader initiative around "Find relevant content fast within Magnolia". Submit your feedback to us and we'll be in touch. 

Generated at Mon Feb 12 05:31:50 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.