[MGNLPER-143] Out of memory still occurring as of 6.2.2 Created: 07/Aug/20  Updated: 04/May/22  Resolved: 04/May/22

Status: Closed
Project: Periscope
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Espen Jervidalo Assignee: Unassigned
Resolution: Obsolete Votes: 0
Labels: maintenance, tech-debt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to MGNLPER-82 Consider non-AI alternatives for sear... Closed
relates to MGNLPER-135 Long running searches cause performan... Closed
relates to MGNLPER-133 Result ranker crashes at login Closed
causality
relation
supersession
supersedes MGNLPER-70 Physical memory usage is too high Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[ ]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
Date of First Response:
Epic Link: Result Ranking Tech Issues

 Description   

We had to disable the machine-learning-based result ranker for Periscope on the Trials on version 6.2.2.

After a lot of research and trial and error with the suggested 6.2.2-compatible memory settings, it appears that the ranker cannot be made to work properly.

Reducing or increasing the off-heap memory reserved by the library only makes the problem appear sooner or later; in the end, it still fails.

See stack trace in the linked Freetrials ticket.

I would recommend replacing the machine learning ranker with something less error-prone. There's a related ticket: MGNLPER-82.



 Comments   
Comment by Espen Jervidalo [ 07/Aug/20 ]

On the trials there is no swap. This might be a factor that makes this appear more easily.

Comment by Andres Garcia [ 07/Aug/20 ]

Findings:

With the ML-based result ranker enabled on trials: when customers complete a trial signup, they use a single user with superuser permissions to access their public and author instances.

They then use the Periscope/Find Bar to perform searches. We noticed that selecting a result increased the off-heap memory by a significant amount (~30MB).
The issue is that this memory is apparently never released, neither within a single session nor across sessions some hours apart. This eventually produces OutOfMemory errors on the instance, which in turn render the Magnolia UI unusable.

Increasing the amount of memory assigned to -Dorg.bytedeco.javacpp.maxbytes only defers the crash: more memory just allows more searches/results before it fails again.
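
To illustrate why a larger cap only buys time, here is a rough back-of-the-envelope sketch. It assumes the ~30MB growth per selected result reported above and that nothing is ever released. The class name, method name, and the cap values are hypothetical and chosen only for illustration; they are not settings taken from this ticket.

```java
/** Rough estimate only: how many result selections fit under an off-heap cap. */
final class OffHeapBudget {

    // Approximate growth per selected result, as observed in this ticket (~30 MB).
    static final long OBSERVED_GROWTH_PER_SELECTION_MB = 30;

    /**
     * Returns how many selections fit under the cap if each one retains
     * growthMb of off-heap memory and nothing is ever released.
     */
    static long selectionsBeforeExhaustion(long capMb, long growthMb) {
        if (capMb < 0 || growthMb <= 0) {
            throw new IllegalArgumentException("capMb must be >= 0 and growthMb > 0");
        }
        return capMb / growthMb;
    }

    public static void main(String[] args) {
        // Hypothetical cap values, chosen only for illustration.
        long[] capsMb = {512, 1024, 2048};
        for (long capMb : capsMb) {
            System.out.println(capMb + " MB cap: about "
                    + selectionsBeforeExhaustion(capMb, OBSERVED_GROWTH_PER_SELECTION_MB)
                    + " selections before the cap is reached");
        }
    }
}
```

Under these assumptions, doubling the cap roughly doubles the number of selections before failure, but the failure itself is not avoided.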

The memory leak is not limited to trials; it is also confirmed to be happening on current cloud customers' instances. In that scenario it's possible that the instances start swapping (note: swapping is disabled on trials, hence the OOM) or are rebooted before the issue occurs; or, also likely, different users logging in to the instance somehow trigger a release of the memory (we didn't look into this part further).

Reducing the outputUnits of the ranker doesn't solve the issue either, since it's the learning process that causes it.

Besides the above problem, the Periscope result ranker was producing many log errors when trying to rank results (see below), polluting the logs.
These errors seem to appear only when clicking on light dev assets that appear in the results.
For every clicked result, a new stack trace is written to the log. The error is the following:

 

2020-08-06 13:08:50,780 ERROR gnolia.periscope.rank.ml.NeuralNetworkResultRanker: Failed to train ranking neural network
 java.lang.IllegalArgumentException: Unknown result with identifier: XXXXXXXXXXXX

Comment by Espen Jervidalo [ 10/Aug/20 ]

See linked ticket.

Comment by Laura Delnevo [ 04/May/22 ]

Marked as Obsolete, following the unbundling of the Periscope Ranking module. 

Further feedback on the Search functionality, its performance, and UI improvements will be taken into consideration by the Magnolia team as part of a broader initiative around "Find relevant content fast within Magnolia". Submit your feedback to us and we'll be in touch. 

Generated at Mon Feb 12 10:29:09 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.