[MGNLPER-143] Out of memory still occurring as of 6.2.2 Created: 07/Aug/20 Updated: 04/May/22 Resolved: 04/May/22 |
|
| Status: | Closed |
| Project: | Periscope |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Espen Jervidalo | Assignee: | Unassigned |
| Resolution: | Obsolete | Votes: | 0 |
| Labels: | maintenance, tech-debt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Template: |
|
||||||||||||||||||||||||||||||||
| Acceptance criteria: |
Empty
|
||||||||||||||||||||||||||||||||
| Task DoD: |
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ] Architecture Decision Record (ADR)
|
||||||||||||||||||||||||||||||||
| Bug DoR: |
[ ]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
|
||||||||||||||||||||||||||||||||
| Date of First Response: | |
| Epic Link: | Result Ranking Tech Issues |
| Description |
|
We had to disable the machine-learning-based ranker for Periscope on the Trials on version 6.2.2. After a lot of research and trial-and-error with the suggested 6.2.2-compatible memory settings, it appears that this cannot be made to work properly. Reducing or increasing the off-heap memory reserved by the library only makes the problem appear later or sooner; in the end, it will still fail. See the stack trace in the linked Freetrials ticket. I would recommend replacing the machine learning ranker with something less error-prone. There's a related ticket: |
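For reference, the off-heap memory mentioned above is the amount governed by the -Dorg.bytedeco.javacpp.maxbytes JVM property cited in the findings comment below. A minimal sketch, assuming JavaCPP's Pointer counters are on the classpath (the class name and logUsage helper are illustrative, not part of the product code), for logging the configured limit and current usage while reproducing the issue:

```java
import org.bytedeco.javacpp.Pointer;

// Sketch only: log JavaCPP's off-heap counters so the growth described in this
// ticket can be observed while reproducing it.
public final class OffHeapUsageLogger {

    private OffHeapUsageLogger() {
    }

    public static void logUsage(String label) {
        // maxBytes() reflects -Dorg.bytedeco.javacpp.maxbytes; totalBytes() is what
        // JavaCPP currently tracks as allocated off-heap; physicalBytes() is the
        // resident memory of the whole process.
        System.out.printf("%s: off-heap %,d / %,d bytes, physical %,d / %,d bytes%n",
                label,
                Pointer.totalBytes(), Pointer.maxBytes(),
                Pointer.physicalBytes(), Pointer.maxPhysicalBytes());
    }
}
```

Calling logUsage(...) before and after each search makes the per-result growth reported in the findings below directly visible.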
| Comments |
| Comment by Espen Jervidalo [ 07/Aug/20 ] |
|
On the trials there is no swap, which might be a factor that makes the issue appear sooner. |
| Comment by Andres Garcia [ 07/Aug/20 ] |
Findings: With the ML-based result ranker enabled on trials, when customers complete a trial signup, they use a single user with superuser permissions to access their public and author instances. They then use the Periscope/Find Bar to perform searches; we noticed that selecting a result increases the off-heap memory by a significant amount (~30 MB). Increasing the amount of memory assigned to -Dorg.bytedeco.javacpp.maxbytes only defers the crash, since more memory simply allows more searches/results before it fails again.

The memory leak does not only happen in trial scenarios; it is also confirmed to be happening on current cloud customer instances. In that scenario it's possible that the instances start swapping (note: swapping is disabled on trials, hence the OOM), or are rebooted before the issue occurs; it is also likely that the different users logging in to the instance somehow cause the memory to be released (we did not look into this further). Reducing the outputUnits of the ranker does not solve the issue either, since it is the learning process that produces it.

Besides the above problem, the Periscope result ranker was producing many log errors when trying to rank results (see below), polluting the log records.
2020-08-06 13:08:50,780 ERROR gnolia.periscope.rank.ml.NeuralNetworkResultRanker: Failed to train ranking neural network java.lang.IllegalArgumentException: Unknown result with identifier: XXXXXXXXXXXX
|
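To make the "only defers the crash" observation reproducible, a sketch like the following could measure the off-heap delta around each result selection. The OffHeapDeltaProbe class and the Runnable standing in for the ranker's training step are hypothetical; only the JavaCPP Pointer counters and the maxbytes property come from the findings above.

```java
import org.bytedeco.javacpp.Pointer;

// Hypothetical harness: repeat an action (e.g. selecting a search result, which
// triggers ranker training) and report how much off-heap memory each iteration
// retains. A roughly constant per-iteration delta means any fixed maxbytes value
// is eventually exhausted, i.e. raising it only postpones the OOM.
public final class OffHeapDeltaProbe {

    private OffHeapDeltaProbe() {
    }

    public static void measure(String label, int iterations, Runnable action) {
        for (int i = 1; i <= iterations; i++) {
            long before = Pointer.totalBytes();
            action.run();
            long after = Pointer.totalBytes();
            System.out.printf("%s #%d: retained %,d bytes, total %,d of max %,d%n",
                    label, i, after - before, after, Pointer.maxBytes());
        }
    }
}
```

If each iteration retains roughly the same amount (here ~30 MB per selected result), the total grows linearly and any fixed maxbytes value is eventually hit, which matches the behaviour described above.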
| Comment by Espen Jervidalo [ 10/Aug/20 ] |
|
See linked ticket. |
| Comment by Laura Delnevo [ 04/May/22 ] |
|
Marked as Obsolete, following the unbundling of the Periscope Ranking module. Further feedback on the Search functionality, its performance, and UI improvements will be taken into consideration by the Magnolia team as part of a broader initiative around "Find relevant content fast within Magnolia". Submit your feedback to us and we'll be in touch. |