[MLEARN-17] Concurrency issues in NN ranker Created: 19/Jun/19  Updated: 14/Mar/22  Resolved: 21/Jul/20

Status: Closed
Project: Machine Learning
Component/s: None
Affects Version/s: None
Fix Version/s: 1.2.1

Type: Bug Priority: Neutral
Reporter: Aleksandr Pchelintcev Assignee: Chuong Doan Huy
Resolution: Fixed Votes: 0
Labels: maintenance
Remaining Estimate: Not Specified
Time Spent: 1d 1.5h
Original Estimate: Not Specified

Issue Links:
Relates
relates to MLEARN-14 Performance load tests sometimes thro... Closed
relates to MGNLPER-130 ConcurrentModificationException after... Closed
duplicate
is duplicated by MGNLPER-130 ConcurrentModificationException after... Closed
relation
is related to MGNLPER-138 Find bar not working after changing o... Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[ ]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
Release notes required:
Yes
Epic Link: Periscope improvements
Sprint: Maintenance 15, Maintenance 16
Story Points: 5

 Description   

The NN result ranker appears to suffer from an intermittent concurrency issue. The problem seems to be caused by simultaneous, unsynchronised read/write access to NeuralNetworkResultRanker#resultTexts (see the stack trace below).

Code review shows that when two different browser sessions are authenticated with the same account (e.g. superuser), they share a single NeuralNetworkResultRanker instance, because the corresponding ranker factory caches instances by user.
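
To illustrate the sharing, here is a minimal sketch of a per-user cache. The names RankerCacheSketch, Ranker, RankerFactory and forUser are hypothetical stand-ins, not the actual Periscope classes; the only behaviour taken from this ticket is that instances are cached by user.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RankerCacheSketch {
    /** Hypothetical stand-in for NeuralNetworkResultRanker. */
    static final class Ranker {
        final String user;

        Ranker(String user) {
            this.user = user;
        }
    }

    /** Hypothetical factory that keeps one ranker per user name. */
    static final class RankerFactory {
        private final Map<String, Ranker> byUser = new ConcurrentHashMap<>();

        Ranker forUser(String user) {
            // The cache key is the user, not the browser session.
            return byUser.computeIfAbsent(user, Ranker::new);
        }
    }

    public static void main(String[] args) {
        RankerFactory factory = new RankerFactory();
        Ranker sessionA = factory.forUser("superuser"); // first browser session
        Ranker sessionB = factory.forUser("superuser"); // second browser session
        System.out.println(sessionA == sessionB); // prints "true": same instance
    }
}
```

With such a cache, any mutable state inside the ranker is reachable from the request threads of both sessions at once.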

The issue was hard to detect during manual testing and simulations, but it does appear in load tests and occasionally during development.

All API methods that interact with the NN result ranker need to be synchronised.
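
One possible shape of such synchronisation is sketched below. It is an assumption-laden illustration rather than the fix that was shipped: SynchronizedBufferSketch and Buffer are hypothetical names, and java.util.LinkedHashMap stands in for the commons-collections4 linked map seen in the stack trace. Every method that reads or writes the map takes the same lock, so a copy of the key set can never overlap with a write.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SynchronizedBufferSketch {
    /** Hypothetical stand-in for IndexedBuffer; all access goes through one lock. */
    static final class Buffer {
        private final Map<String, Integer> indexByText = new LinkedHashMap<>();

        synchronized void add(String text) {
            // Assign the next free index to texts not seen before.
            indexByText.putIfAbsent(text, indexByText.size());
        }

        synchronized List<String> asList() {
            // The copy iterates the key set; the lock keeps writers out meanwhile.
            return new ArrayList<>(indexByText.keySet());
        }

        synchronized int size() {
            return indexByText.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Buffer buffer = new Buffer();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                buffer.add("result-" + i);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1_000; i++) {
                buffer.asList();
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println(buffer.size()); // prints "10000"
    }
}
```

Locking only the writes, or only the copy, would not be enough: the reader and the writer must contend for the same monitor.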


2019-06-10 13:21:48,899 ERROR gnolia.admincentral.findbar.search.ResultCollector: An error occurred during the search process, therefore an empty collection will be returned.
java.util.concurrent.CompletionException: java.util.ConcurrentModificationException
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_144]
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) ~[?:1.8.0_144]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592) ~[?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_144]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_144]
Caused by: java.util.ConcurrentModificationException
        at org.apache.commons.collections4.map.AbstractLinkedMap$LinkIterator.nextEntry(AbstractLinkedMap.java:552) ~[commons-collections4-4.1.jar:4.1]
        at org.apache.commons.collections4.map.AbstractLinkedMap$KeySetIterator.next(AbstractLinkedMap.java:450) ~[commons-collections4-4.1.jar:4.1]
        at java.util.AbstractCollection.toArray(AbstractCollection.java:141) ~[?:1.8.0_144]
        at java.util.ArrayList.<init>(ArrayList.java:177) ~[?:1.8.0_144]
        at info.magnolia.periscope.rank.ml.IndexedBuffer.asList(IndexedBuffer.java:97) ~[magnolia-periscope-result-ranker-1.1-SNAPSHOT.jar:?]
        at info.magnolia.periscope.rank.ml.NeuralNetworkResultRanker.outputArrayToResults(NeuralNetworkResultRanker.java:192) ~[magnolia-periscope-result-ranker-1.1-SNAPSHOT.jar:?]
        at info.magnolia.periscope.rank.ml.NeuralNetworkResultRanker.rank(NeuralNetworkResultRanker.java:142) ~[magnolia-periscope-result-ranker-1.1-SNAPSHOT.jar:?]
        at info.magnolia.periscope.Periscope.fetchSupplierAwareSearchResults(Periscope.java:135) ~[magnolia-periscope-core-1.1-SNAPSHOT.jar:?]
        at info.magnolia.periscope.Periscope.lambda$search$0(Periscope.java:113) ~[magnolia-periscope-core-1.1-SNAPSHOT.jar:?]
        at info.magnolia.periscope.search.SearchRunner.lambda$execute$0(SearchRunner.java:85) ~[magnolia-periscope-core-1.1-SNAPSHOT.jar:?]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_144]
        ... 3 more
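
The trace above ends in a linked-map key iterator that is being drained by the ArrayList copy constructor. The failure mode can be sketched deterministically in a single thread, assuming java.util.LinkedHashMap as a stand-in for the commons-collections4 map (FailFastSketch and its method name are hypothetical): a structural change made after iteration has started causes the next iterator step to throw.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class FailFastSketch {
    /** Returns true if advancing the iterator after a structural change throws. */
    static boolean modificationDuringIterationFails() {
        Map<String, Integer> texts = new LinkedHashMap<>();
        texts.put("first", 0);
        texts.put("second", 1);

        Iterator<String> keys = texts.keySet().iterator();
        keys.next();           // iteration has started, as in the list copy
        texts.put("third", 2); // structural change, as a second thread would make
        try {
            keys.next();       // the fail-fast iterator detects the change
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(modificationDuringIterationFails()); // prints "true"
    }
}
```

In the real system the write comes from another thread at an unpredictable moment, which would explain why the error is intermittent.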

Generated at Mon Feb 12 02:29:06 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.