[MGNLPER-130] ConcurrentModificationException after login (probl. obsolete?)
Created: 19/Mar/20  Updated: 14/Mar/22  Resolved: 14/Mar/22

Status: Closed
Project: Periscope
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Neutral
Reporter: Michael Duerig Assignee: Michael Duerig
Resolution: Duplicate Votes: 0
Labels: artt, foundation_team, to-verify
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to MLEARN-14 Performance load tests sometimes thro... Closed
relates to MLEARN-17 Concurrency issues in NN ranker Closed
duplicate
duplicates MLEARN-17 Concurrency issues in NN ranker Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[ ]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
Epic Link: Result Ranking Tech Issues

 Description   

Steps to reproduce

Log in and out a couple (of dozen) times, alternating between the users eric, peter and tina. Eventually a ConcurrentModificationException is logged:

2020-03-19 10:56:48,989 ERROR gnolia.admincentral.findbar.search.ResultCollector: An error occurred during the search process, therefore an empty collection will be returned.
java.util.concurrent.CompletionException: java.lang.RuntimeException: java.util.ConcurrentModificationException
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_222]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) ~[?:1.8.0_222]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592) ~[?:1.8.0_222]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_222]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_222]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_222]
Caused by: java.lang.RuntimeException: java.util.ConcurrentModificationException
	at info.magnolia.periscope.Periscope.lambda$search$1(Periscope.java:125) ~[magnolia-periscope-core-1.2-SNAPSHOT.jar:?]
	at info.magnolia.periscope.search.SearchRunner.lambda$execute$0(SearchRunner.java:85) ~[magnolia-periscope-core-1.2-SNAPSHOT.jar:?]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_222]
	... 3 more
Caused by: java.util.ConcurrentModificationException
	at org.apache.commons.collections4.map.AbstractLinkedMap$LinkIterator.nextEntry(AbstractLinkedMap.java:574) ~[commons-collections4-4.4.jar:4.4]
	at org.apache.commons.collections4.map.AbstractLinkedMap$KeySetIterator.next(AbstractLinkedMap.java:469) ~[commons-collections4-4.4.jar:4.4]
	at java.util.AbstractCollection.toArray(AbstractCollection.java:141) ~[?:1.8.0_222]
	at java.util.ArrayList.<init>(ArrayList.java:178) ~[?:1.8.0_222]
	at info.magnolia.periscope.rank.ml.IndexedBuffer.asList(IndexedBuffer.java:97) ~[magnolia-periscope-result-ranker-1.2-SNAPSHOT.jar:?]
	at info.magnolia.periscope.rank.ml.NeuralNetworkResultRanker.outputArrayToResults(NeuralNetworkResultRanker.java:193) ~[magnolia-periscope-result-ranker-1.2-SNAPSHOT.jar:?]
	at info.magnolia.periscope.rank.ml.NeuralNetworkResultRanker.rank(NeuralNetworkResultRanker.java:143) ~[magnolia-periscope-result-ranker-1.2-SNAPSHOT.jar:?]
	at info.magnolia.periscope.Periscope.fetchSupplierAwareSearchResults(Periscope.java:143) ~[magnolia-periscope-core-1.2-SNAPSHOT.jar:?]
	at info.magnolia.periscope.Periscope.lambda$null$0(Periscope.java:123) ~[magnolia-periscope-core-1.2-SNAPSHOT.jar:?]
	at info.magnolia.context.AsynchronousContext$OperationFactory.lambda$wrap$0(AsynchronousContext.java:122) ~[magnolia-core-6.2-SNAPSHOT.jar:?]
	at info.magnolia.periscope.Periscope.lambda$search$1(Periscope.java:123) ~[magnolia-periscope-core-1.2-SNAPSHOT.jar:?]
	at info.magnolia.periscope.search.SearchRunner.lambda$execute$0(SearchRunner.java:85) ~[magnolia-periscope-core-1.2-SNAPSHOT.jar:?]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_222]
	... 3 more 

Initial analysis

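This looks like a regression introduced with MGNLPER-121.
The result ranker cache introduced there causes NeuralNetworkResultRanker instances to be shared between multiple threads. This in turn leads to concurrent calls to the mutators of IndexedBuffer, which is backed by the non-thread-safe LRUMap. The exception surfaces in IndexedBuffer.asList, where the LRUMap key set is copied into an ArrayList while the map is being modified concurrently.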
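A minimal sketch of why even lookups count as mutators here, assuming java.util.LinkedHashMap in access-order mode as a stand-in for commons-collections4 LRUMap (both keep entries in least-recently-used order, move an entry on get(), and are unsynchronized). The class name LruIterationDemo is hypothetical. The single-threaded interleaving below is the same one two threads can produce when one copies the key set in IndexedBuffer.asList while another looks up an entry:

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class LruIterationDemo {

    /** Returns true if a lookup made during iteration invalidates the iterator. */
    public static boolean lookupInvalidatesIterator() {
        // accessOrder = true: iteration runs from least to most recently accessed,
        // the same ordering LRUMap maintains (stand-in, not LRUMap itself).
        Map<String, Integer> lru = new LinkedHashMap<>(16, 0.75f, true);
        lru.put("a", 1);
        lru.put("b", 2);

        // Plays the role of the key-set copy in IndexedBuffer.asList().
        Iterator<String> keys = lru.keySet().iterator();
        keys.next(); // "a"

        // Plays the role of another thread looking up "a": in access-order mode
        // this moves "a" behind "b", which is a structural modification.
        lru.get("a");

        try {
            keys.next();
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupInvalidatesIterator()); // prints "true"
    }
}
```

Because no second put is needed to trigger the failure, guarding only the obvious writers of the buffer would not be enough; lookups have to take the same lock as the copy.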

 



 Comments   
Comment by Michael Duerig [ 20/Mar/20 ]

This looks like a deeper problem. MGNLPER-52 introduced asynchronous execution of search result suppliers, which causes the neural network result ranker to be used concurrently from different threads. However, that result ranker is not thread safe. The ConcurrentModificationException itself is caused by concurrent access to the IndexedBuffer of the result ranker. This could be fixed relatively easily by making IndexedBuffer thread safe. That still leaves open the thread safety of other parts of the result ranker, e.g. concurrent access to MultiLayerNetwork, which AFAIK is also not thread safe. A radical approach would be to simply synchronize all public methods of the result ranker. However, this raises the question of how much that would undo the benefits of executing the search suppliers asynchronously in the first place.
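One way to make IndexedBuffer thread safe, sketched under assumptions: the real class and its API are not shown in this issue, so SynchronizedIndexedBuffer and its add/contains methods are hypothetical (only asList appears in the stack trace), and an access-ordered LinkedHashMap with removeEldestEntry again stands in for LRUMap. Every method, including lookups, takes the same lock, and asList returns a copy made under that lock so callers never iterate the live map:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical thread-safe replacement for IndexedBuffer: a bounded LRU set
 * whose every access, including lookups (which reorder entries), is guarded
 * by the instance lock.
 */
public class SynchronizedIndexedBuffer<T> {

    private final Map<T, Boolean> entries;

    public SynchronizedIndexedBuffer(final int capacity) {
        // Access-ordered LinkedHashMap as a stand-in for commons-collections4 LRUMap.
        this.entries = new LinkedHashMap<T, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<T, Boolean> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    public synchronized void add(T item) {
        entries.put(item, Boolean.TRUE);
    }

    public synchronized boolean contains(T item) {
        // get() moves the entry to the most-recently-used position, so it must be locked too.
        return entries.get(item) != null;
    }

    /** Returns a snapshot copied under the lock; callers never iterate the live map. */
    public synchronized List<T> asList() {
        return Collections.unmodifiableList(new ArrayList<>(entries.keySet()));
    }

    /** Hammers one buffer from several threads; returns the first failure, or null. */
    public static Throwable stressTest() throws InterruptedException {
        SynchronizedIndexedBuffer<Integer> buffer = new SynchronizedIndexedBuffer<>(100);
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Runnable writer = () -> {
            for (int i = 0; i < 100_000; i++) {
                buffer.add(i % 500);
            }
        };
        Runnable reader = () -> {
            for (int i = 0; i < 10_000; i++) {
                buffer.asList();
                buffer.contains(i % 500);
            }
        };
        List<Thread> threads = new ArrayList<>();
        for (Runnable task : Arrays.asList(writer, writer, reader, reader)) {
            Thread thread = new Thread(task);
            // Record the first exception thrown by any worker thread.
            thread.setUncaughtExceptionHandler((t, e) -> failure.compareAndSet(null, e));
            threads.add(thread);
        }
        for (Thread thread : threads) {
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        return failure.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stressTest() == null ? "no failures" : "failure");
    }
}
```

The lock makes calls on the buffer mutually exclusive but does nothing for other shared state such as MultiLayerNetwork, which is why synchronizing the whole ranker is also on the table.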

To move forward, I suggest we synchronize all API methods of the neural network ranker and measure the performance impact (mmichel, could you help with this bit?). If the impact is not acceptable, we have to go back to the drawing board.

/cc ilgun
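A minimal sketch of the proposed "synchronize all API methods" approach, written as a decorator so the ranker itself stays unchanged. ResultRanker, SynchronizedResultRanker and ScratchBufferRanker are hypothetical (the real NeuralNetworkResultRanker API is not shown here); ScratchBufferRanker only imitates a ranker with shared mutable state. timeConcurrentCalls is a rough wall-clock harness for the measurement, not a proper benchmark:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SynchronizedRankerDemo {

    /** Hypothetical stand-in for the ranker API; the real signature is not shown in this issue. */
    public interface ResultRanker {
        List<String> rank(List<String> results);
    }

    /** Decorator that serializes every call to a delegate that is not thread safe. */
    public static final class SynchronizedResultRanker implements ResultRanker {
        private final ResultRanker delegate;

        public SynchronizedResultRanker(ResultRanker delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized List<String> rank(List<String> results) {
            return delegate.rank(results);
        }
    }

    /** Toy ranker with shared mutable scratch state, imitating non-thread-safe internals. */
    public static final class ScratchBufferRanker implements ResultRanker {
        private final List<String> scratch = new ArrayList<>();

        @Override
        public List<String> rank(List<String> results) {
            scratch.clear();
            scratch.addAll(results);
            scratch.sort(null); // natural order as a placeholder for a real score
            return new ArrayList<>(scratch);
        }
    }

    /** Runs `calls` rank requests on `threads` threads; returns elapsed wall-clock nanos. */
    public static long timeConcurrentCalls(ResultRanker ranker, int threads, int calls) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> input = Arrays.asList("c", "a", "b");
            long start = System.nanoTime();
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                futures.add(pool.submit(() -> ranker.rank(input)));
            }
            for (Future<List<String>> f : futures) {
                List<String> ranked = f.get();
                if (!ranked.equals(Arrays.asList("a", "b", "c"))) {
                    throw new IllegalStateException("corrupted ranking: " + ranked);
                }
            }
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        ResultRanker ranker = new SynchronizedResultRanker(new ScratchBufferRanker());
        long nanos = timeConcurrentCalls(ranker, 8, 10_000);
        System.out.printf("10000 calls on 8 threads: %.1f ms%n", nanos / 1e6);
    }
}
```

The lock only helps if every thread reaches the shared instance through the same decorator, e.g. by wrapping the ranker once where it is cached (the cache introduced with MGNLPER-121). Without the decorator, ScratchBufferRanker may return corrupted results or throw. For numbers worth acting on, a harness such as JMH is preferable to a nanoTime loop, and the figure that matters for the question above is probably end-to-end search latency with the asynchronous suppliers rather than ranking time alone.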

 

Comment by Michael Duerig [ 14/Mar/22 ]

Duplicate of MLEARN-17

Generated at Mon Feb 12 10:29:01 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.