[MGNLEESOLR-64] FacetedSolrSearchProvider#removeOutdatedIndexes performs really slowly for us, resulting in dramatically degraded search performance Created: 28/May/15  Updated: 03/Dec/20  Resolved: 18/Aug/15

Status: Closed
Project: Solr Search Provider
Component/s: None
Affects Version/s: 2.2
Fix Version/s: 3.0

Type: Bug Priority: Blocker
Reporter: Edgar Vonk Assignee: Roman Kovařík
Resolution: Fixed Votes: 0
Labels: backlog541
Remaining Estimate: 3d
Time Spent: Not Specified
Original Estimate: 3d
Environment:

Solr 4.10, MacOS, Linux


Attachments: PNG File Screen Shot 2015-05-28 at 23.20.38.png    
Issue Links:
causality
dependency
depends upon MGNLEESOLR-65 FacetedSolrSearchProvider should not ... Closed
relation
is related to MGNLEESOLR-77 Introduce possibility to connect craw... Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[ ]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
Date of First Response:
Visible to:
Michaël van der Mark
Sprint: Sprint 6 (Kromeriz)
Story Points: 2

 Description   

Hi,

Since Magnolia Solr 2.2 our search functionality has become really slow (searches often take > 10 seconds and sometimes much more than that). Debugging reveals that the culprit is the new FacetedSolrSearchProvider#removeOutdatedIndexes method. For some queries this method takes an enormous amount of time. The Solr search query itself is always really fast.

Our index is quite small (443 documents, text only). We use the default Magnolia Solr Module schema.xml and a crawler (not an indexer) in the content-indexer module.

I am not sure what the issue is but for now I would really like the ability to disable the removeOutdatedIndexes functionality altogether. Why would you want to do this for every single query in any case? Can you as a start please make this functionality optional in the configuration of the Magnolia Solr Module?
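
A minimal sketch of what such a switch could look like. All names here (SearchProviderSketch, removeOutdatedIndexesEnabled, querySolr) are hypothetical and are not taken from the Magnolia Solr Module; the two helper methods are stand-ins for the real Solr query and the real cleanup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from the Magnolia Solr Module.
class SearchProviderSketch {
    // Assumed configuration switch that turns the per-query cleanup on or off.
    private final boolean removeOutdatedIndexesEnabled;
    // Records which steps ran, so the effect of the switch is observable.
    final List<String> log = new ArrayList<>();

    SearchProviderSketch(boolean removeOutdatedIndexesEnabled) {
        this.removeOutdatedIndexesEnabled = removeOutdatedIndexesEnabled;
    }

    List<String> search(String query) {
        List<String> results = querySolr(query);
        // The cleanup only runs when it has been explicitly enabled.
        if (removeOutdatedIndexesEnabled) {
            removeOutdatedIndexes(results);
        }
        return results;
    }

    // Stand-in for the real (fast) Solr query.
    List<String> querySolr(String query) {
        log.add("query:" + query);
        return Arrays.asList("/home.html", "/about.html");
    }

    // Stand-in for the expensive per-result check.
    void removeOutdatedIndexes(List<String> results) {
        log.add("cleanup:" + results.size());
    }
}
```

With the switch off, a search only performs the Solr query; with it on, the cleanup step runs after every query as it does today.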

As a workaround for this issue we are now forced to write our own FacetedSolrSearchProvider, but this is really painful since most of the methods in this class are private and therefore cannot be overridden. We pretty much have to copy the entire class, which is not something we want to do. Can you please at least make all methods protected so that we can extend this and other classes in the module?



 Comments   
Comment by Edgar Vonk [ 28/May/15 ]

I think the issue is even worse than just a performance issue. If I am not mistaken, the removeOutdatedIndexes method removes a lot of valid documents from our index.

And in the process we get a lot of Socket closed exceptions in the log. Not sure what is going on.

2015-05-28 14:58:19,537 DEBUG provider.logic.providers.FacetedSolrSearchProvider: Deleting index for http://localhost:8081/prague/programme/programme-activities/activity/233.html.
2015-05-28 14:58:19,577 ERROR lia.search.solrsearchprovider.logic.util.SolrUtils: Socket closed
java.net.SocketException: Socket closed
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at java.net.Socket.connect(Socket.java:538)
	at java.net.Socket.<init>(Socket.java:434)
	at java.net.Socket.<init>(Socket.java:286)
	at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
	at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
	at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at info.magnolia.search.solrsearchprovider.logic.util.SolrUtils.getHttpStatusCode(SolrUtils.java:40)
	at info.magnolia.search.solrsearchprovider.logic.providers.FacetedSolrSearchProvider.removeOutdatedIndexes(FacetedSolrSearchProvider.java:459)
	at info.magnolia.search.solrsearchprovider.logic.providers.FacetedSolrSearchProvider.performFacettedSearch(FacetedSolrSearchProvider.java:340)
	at info.magnolia.search.solrsearchprovider.logic.providers.FacetedSolrSearchProvider.search(FacetedSolrSearchProvider.java:152)
	at info.magnolia.search.solrsearchprovider.logic.model.FacetedSearchResultModel.execute(FacetedSearchResultModel.java:108)
	at nl.info.eaie.magnolia.model.EaieFacetedSearchResultModel.execute(EaieFacetedSearchResultModel.java:52)
Comment by Edgar Vonk [ 28/May/15 ]

I think the last issue I mentioned might have to do with the fact that we use 'dynamic pages' in our Magnolia implementation: a single page in Magnolia that is mapped using a virtual URI mapping to multiple URLs (where the last part of the URL is mapped to a request parameter internally).

Could it be that the removeOutdatedIndexes method assumes that the Solr index was made using an 'indexer' (in the content-indexer module) and not a crawler? I.e. that Magnolia JCR content (Magnolia web pages) was used to create the index instead of real web content (normal web pages)? I have not checked this, so it is just an idea.

Comment by Edgar Vonk [ 28/May/15 ]

But in any case I think it is a very bad idea to try to clean the index for every search query because with a large index this will undoubtedly reduce the search performance drastically.
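
One alternative, sketched here under my own assumptions, is to run the cleanup on a timer instead of inside the search path. The class and method names (ScheduledIndexCleanup, cleanUpIndex) are hypothetical and the method bodies are placeholders; only the java.util.concurrent calls are real API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: index cleanup decoupled from the search path.
class ScheduledIndexCleanup {
    // Counts cleanup runs so the decoupling can be observed.
    final AtomicInteger cleanupRuns = new AtomicInteger();
    private ScheduledExecutorService scheduler;

    // The search path never touches the cleanup logic.
    String search(String query) {
        return "results for " + query;
    }

    // Placeholder for the expensive "remove outdated documents" work.
    void cleanUpIndex() {
        cleanupRuns.incrementAndGet();
    }

    // Runs the cleanup repeatedly in a background thread.
    synchronized void start(long interval, TimeUnit unit) {
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::cleanUpIndex, interval, interval, unit);
    }

    // Stops the background thread, if it was started.
    synchronized void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }
}
```

With this shape the cost of the cleanup no longer scales with the number of queries, only with how often the timer fires, for example after `start(30, TimeUnit.MINUTES)`.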

Comment by Edgar Vonk [ 28/May/15 ]

I made a workaround for this issue by implementing our own FacetedSolrSearchProvider. As I expected, it turned out to be impossible to properly extend Magnolia's FacetedSolrSearchProvider since pretty much every method that we need in our subclass is private. Even the final static fields are private (why??). Sorry to push this point, but to us it almost feels as if Magnolia sometimes does not want people to extend their Java classes.

Anyway, our FacetedSolrSearchProvider is identical to Magnolia's except that we do not invoke the removeOutdatedIndexes method in #performFacettedSearch. As expected, this dramatically improves the performance of our Solr search functionality, which is now back to normal (as in: before version 2.2 of the Magnolia Solr Module).

Comment by Edgar Vonk [ 28/May/15 ]

Attached is a screenshot of the mean response time of our Gatling load & performance tests, which include search queries using the Magnolia Solr Module. The last build includes my fix, and you can see the dramatic performance gain.

Comment by Edgar Vonk [ 19/Aug/15 ]

Hi Milan, I am testing this on the latest 3.0-SNAPSHOT version but I don't see it being fixed? Or maybe I don't understand the fix.

I still see the very slow performance because of the calls to info.magnolia.search.solrsearchprovider.logic.providers.FacetedSolrSearchProvider.removeOutdatedIndexes for every search. Besides the slow performance, another issue with this is that the code assumes that the public Magnolia instance is always running, which is not the case. When we develop locally we typically only run the author instance. Searching now fails because in the info.magnolia.search.solrsearchprovider.logic.providers.FacetedSolrSearchProvider.removeOutdatedIndexes method the code attempts to connect to the public instance, which fails and generates errors:

2015-08-19 09:29:11,155 ERROR lia.search.solrsearchprovider.logic.util.SolrUtils: Connect to localhost:8081 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused
org.apache.http.conn.HttpHostConnectException: Connect to localhost:8081 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused
	at org.apache.http.impl.conn.HttpClientConnectionOperator.connect(HttpClientConnectionOperator.java:140)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:318)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:363)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:219)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
	at info.magnolia.search.solrsearchprovider.logic.util.SolrUtils.getHttpStatusCode(SolrUtils.java:39)
	at info.magnolia.search.solrsearchprovider.logic.providers.FacetedSolrSearchProvider.removeOutdatedIndexes(FacetedSolrSearchProvider.java:443)
	at info.magnolia.search.solrsearchprovider.logic.providers.FacetedSolrSearchProvider.performFacettedSearch(FacetedSolrSearchProvider.java:325)
	at info.magnolia.search.solrsearchprovider.logic.providers.FacetedSolrSearchProvider.search(FacetedSolrSearchProvider.java:138)
	at info.magnolia.search.solrsearchprovider.logic.providers.FacetedSolrSearchProvider.search(FacetedSolrSearchProvider.java:120)
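
If a per-document check of this kind were kept anywhere, a failed connection would ideally be treated as "unknown" rather than as an error or as a reason to delete. The sketch below illustrates that idea only; StatusProbe, Decision and the 404/410 rule are my assumptions, not the module's actual logic, which is not shown in this ticket.

```java
import java.io.IOException;

// Hypothetical sketch: decide what to do with an indexed URL without
// treating an unreachable server as proof that the page is gone.
class OutdatedCheck {
    // Assumed abstraction over the HTTP status lookup.
    interface StatusProbe {
        int statusOf(String url) throws IOException;
    }

    enum Decision { KEEP, DELETE, UNKNOWN }

    static Decision decide(StatusProbe probe, String url) {
        try {
            int status = probe.statusOf(url);
            // Assumed rule: only a definite "not found" or "gone" marks a document as outdated.
            return (status == 404 || status == 410) ? Decision.DELETE : Decision.KEEP;
        } catch (IOException e) {
            // Connection refused, socket closed, etc.: the server gave no answer,
            // so nothing can be concluded about the document.
            return Decision.UNKNOWN;
        }
    }
}
```

For example, `OutdatedCheck.decide(url -> { throw new java.net.ConnectException("Connection refused"); }, someUrl)` yields UNKNOWN, so a stopped public instance would neither break the search nor cause deletions.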
Comment by Edgar Vonk [ 19/Aug/15 ]

And also: would it be possible to change all the private methods in FacetedSolrSearchProvider to protected so that we can extend this class much more easily? Now we need to copy pretty much the entire class because everything is private. We cannot even override the public methods because they all use private methods which cannot be accessed from our class. Making everything protected would fix this.
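
To illustrate what protected methods would allow, here is a minimal sketch. BaseProvider and CustomProvider are hypothetical stand-ins, not the real Magnolia classes, and the method bodies are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a provider whose steps are protected instead of private.
class BaseProvider {
    // Records which steps ran, so the override is observable.
    final List<String> calls = new ArrayList<>();

    public List<String> search(String query) {
        List<String> results = performSearch(query);
        postProcess(results);
        return results;
    }

    // Placeholder for the actual Solr query.
    protected List<String> performSearch(String query) {
        calls.add("performSearch");
        List<String> results = new ArrayList<>();
        results.add("result for " + query);
        return results;
    }

    // Extension point: subclasses may replace or skip this step.
    protected void postProcess(List<String> results) {
        calls.add("removeOutdated");
    }
}

// A subclass can now drop a single step without copying the whole class.
class CustomProvider extends BaseProvider {
    @Override
    protected void postProcess(List<String> results) {
        // Intentionally empty: skip the cleanup step.
    }
}
```

Here the subclass reuses search and performSearch unchanged and overrides only the one step it wants to disable.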

Comment by Milan Divilek [ 19/Aug/15 ]

Hi Edgar,

"Hi Milan, I am testing this on the latest 3.0-SNAPSHOT version but I don't see it being fixed? Or maybe I don't understand the fix. I still see the very slow performance because of the calls to info.magnolia.search.solrsearchprovider.logic.providers.FacetedSolrSearchProvider.removeOutdatedIndexes for every search."

The ticket is marked as resolved, but so far the fix has only been committed to the feature branch (MGNLEESOLR-64), which means it is not yet in the latest 3.0-SNAPSHOT. First, somebody from our team has to review the fix; after that I'll integrate it into the master branch, which will trigger a job on our Jenkins server to build a new 3.0-SNAPSHOT version containing the fix.

"Besides the slow performance, another issue with this is that the code assumes that the public Magnolia instance is always running, which is not the case. When we develop locally we typically only run the author instance. Searching now fails because in the info.magnolia.search.solrsearchprovider.logic.providers.FacetedSolrSearchProvider.removeOutdatedIndexes method the code attempts to connect to the public instance, which fails and generates errors."

The fix simply removes the FacetedSolrSearchProvider#removeOutdatedIndexes method, as it was not a good idea to remove outdated indexes on every single search.
I'll also add a new possibility to connect crawlers with the activation process (MGNLEESOLR-77). There will be two options for triggering a crawler: by the scheduler (which is what is used now) and by activation. So if you manage your sites in Magnolia (i.e. you are not indexing an external site), this should help keep the indexes up to date.

Comment by Edgar Vonk [ 19/Aug/15 ]

Ah, thanks Milan! All clear now. We will wait until the fix has been reviewed and merged to the master branch.
