[MGNLEESOLR-45] Changes in the solrConfig are not reloaded dynamically in the commands Created: 11/Feb/15  Updated: 08/Jul/15  Resolved: 08/Jul/15

Status: Closed
Project: Solr Search Provider
Component/s: None
Affects Version/s: 2.0
Fix Version/s: None

Type: Improvement Priority: Neutral
Reporter: Edgar Vonk Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2015-03-16 at 16.29.02.png    
Template:
Acceptance criteria:
Empty
Task DoD:
[ ] Doc/release notes changes? Comment present?
[ ] Downstream builds green?
[ ] Solution information and context easily available?
[ ] Tests
[ ] FixVersion filled and not yet released
[ ] Architecture Decision Record (ADR)
Date of First Response:

 Description   

Maybe this is by design, but it would be nice if any changes made to the solrConfig in the solr-search-provider were reloaded dynamically. At the moment I notice that you need to restart Magnolia in order for the changes to apply. Other Magnolia modules support reloading their configuration at runtime, so this should be possible?

E.g.:

  1. Change the baseURL.
  2. Wait for the scheduled crawler command to run again (or run it by hand).
  3. It still uses the old value of the baseURL; the change is not picked up.
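A quick way to confirm that the new value was actually persisted (as opposed to the command simply not picking it up) is to read the config node directly, e.g. from the Groovy app. A minimal sketch, assuming the solrConfig path mentioned later in this issue; ctx is the context variable bound by the Groovy console:

session = ctx.getJCRSession("config")
solrConfig = session.getNode("/modules/solr-search-provider/config/solrConfig")
// If this prints the new value while the command still uses the old one,
// the problem is command-side caching rather than persistence.
println "baseURL = " + solrConfig.getProperty("baseURL").getString()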


 Comments   
Comment by Edgar Vonk [ 11/Feb/15 ]

Ah, I think the issue is in the crawler (and indexer?) commands. The module's configuration itself is reloaded just fine, I think. However, the crawler command keeps on using the old configuration.

Comment by Milan Divilek [ 05/Mar/15 ]

Hello Edgar,

I'm unable to reproduce the issue. I followed the steps to reproduce it, but it works without any problems. Crawlers (and indexers too) use the newly defined baseUrl. I also tried debugging the code and it seems that the new baseUrl is used everywhere. Can you please provide more information about the behavior and how to reproduce it? Thanks

Regards
Milan

Comment by Edgar Vonk [ 16/Mar/15 ]

Hi @milan,

Sorry for the late response.

I just tested it again and I still see issues. However, they are a little different from what I reported...

The cronjob scenario is:

  1. Start up Magnolia using our default crawler configuration. See screenshot.
  2. The crawler runs fine. 18 pages are indexed.
  3. Change the 'url' parameter (not baseUrl, by the way; what is baseUrl?) of our EAIE website crawler to a different URL, e.g. https://www.info.nl/.
  4. The next time the crawler runs it does not index anything but instead logs the error:
    2015-03-16 16:39:00,408 ERROR edu.uci.ics.crawler4j.fetcher.PageFetcher         : Fatal transport error: null while fetching https://www.info.nl/ (link found in doc #0)

    But I see now that this is a different issue, to do with HTTPS. I guess I need to add the site's SSL certificate to my local JVM. I think it would be good to add this to the Solr module documentation page? The error message 'null' is of course not very informative; maybe something can be improved here? (See the connectivity sketch after this list.)

  5. Change the 'url' parameter back to http://localhost:8081/.

  6. Now the next time the crawler runs it does index all our 18 pages again, so all is OK. Except this time it logs the following error, which it did not log on the first run:
    2015-03-16 16:42:00,008 ERROR edu.uci.ics.crawler4j.fetcher.PageFetcher         : Fatal transport error: Connection refused while fetching http://localhost/robots.txt (link found in doc #0)

    No idea where this URL comes from; presumably it is the crawler's robots.txt check, though note it points at localhost without the :8081 port. Strange, but not a big issue in itself.
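On the HTTPS error from step 4: a quick way to check whether this JVM trusts the site's certificate, independently of the crawler, is to open a connection from the same JVM (e.g. in the Groovy app). A diagnostic sketch, using the URL from step 3:

// A certificate problem surfaces here as an SSLHandshakeException with a
// meaningful message, instead of the crawler's uninformative 'null'.
conn = new URL("https://www.info.nl/").openConnection()
conn.connect()
println "HTTP " + conn.responseMessage

If this throws a handshake exception, importing the site's certificate into the JVM truststore with keytool should fix it.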

Regarding the 'run by hand' scenario: what I have now discovered is that the Groovy app which we use to run the crawler commands only picks up changes I made to the crawler command when I close the app and then reopen it. I guess that when the Groovy app runs a Groovy script it loads the command in question into memory, but it is not aware that the command is an observed class that can change at runtime and should then be reloaded? Just something to be aware of, perhaps. I would have expected the Groovy scripts to (re)create all required Java classes, such as commands, every time the script is run, but apparently not.

Our Groovy script in question:

// In the Magnolia Groovy console, ctx is bound to the current context.
cm = info.magnolia.commands.CommandsManager.getInstance()
// Fetch the 'crawler' command from the 'content-indexer' catalog.
command = cm.getCommand("content-indexer", "crawler")
// Tell the command which crawler configuration to use.
ctx.setAttribute("crawlerConfig", "eaie_website", info.magnolia.context.Context.SESSION_SCOPE)
command.execute(ctx)
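If the suspicion is that the command instance is cached with stale configuration, it can help to print what is currently persisted and compare it with what the command actually uses. A sketch, assuming the crawler configs live under /modules/content-indexer/config/crawlers (a path inferred from the screenshot naming, which may differ in your setup):

crawlers = ctx.getJCRSession("config").getNode("/modules/content-indexer/config/crawlers")
// Assumes each crawler node has a 'url' property, as in the EAIE example above.
crawlers.nodes.each { println it.name + " -> " + it.getProperty("url").getString() }
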
Comment by Milan Divilek [ 23/Mar/15 ]

This is weird. I'm not able to reproduce the issue with the 'run by hand' scenario. I followed the steps to reproduce it and it works without any problems, without closing/reopening the Groovy app. What versions of Magnolia, Magnolia UI and Groovy are you using?
My tests were done on Magnolia 5.3.5, UI 5.3.5, Groovy 2.3., Magnolia Solr 2.0 and on Magnolia 5.3.6, UI 5.3.6, Groovy 2.3.2, Magnolia Solr 2.1.1.

(not baseUrl btw. what is baseUrl?)

At first I thought that by baseUrl you meant the Solr server URL (/modules/solr-search-provider/config/solrConfig@baseURL). Now I understand it's the URL of the crawler's site.

The next time the crawler runs it does not index anything but instead logs the error:

2015-03-16 16:39:00,408 ERROR edu.uci.ics.crawler4j.fetcher.PageFetcher         : Fatal transport error: null while fetching https://www.info.nl/ (link found in doc #0)

This is an error in the old crawler4j 3.4 library. We updated to the latest crawler4j (4.1) library, where the issue is fixed, in magnolia-content-indexer 2.1. Please update to the latest Magnolia Solr and content-indexer version 2.1.1.
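For reference, a version bump in a Gradle build might look like the sketch below; the group/artifact IDs are assumptions based on the module names in this thread, so verify the exact coordinates in your dependency management (the same values apply to a Maven POM):

dependencies {
    // Hypothetical coordinates -- verify group/artifact IDs before use.
    compile "info.magnolia:magnolia-content-indexer:2.1.1"
    compile "info.magnolia:magnolia-solr-search-provider:2.1.1"
}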

Comment by Edgar Vonk [ 08/May/15 ]

I cannot seem to reproduce this issue anymore either, so please close this issue.
