  1. Solr Search Provider
  2. MGNLEESOLR-61

Ability to implement own crawler implementation

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • 3.0
    • 2.1.1
    • Sprint 7 (Kromeriz)
    • 2

      As a developer, I want to be able to implement my own web crawler in Magnolia. With our own crawler, we want to add logic that excludes certain pages from being indexed by Solr.
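
      A minimal sketch of the kind of exclusion logic meant here, assuming the pages to skip can be described by regular expressions on the URL. The class UrlExclusionFilter and its method shouldIndex() are hypothetical names for illustration only; they are not part of Magnolia or the Solr module.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper (not a Magnolia class): decides whether a URL may be indexed. */
final class UrlExclusionFilter {

    private final List<Pattern> excluded;

    /** Compiles each exclusion rule once so it can be reused for every crawled URL. */
    UrlExclusionFilter(List<String> excludedRegexes) {
        this.excluded = excludedRegexes.stream()
                .map(Pattern::compile)
                .collect(Collectors.toList());
    }

    /** Returns true when no exclusion pattern matches the whole URL. */
    boolean shouldIndex(String url) {
        for (Pattern pattern : excluded) {
            if (pattern.matcher(url).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

      A custom crawler could consult such a filter before handing a page to the indexer.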

      Magnolia ships its own crawler (MgnlCrawler.java), which is run by the 'CrawlerIndexerCommand' command. The command that is used can be changed in the Magnolia configuration.

      What we tried so far:
      We implemented our own command (almost the same code as 'CrawlerIndexerCommand', except that the controller calls our own crawler) and added the factories and the indexer and crawler maps to our module class. This duplicates code, which is not how this should be done in Java.

          // Excerpt from our module class: initialise the factories on start-up.
          @Override
          public void start(ModuleLifecycleContext moduleLifecycleContext) {
              dataIndexerFactory.init();
              crawlerIndexerFactory.init();
          }

          // Release the factories again when the module stops.
          @Override
          public void stop(ModuleLifecycleContext moduleLifecycleContext) {
              dataIndexerFactory.cleanup();
              crawlerIndexerFactory.cleanup();
          }
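
      To illustrate what would avoid this duplication, here is a rough sketch of a command that receives the crawler class name as a property instead of hard-coding it. All names below (SiteCrawler, DefaultSiteCrawler, ConfigurableCrawlCommand, setCrawlerClass) are hypothetical stand-ins and do not reflect the actual 'CrawlerIndexerCommand' or 'MgnlCrawler' API.

```java
/** Hypothetical crawler contract; a stand-in, not Magnolia's MgnlCrawler. */
interface SiteCrawler {
    boolean shouldVisit(String url);
}

/** Hypothetical default implementation: visits every URL. */
class DefaultSiteCrawler implements SiteCrawler {
    @Override
    public boolean shouldVisit(String url) {
        return true;
    }
}

/** Hypothetical command whose crawler class is set from configuration. */
class ConfigurableCrawlCommand {

    // Falls back to the default crawler when nothing is configured.
    private String crawlerClass = DefaultSiteCrawler.class.getName();

    void setCrawlerClass(String crawlerClass) {
        this.crawlerClass = crawlerClass;
    }

    /** Instantiates the configured crawler through its no-argument constructor. */
    SiteCrawler createCrawler() {
        try {
            return Class.forName(crawlerClass)
                    .asSubclass(SiteCrawler.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot create crawler " + crawlerClass, e);
        }
    }
}
```

      With something along these lines, a project would only have to supply its own crawler class and point the property at it, rather than copy the whole command.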
      

      Possible solutions:
      1. Make the crawler implementation configurable in Magnolia.
      2. Extend 'MgnlCrawler'. We would like to reuse methods such as treatFieldMappings() and getIndexService(), but these are currently private. We only want to add to the shouldVisit() and visit() methods.
      3. Extend 'CrawlerIndexerCommand'. However, contentIndexerModule is private.
      4. Provide an app in Magnolia to manage exclusions and other Solr configuration.

      Point four is a nice-to-have feature for the future.
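
      Point two could look roughly like the sketch below, assuming the helper methods were made protected instead of private. BaseCrawler is a hypothetical stand-in for 'MgnlCrawler', and its helper stripFragment() merely stands in for reusable methods such as treatFieldMappings(); none of the signatures are taken from the real class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for MgnlCrawler with its helpers opened up to subclasses. */
class BaseCrawler {

    private final List<String> indexed = new ArrayList<>();

    /** Reusable helper; protected rather than private so subclasses can call it. */
    protected String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    protected boolean shouldVisit(String url) {
        return url.startsWith("http");
    }

    protected void visit(String url) {
        indexed.add(stripFragment(url));
    }

    /** The crawl loop stays in the base class; subclasses only override the hooks. */
    final void crawl(List<String> urls) {
        for (String url : urls) {
            if (shouldVisit(url)) {
                visit(url);
            }
        }
    }

    List<String> indexedUrls() {
        return indexed;
    }
}

/** Project-specific crawler: adds one exclusion rule and reuses everything else. */
class ExcludingCrawler extends BaseCrawler {
    @Override
    protected boolean shouldVisit(String url) {
        return super.shouldVisit(url) && !stripFragment(url).contains("/private/");
    }
}
```

      The subclass adds only the exclusion rule; the crawl loop and the helper are inherited unchanged.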

        Acceptance criteria

              mdivilek Milan Divilek
              mvdmark Michaël van der Mark
              Votes: 1
              Watchers: 3

                Created:
                Updated:
                Resolved:

                  Task DoD

                    Estimated: Not Specified
                    Remaining: 0d
                    Logged: 10m