As a developer I want to be able to implement my own web crawler in Magnolia. With our own crawler we want to add logic that excludes certain pages from being indexed by Solr.
Magnolia ships its own crawler (MgnlCrawler.java). This crawler is executed by the 'CrawlerIndexerCommand' command, which can be changed in the Magnolia configuration.
What we tried so far:
We implemented our own command (almost the same code as 'CrawlerIndexerCommand', except that the controller calls our own crawler) and added the factories and the indexer and crawler maps to our Module class. This duplicates existing code and is not a clean way to do this in Java.
1 Making the crawler implementation configurable in Magnolia.
2 Extending 'MgnlCrawler'. We would like to reuse methods like treatFieldMappings() and getIndexService(). These methods are currently private, and we only want to add some logic to the shouldVisit() and visit() methods.
3 Extending 'CrawlerIndexerCommand'. However, the contentIndexerModule is private.
4 An app in Magnolia to manage exclusion and other Solr configuration.
Point four is a nice-to-have feature for the future.
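
To illustrate points 1 and 2, here is a minimal, self-contained sketch of the kind of extension we have in mind. It does not use the real Magnolia or crawler classes: BaseCrawler, ExcludingCrawler, IndexerCommand and setCrawlerFactory are hypothetical stand-ins, and the method signatures are simplified to plain strings. The only assumptions are that the helper methods become protected instead of private and that the command lets the crawler implementation be supplied from outside.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class CrawlerExtensionSketch {

    /** Hypothetical stand-in for MgnlCrawler; not the actual Magnolia class. */
    static class BaseCrawler {
        // Assumption: protected instead of private, so subclasses can reuse it.
        protected String treatFieldMappings(String url) {
            return "mapped:" + url;
        }

        public boolean shouldVisit(String url) {
            return url.startsWith("http");
        }

        public String visit(String url) {
            return treatFieldMappings(url);
        }
    }

    /** Project-specific crawler that only adds an exclusion check. */
    static class ExcludingCrawler extends BaseCrawler {
        private final List<String> excludedPrefixes;

        ExcludingCrawler(List<String> excludedPrefixes) {
            this.excludedPrefixes = excludedPrefixes;
        }

        @Override
        public boolean shouldVisit(String url) {
            // Keep the default rules, then reject any excluded path.
            if (!super.shouldVisit(url)) {
                return false;
            }
            return excludedPrefixes.stream().noneMatch(url::startsWith);
        }
    }

    /** Hypothetical stand-in for the command, with a configurable crawler. */
    static class IndexerCommand {
        private Supplier<? extends BaseCrawler> crawlerFactory = BaseCrawler::new;

        public void setCrawlerFactory(Supplier<? extends BaseCrawler> crawlerFactory) {
            this.crawlerFactory = crawlerFactory;
        }

        /** Returns the documents that would be sent to the index. */
        public List<String> execute(List<String> urls) {
            BaseCrawler crawler = crawlerFactory.get();
            List<String> indexed = new ArrayList<>();
            for (String url : urls) {
                if (crawler.shouldVisit(url)) {
                    indexed.add(crawler.visit(url));
                }
            }
            return indexed;
        }
    }

    public static void main(String[] args) {
        IndexerCommand command = new IndexerCommand();
        command.setCrawlerFactory(() ->
                new ExcludingCrawler(Arrays.asList("https://example.com/private/")));

        List<String> indexed = command.execute(Arrays.asList(
                "https://example.com/home",
                "https://example.com/private/report"));

        System.out.println(indexed); // prints [mapped:https://example.com/home]
    }
}
```

With hooks like these, our module would only contain the exclusion logic itself, and the copied command, factories and maps described above would no longer be needed.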