[MGNLEESOLR-61] Ability to implement own crawler implementation Created: 08/May/15 Updated: 03/Dec/20 Resolved: 25/Aug/15 |
|
| Status: | Closed |
| Project: | Solr Search Provider |
| Component/s: | None |
| Affects Version/s: | 2.1.1 |
| Fix Version/s: | 3.0 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Michaël van der Mark | Assignee: | Milan Divilek |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | maintenance, quickwin | ||
| Remaining Estimate: | 0d | ||
| Time Spent: | 10m | ||
| Original Estimate: | Not Specified | ||
| Template: |
|
| Acceptance criteria: |
Empty
|
| Task DoD: |
[ ]*
Doc/release notes changes? Comment present?
[ ]*
Downstream builds green?
[ ]*
Solution information and context easily available?
[ ]*
Tests
[ ]*
FixVersion filled and not yet released
[ ] 
Architecture Decision Record (ADR)
|
| Date of First Response: | |
| Sprint: | Sprint 7 (Kromeriz) |
| Story Points: | 2 |
| Description |
|
As a developer, I want to be able to implement my own web crawler in Magnolia. With our own crawler we want to add logic that excludes certain pages from being indexed by Solr. Magnolia ships its own crawler (MgnlCrawler.java), which is executed by the command 'CrawlerIndexerCommand'. This command can be changed in the Magnolia configuration. What we tried so far:
@Override
public void start(ModuleLifecycleContext moduleLifecycleContext) {
    dataIndexerFactory.init();
    crawlerIndexerFactory.init();
}

@Override
public void stop(ModuleLifecycleContext moduleLifecycleContext) {
    dataIndexerFactory.cleanup();
    crawlerIndexerFactory.cleanup();
}
Possible solutions: Point four is a nice-to-have feature for the future. |
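The page-exclusion logic that the description asks for could look roughly like the sketch below. It is only an illustration under stated assumptions: `PageExclusionFilter`, its constructor, and the example patterns are hypothetical and not part of the Solr Search Provider module. In a custom crawler, a check like this would presumably be called from whichever hook decides whether a page is visited or sent to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the Magnolia module): decides whether a
// crawled URL should be passed on to Solr for indexing.
class PageExclusionFilter {

    private final List<Pattern> excluded = new ArrayList<>();

    // Each entry is a regular expression; a URL containing a match is excluded.
    PageExclusionFilter(List<String> excludedRegexes) {
        for (String regex : excludedRegexes) {
            excluded.add(Pattern.compile(regex));
        }
    }

    // Returns false as soon as any exclusion pattern is found in the URL.
    boolean shouldIndex(String url) {
        for (Pattern pattern : excluded) {
            if (pattern.matcher(url).find()) {
                return false;
            }
        }
        return true;
    }
}
```

Keeping the rule set in a small, separately testable class means the crawler subclass itself only needs a single call to `shouldIndex(url)`.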
| Comments |
| Comment by Edgar Vonk [ 14/Jul/15 ] |
|
Any news on this, maybe? It is quite cumbersome, because we really want to write our own crawler class, but the MgnlCrawler class is hardcoded in the module. E.g. in CrawlerIndexerCommand:
controller.start(MgnlCrawler.class, config.getNbrCrawlers());
Ideally we would like to be able to configure the crawler class in the module's META-INF configuration. Something like:
<components>
    <id>main</id>
    <component>
        <type>info.magnolia.module.indexer.crawler.MgnlCrawler</type>
        <implementation>org.OurCustomCrawler</implementation>
    </component>
</components>
|
| Comment by Milan Divilek [ 03/Aug/15 ] |
|
Hello Edgar, the ticket is planned for version 3.0. I'll have a look at it asap. |
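
One way the hardcoded `MgnlCrawler.class` argument could be made configurable is to resolve the crawler class from a configured class name and fall back to the default when nothing is set. The sketch below is an assumption about how this might be done, not the fix that shipped in 3.0: `BaseCrawler`, `DefaultCrawler`, `CustomCrawler` and `CrawlerClassResolver` are hypothetical stand-ins for the module's real types.

```java
// Hypothetical stand-ins for the real crawler types; names are illustrative only.
abstract class BaseCrawler {
    abstract String name();
}

class DefaultCrawler extends BaseCrawler {
    String name() { return "default"; }
}

class CustomCrawler extends BaseCrawler {
    String name() { return "custom"; }
}

// Resolves the crawler class from a configured, fully qualified class name.
class CrawlerClassResolver {

    static Class<? extends BaseCrawler> resolve(String configuredClassName) {
        // No configuration present: keep the default behaviour.
        if (configuredClassName == null || configuredClassName.trim().isEmpty()) {
            return DefaultCrawler.class;
        }
        try {
            // asSubclass throws ClassCastException if the class is not a crawler.
            return Class.forName(configuredClassName.trim()).asSubclass(BaseCrawler.class);
        } catch (ClassNotFoundException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Not a usable crawler class: " + configuredClassName, e);
        }
    }
}
```

The command would then pass the resolved class instead of the literal, for example `controller.start(CrawlerClassResolver.resolve(name), config.getNbrCrawlers())`, where `name` is a hypothetical configuration value.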