Type: Improvement
Resolution: Fixed
Priority: Neutral
Fix Version/s: 3.0
Affects Version/s: None
Labels:
- support
Environment: Windows 7
Template:
Patch included: Yes
Acceptance criteria: Empty
Task DoD:
Sprint: Sprint 7 (Kromeriz)
Story Points: 2

The current implementation of magnolia-solr-search-provider does not check the value of a page's "robots" meta tag. As a result, every crawled page is indexed, even when its robots meta tag contains "noindex". The root cause is on the crawler4j side: crawler4j does not honor this directive. The problem has been reported on the crawler4j issue tracker (https://code.google.com/p/crawler4j/issues/detail?id=59) since 2011 and remains unresolved.

One option is to modify the visit(Page p) method of MgnlCrawler to read the robots value from the parsed content and skip indexing the page in Solr when "noindex" is present. A possible implementation is attached (MgnlCrawler.java, lines 108-112).
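The check proposed above can be sketched as a standalone helper. This is a minimal sketch, not the attached patch: the class name `RobotsMetaCheck` and the method `hasNoIndex` are hypothetical, and the sketch assumes the raw HTML of the crawled page is available as a `String`. It scans every `<meta>` tag, looks for `name="robots"`, and treats the directives `noindex` and `none` (shorthand for `noindex, nofollow`) as a request not to index the page. A regex scan is a simplification; a real HTML parser would be more robust.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of crawler4j or magnolia-solr-search-provider)
 * that detects a "noindex" directive in a page's robots meta tag.
 */
public final class RobotsMetaCheck {

    // Matches a whole <meta ...> tag; attributes are read separately so that
    // their order (name before content or the reverse) does not matter.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches attr="value" or attr='value' inside a single tag.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    private RobotsMetaCheck() {
    }

    /** Returns true if any <meta name="robots"> tag asks for the page not to be indexed. */
    public static boolean hasNoIndex(String html) {
        if (html == null) {
            return false;
        }
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String name = null;
            String content = null;
            Matcher attr = ATTRIBUTE.matcher(tag.group());
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                // Group 3 holds a double-quoted value, group 4 a single-quoted one.
                String value = attr.group(3) != null ? attr.group(3) : attr.group(4);
                if (key.equals("name")) {
                    name = value;
                } else if (key.equals("content")) {
                    content = value;
                }
            }
            if (name != null && content != null
                    && name.trim().equalsIgnoreCase("robots")
                    && contentHasNoIndex(content)) {
                return true;
            }
        }
        return false;
    }

    // The content attribute is a comma-separated list of directives;
    // "none" is shorthand for "noindex, nofollow".
    private static boolean contentHasNoIndex(String content) {
        for (String token : content.split(",")) {
            String directive = token.trim().toLowerCase(Locale.ROOT);
            if (directive.equals("noindex") || directive.equals("none")) {
                return true;
            }
        }
        return false;
    }
}
```

In MgnlCrawler.visit(Page p), one could then return early, before the Solr indexing step, whenever this check returns true. How the raw HTML is obtained (for example through crawler4j's `HtmlParseData.getHtml()`) depends on the crawler4j version in use and should be verified against it.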

Attachments:
- MgnlCrawler.java (7 kB, 17/Jul/15 10:45 AM)

Assignee: Milan Divilek
Reporter: Mariusz Chruscielewski
Visible to: Edgar Vonk
Votes: 0
Watchers: 2
Created: 17/Jul/15 10:45 AM
Updated: 03/Dec/20 5:52 PM
Resolved: 26/Aug/15 3:54 PM
Date of First Response: 27/Aug/15 9:29 AM