[MGNLEESOLR-154] DOC: Solr module clean index command Created: 09/Feb/21  Updated: 08/Mar/21  Resolved: 08/Mar/21

Status: Closed
Project: Solr Search Provider
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Neutral
Reporter: Federico Grilli Assignee: Martin Drápela
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File MGNLEESOLR-152_2ndRVEdit_2021-02-26 14.20.51 documentation.magnolia-cms.com a9c2b2d40faf.png     File config.modules.content-indexer_1_.yaml     PNG File image-2021-02-26-08-52-10-444.png    
Issue Links:
Cloners
is cloned by MGNLEESOLR-157 DOC: Port 5.7 doc update for Solr mod... Closed
documentation
documents MGNLEESOLR-152 Clean command should delete also page... Closed
Template:
Acceptance criteria:
Empty
Task DoR:
Empty
Date of First Response:
Sprint: CM & OC 23
Story Points: 2

 Description   

https://documentation.magnolia-cms.com/display/DOCS57/Indexing+and+crawling+a+website+with+Solr

By default, the crawler mechanism is connected with info.magnolia.search.solrsearchprovider.logic.commands.CleanSolrIndexCommand, which is chained before the crawler command and cleans the index of outdated entries (pages).

Configuration options:

  • max - maximum number of documents to check; default is 500 - since 5.0.1
  • onlyHead - request only the head instead of fetching the whole document; default is false. Ignored if the deleteNoIndex property is set to true, because the robots meta tag cannot be resolved from a head request - since 5.5.1
  • followRedirects - if set to true, redirects are followed and the status code of the final page is evaluated; default is false - since 5.5.2
  • statusCodes - list of status codes; a page that returns any of the configured codes is removed from the index. Empty by default, but pages returning 404 are always removed - since 5.0.1
  • deleteNoIndex - if set to true, pages with the robots meta tag set to noindex are also removed from the index; default is true - since 5.5.4
  • skipIfAlreadyRunning - by default, if a clean command is already running for the crawler, it is stopped and a new one is started. If this property is set to true, the running clean command is allowed to finish and the new one is skipped - since 5.5.1
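
How the options above interact can be sketched in a few lines of Java. This is only an illustration of the rules as described in the list, not the code of CleanSolrIndexCommand; the class CleanRules and its methods shouldRemove and effectiveOnlyHead are hypothetical names introduced for this example.

```java
import java.util.Set;

// Hypothetical helper, NOT part of the Magnolia Solr module. It only restates
// the option semantics described in the list above.
final class CleanRules {

    // A page is removed from the index when:
    //  - its status code is 404 (always), or
    //  - its status code is listed in statusCodes, or
    //  - deleteNoIndex is true and the robots meta tag is set to noindex.
    // 'status' is assumed to be the code that was evaluated, i.e. that of the
    // final page when followRedirects is true.
    static boolean shouldRemove(int status, Set<Integer> statusCodes,
                                boolean deleteNoIndex, boolean robotsNoIndex) {
        if (status == 404 || statusCodes.contains(status)) {
            return true;
        }
        return deleteNoIndex && robotsNoIndex;
    }

    // onlyHead is ignored while deleteNoIndex is true, because a head
    // request returns no body from which to read the robots meta tag.
    static boolean effectiveOnlyHead(boolean onlyHead, boolean deleteNoIndex) {
        return onlyHead && !deleteNoIndex;
    }

    public static void main(String[] args) {
        Set<Integer> configured = Set.of(410, 500); // example statusCodes value
        System.out.println(shouldRemove(404, Set.of(), false, false));   // true
        System.out.println(shouldRemove(200, configured, true, true));   // true
        System.out.println(shouldRemove(200, configured, false, true));  // false
        System.out.println(effectiveOnlyHead(true, true));               // false
    }
}
```

With the defaults from the list (statusCodes empty, deleteNoIndex true), a page returning 200 is still removed if its robots meta tag is noindex, and setting onlyHead to true has no effect until deleteNoIndex is switched off.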

Generated at Mon Feb 12 11:00:38 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.