[MGNLEESOLR-187] user-agent for clean command Created: 19/Apr/23  Updated: 23/Oct/23  Resolved: 03/May/23

Status: Closed
Project: Solr Search Provider
Component/s: None
Affects Version/s: 6.1.2
Fix Version/s: 6.1.4

Type: Story Priority: Neutral
Reporter: Tomáš Gregovský Assignee: Milan Divilek
Resolution: Fixed Votes: 0
Labels: None
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Attachments: PNG File screenshot-1.png    
Sub-Tasks:
Key             Summary         Type      Status     Assignee
MGNLEESOLR-188  Implementation  Sub-task  Completed  Milan Divilek
MGNLEESOLR-189  Review          Sub-task  Completed  Javier Benito
MGNLEESOLR-190  preintQA        Sub-task  Completed  Javier Benito
MGNLEESOLR-191  QA              Sub-task  Completed  Jaroslav Simak
Template:
Acceptance criteria:
Empty
Task DoD:
[X]* Doc/release notes changes? Comment present?
[X]* Downstream builds green?
[X]* Solution information and context easily available?
[X]* Tests
[X]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Documentation update required:
Yes
Date of First Response:
Epic Link: DevX Bucket
Sprint: DevX 36
Story Points: 2
Team: DeveloperX
Work Started:
Approved:
Yes

 Description   

Background:

We use the Solr crawler and the clean command for a project with multiregional pages. The multiregional site has custom backend geolocation redirect functionality: if a user from Europe visits the US website, they are redirected to the European website. Because our crawler and other robots still need to reach the page they request without being redirected (i.e. redirection applies only to real users), we maintain an ignore list based on user-agents. One of the regular expressions in the list is 'crawler', which matches the Solr crawler. Crawling therefore works fine, and our crawler can crawl any regional page. Unfortunately, this is not the case for the Solr clean command, which gets a 302 redirect on every regional page whose region differs from that of the Magnolia author instance. After deeper investigation, we found that the Solr clean command does not send any user-agent.
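
For illustration, the user-agent ignore list described above could look roughly like the following sketch. The class name, method name, and the handling of a missing user-agent are assumptions made for this example, not the project's actual code; only the 'crawler' pattern comes from the description.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: illustrates the ignore-list idea only.
final class GeoRedirectFilter {

    // 'crawler' is the pattern mentioned in the description; matching it
    // case-insensitively is an assumption.
    private static final List<Pattern> IGNORED_AGENTS = List.of(
            Pattern.compile("crawler", Pattern.CASE_INSENSITIVE));

    // Returns true when the request should go through the geolocation redirect.
    static boolean shouldRedirect(String userAgent) {
        if (userAgent == null || userAgent.isBlank()) {
            // A request without a User-Agent cannot match the ignore list,
            // so it is treated like a regular visitor (the clean command case).
            return true;
        }
        for (Pattern pattern : IGNORED_AGENTS) {
            if (pattern.matcher(userAgent).find()) {
                return false; // robot on the ignore list: no redirect
            }
        }
        return true;
    }
}
```

Under these assumptions, a request without any user-agent falls through to the redirect, which would explain the 302 responses seen by the clean command.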

Solution:

Please add the same user-agent that the crawler uses to the clean command as well.
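
A minimal sketch of the requested behaviour, using only the JDK's HttpURLConnection: a HEAD status check that sends an explicit User-Agent. The class name, method names, and the user-agent value are hypothetical; the real crawler's user-agent string is not stated in this issue, and this is not the module's actual implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical sketch, not the code of CleanSolrIndexCommand.
final class CleanCommandHeadCheck {

    // Placeholder value: the crawler's real user-agent is not given in this issue.
    static final String ASSUMED_USER_AGENT = "example-crawler/1.0";

    // Configures a HEAD request with an explicit User-Agent; nothing is sent yet.
    static HttpURLConnection prepareHead(String url, String userAgent, boolean followRedirects) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setInstanceFollowRedirects(followRedirects);
            conn.setRequestProperty("User-Agent", userAgent);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends the prepared request and returns the HTTP status code.
    static int statusCode(String url, String userAgent, boolean followRedirects) {
        HttpURLConnection conn = prepareHead(url, userAgent, followRedirects);
        try {
            return conn.getResponseCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            conn.disconnect();
        }
    }
}
```

For code paths that use Jsoup rather than a plain HEAD request, Jsoup's Connection.userAgent(String) sets the same header.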



 Comments   
Comment by Minh Nguyen [ 20/Apr/23 ]

In CleanSolrIndexCommand there are two ways of checking the status code:
1. statusCode = this.getHttpStatusCodeFromHead(url);
2. Connection.Response response = Jsoup.connect(url).followRedirects(this.isFollowRedirects()).execute();

I tried #2 and looked for a user-agent header, but the request did not include one.

Thank you.

Generated at Mon Feb 12 11:00:58 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.