[MGNLEESOLR-73] Allow magnolia-content-indexer to crawl in password restricted content Created: 23/Jul/15  Updated: 06/Jul/16  Resolved: 29/Apr/16

Status: Closed
Project: Solr Search Provider
Component/s: None
Affects Version/s: 2.2.1
Fix Version/s: 5.0.2

Type: Improvement Priority: Neutral
Reporter: Richard Gange Assignee: Milan Divilek
Resolution: Fixed Votes: 1
Labels: support
Remaining Estimate: 0d
Time Spent: 0.5h
Original Estimate: Not Specified

Attachments: CrawlerConfig.java, CrawlerIndexerCommand.java, MgnlCrawler.java
Issue Links:
causality
relation
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Sprint: Kromeriz 41
Story Points: 3

 Description   

In our customer project we have a requirement to search restricted content and show results based on the user's role (multiple roles are possible). The current Crawler4J implementation allows a developer to specify login credentials and a login URL in order to obtain a "logged-in" cookie and crawl restricted content. Unfortunately, the content-indexer module doesn't support that.

To solve that problem we modified the magnolia-content-indexer code:

CrawlerConfig.java - added credentials and other fields that are read from the Magnolia crawler configuration
CrawlerIndexerCommand.java - added the credentials configuration to the Crawler4J config (this forces the crawler to log in)
MgnlCrawler.java - added a condition in the shouldVisit() method so that the crawler does not visit the LOGOUT link. Additionally, to be able to search only for role-specific content, we changed the ID field generation (the ID saved in Solr) to include the crawler type:
id = crawlertype_url

This is not the best solution, but we are not able to use multiple Solr collections out of the box, so the additional parameter in the ID allows us to have duplicate records in the Solr index. In the search fields we have an additional query parameter that checks the type based on the logged-in user.
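
The three changes described above could look roughly like the following minimal sketch. It is plain Java without the Crawler4J or Solr dependencies, not the attached code: the class name, method signatures and the example URLs are hypothetical, and the id-prefix filter is only one assumed way to express the "check type" query parameter (the description does not say how the check is actually implemented).

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper class; names are illustrative, not taken from the attachments. */
public final class RestrictedCrawlSketch {

    private RestrictedCrawlSketch() {
    }

    /** Returns false for any URL that contains the configured logout identifier. */
    public static boolean shouldVisit(String url, String logoutIdentifier) {
        if (url == null) {
            return false;
        }
        if (logoutIdentifier == null || logoutIdentifier.isEmpty()) {
            return true; // nothing configured, so nothing to exclude
        }
        return !url.contains(logoutIdentifier);
    }

    /** Builds the Solr document ID in the form crawlertype_url. */
    public static String buildSolrId(String crawlerType, String url) {
        return crawlerType + "_" + url;
    }

    /**
     * Assumption: restricts results by ID prefix, e.g. id:(public_* OR members_*).
     * The caller passes the crawler types the logged-in user may see.
     */
    public static String idPrefixFilter(List<String> crawlerTypes) {
        if (crawlerTypes.isEmpty()) {
            throw new IllegalArgumentException("at least one crawler type is required");
        }
        return crawlerTypes.stream()
                .map(RestrictedCrawlSketch::toPrefix)
                .collect(Collectors.joining(" OR ", "id:(", ")"));
    }

    /** Accepts only plain alphanumeric type names so the "_" separator stays unambiguous. */
    private static String toPrefix(String crawlerType) {
        if (!crawlerType.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("unsupported crawler type: " + crawlerType);
        }
        return crawlerType + "_*";
    }
}
```

With a crawler type of members, the same page is stored once per crawler under a distinct ID, which is what allows the duplicate records mentioned above.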

We have also introduced some new config values in the Magnolia config (/modules/content-indexer/config/crawlers/ANY_CRAWLER):

authentication - boolean value indicating whether authentication is required for that crawler
logoutIdentifier - the shortest string that identifies the logout link (in our case mgnlLogout)
credentials - content node with the following properties:

  • loginFieldName - name of the login field in the login form
  • passwordFieldName - name of the password field in the login form
  • username
  • password
  • loginUrl - link to the login page
  • urlPort - additional field to specify the port if a non-default port (other than 80) is used
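
As an illustration of how these values might be held and checked before the crawler starts, here is a minimal sketch. The field names mirror the config properties listed above, but the class itself and the rule that every property is mandatory once authentication is enabled are assumptions, not the contents of the attached CrawlerConfig.java.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the crawler authentication settings listed above. */
public final class CrawlerAuthSettings {

    public boolean authentication;   // is a login required for this crawler?
    public String logoutIdentifier;  // e.g. "mgnlLogout"

    // Properties of the credentials content node
    public String loginFieldName;
    public String passwordFieldName;
    public String username;
    public String password;
    public String loginUrl;
    public int urlPort = 80;         // default port unless configured otherwise

    /**
     * Returns the names of properties that are missing or blank.
     * Assumption: all properties are required once authentication is enabled.
     */
    public List<String> missingProperties() {
        List<String> missing = new ArrayList<>();
        if (!authentication) {
            return missing; // nothing to check for anonymous crawling
        }
        require(missing, "logoutIdentifier", logoutIdentifier);
        require(missing, "loginFieldName", loginFieldName);
        require(missing, "passwordFieldName", passwordFieldName);
        require(missing, "username", username);
        require(missing, "password", password);
        require(missing, "loginUrl", loginUrl);
        return missing;
    }

    private static void require(List<String> missing, String name, String value) {
        if (value == null || value.trim().isEmpty()) {
            missing.add(name);
        }
    }
}
```

A command could call missingProperties() before starting the crawl and refuse to start, or log a warning, when the returned list is not empty.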

I have attached a configuration screenshot and the modified classes. Please check whether you can introduce something like this in a new version of the solr-search-module.


Generated at Mon Feb 12 10:59:50 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.