Solr Search Provider / MGNLEESOLR-73

Allow magnolia-content-indexer to crawl in password restricted content

    • Improvement
    • Resolution: Fixed
    • Neutral
    • 5.0.2
    • 2.2.1
    • Kromeriz 41
    • 3

      In our customer project we have a requirement to search in restricted content and show results based on the user's role (multiple roles are possible). The current implementation of Crawler4J allows a developer to specify login credentials and a login URL in order to obtain a "logged-in" cookie and crawl restricted content. Unfortunately, the content-indexer module does not support that.

      To solve this problem we modified the magnolia-content-indexer code:

      CrawlerConfig.java - added credentials and other fields that are read from the Magnolia crawler configuration
      CrawlerIndexerCommand.java - added the credentials configuration to the crawler4J config (this forces the crawler to log in)
      MgnlCrawler.java - added a condition in the shouldVisit() method so that the crawler does not visit the LOGOUT link. Additionally, to be able to search only for role-specific content, we changed the generation of the ID field (the ID saved in Solr) to include the crawler type:
      id = crawlertype_url
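
      A minimal sketch of the two MgnlCrawler.java rules described above, written as plain Java so it does not depend on crawler4J. The class and method names (RestrictedCrawlRules, shouldVisit, documentId) are hypothetical and are not taken from the attached classes; only the logout check and the crawlertype_url format come from this ticket.

```java
/**
 * Hypothetical helper illustrating the two rules from the ticket.
 * Not the actual MgnlCrawler code.
 */
final class RestrictedCrawlRules {

    private RestrictedCrawlRules() {
    }

    /**
     * Returns false for any URL that contains the logout identifier
     * (for example "mgnlLogout"), so a logged-in crawler never follows
     * the logout link and loses its session.
     */
    static boolean shouldVisit(String url, String logoutIdentifier) {
        if (url == null) {
            return false;
        }
        if (logoutIdentifier == null || logoutIdentifier.isEmpty()) {
            // No identifier configured: nothing to filter out.
            return true;
        }
        return !url.contains(logoutIdentifier);
    }

    /**
     * Builds the Solr document ID as crawlertype_url, so the same page
     * crawled by two crawlers yields two distinct index records.
     */
    static String documentId(String crawlerType, String url) {
        return crawlerType + "_" + url;
    }
}
```

      In a real crawler this check would presumably be combined with whatever shouldVisit() already tests (site membership, file types and so on) rather than replace it.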

      This is not the best solution, but we are not able to use multiple Solr collections out of the box, so the additional parameter in the ID allows us to keep duplicate records (one per crawler type) in the Solr index. In the search we add an extra query parameter that checks the type based on the logged-in user.
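
      The extra query parameter could look roughly like the sketch below, which builds a Solr filter-query string from the crawler types a user is allowed to see. The class name CrawlerTypeFilter, the field name "type" and the example type values are assumptions for illustration; the ticket does not say which field or values are actually used.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper that restricts search results to the crawler
 * types visible to the current user.
 */
final class CrawlerTypeFilter {

    private CrawlerTypeFilter() {
    }

    /**
     * Returns a filter query such as type:("public" OR "members").
     * Each value is quoted, with backslashes and quotes escaped, so it
     * is treated as a literal phrase by the query parser.
     */
    static String filterQuery(String field, List<String> crawlerTypes) {
        if (crawlerTypes == null || crawlerTypes.isEmpty()) {
            throw new IllegalArgumentException("at least one crawler type is required");
        }
        return crawlerTypes.stream()
                .map(t -> "\"" + t.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(" OR ", field + ":(", ")"));
    }
}
```

      The resulting string would be passed as an fq parameter alongside the user's search terms, with the list of types derived from the roles of the logged-in user.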

      We have also introduced some new config values in the Magnolia configuration (/modules/content-indexer/config/crawlers/ANY_CRAWLER):

      authentication - boolean value indicating whether authentication is required for that crawler
      logoutIdentifier - the shortest string that identifies the logout link (in our case mgnlLogout)
      credentials content node

      • loginFieldName
      • passwordFieldName (both are the names of the fields in the login form)
      • username
      • password
      • loginUrl (link to the login page)
      • urlPort - additional field to specify the port if a non-default port (other than 80) is used
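
      As a rough illustration of how these values might be held on the Java side, the sketch below mirrors the properties listed above in a plain class. The class name CrawlerAuthSettings and the isUsable() check are hypothetical and not taken from the attached CrawlerConfig.java; the field names follow the config properties from this ticket.

```java
/**
 * Hypothetical holder for the new crawler config values.
 * Field names mirror the properties listed in the ticket.
 */
final class CrawlerAuthSettings {

    // Top-level crawler properties.
    boolean authentication;
    String logoutIdentifier;

    // Properties of the "credentials" content node.
    String loginFieldName;
    String passwordFieldName;
    String username;
    String password;
    String loginUrl;
    int urlPort = 80; // default port, per the ticket

    /**
     * A crawler without authentication is always usable; one with
     * authentication needs every credentials value to be filled in.
     */
    boolean isUsable() {
        if (!authentication) {
            return true;
        }
        return filled(loginFieldName) && filled(passwordFieldName)
                && filled(username) && filled(password)
                && filled(loginUrl) && urlPort > 0;
    }

    private static boolean filled(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```

      Checking completeness up front is one possible design choice: it lets a misconfigured crawler fail before it starts, instead of silently indexing only the public pages.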

      I have attached a configuration screenshot and the modified classes. Please check whether you can introduce something like this in a new version of solr-search-module.

              mdivilek Milan Divilek
              rgange Richard Gange