[DOCU-222] Search Created: 01/Nov/11  Updated: 31/Jan/13  Resolved: 31/Jan/13

Status: Closed
Project: Documentation
Component/s: content
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Neutral
Reporter: Antti Hietala Assignee: Antti Hietala
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

 Description   

Document how search works in Magnolia CMS. New top-level topic.

Scope

Indexing

  • What is indexing?
  • Jackrabbit default search index is based on a Lucene implementation. Summarize.
  • How is search indexing configured? Is Magnolia's implementation any different from the default?
  • Does Magnolia provide any proprietary search features?
  • Configuring indexing in repository configuration XML. Describe the sample .xml files provided for each persistence manager.
  • What workspaces are indexed?
  • What content is indexed? File types, names, content, metadata
  • Excluding content from indexing. SUPPORT-171
  • Where are the indexes on the file system? Show locations when using Derby and MySQL. SUPPORT-30
  • When is indexing performed? SUPPORT-682, SUPPORT-30
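
As a rough illustration of the first bullet: Lucene-style engines center on an inverted index, a map from each term to the documents that contain it. The sketch below is a toy in Python, not Jackrabbit's or Lucene's actual implementation. The node paths, the sample text, and the names `tokenize`, `build_index`, and `search` are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document paths that contain it."""
    index = defaultdict(set)
    for path, text in documents.items():
        for term in tokenize(text):
            index[term].add(path)
    return index


def search(index, query):
    """Return the sorted paths that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


# Hypothetical content: node path -> text extracted from that node.
documents = {
    "/demo/about": "About our company and its history",
    "/demo/news/launch": "Company news: product launch",
    "/demo/contact": "Contact the support team",
}

index = build_index(documents)
print(search(index, "company"))       # ['/demo/about', '/demo/news/launch']
print(search(index, "company news"))  # ['/demo/news/launch']
```

A real indexer additionally handles stemming, stop words, scoring, and text extraction from binary file types, which is where the questions about file types and language analysis in this outline come in.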

Querying

  • How your data is structured in the repository
  • Querying the data
  • Tools > JCR queries
  • SQL and XPath
  • Executing queries from code and using the results.
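
To make the "SQL and XPath" bullet concrete, here is a minimal sketch that builds the same full-text query in both JCR-SQL2 and JCR XPath syntax. It only assembles the statement strings and does not talk to a repository. The helper names, the `/demo` path, and the choice of `nt:base` as node type are assumptions for illustration; swap in the node type and path that match your own data.

```python
def quote_literal(value):
    """Quote a string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def sql2_fulltext(node_type, root_path, term):
    """Build a JCR-SQL2 full-text query for nodes below root_path."""
    return (
        f"SELECT * FROM [{node_type}] AS n "
        f"WHERE ISDESCENDANTNODE(n, {quote_literal(root_path)}) "
        f"AND CONTAINS(n.*, {quote_literal(term)})"
    )


def xpath_fulltext(node_type, root_path, term):
    """Build a JCR XPath full-text query for nodes below root_path."""
    return (
        f"/jcr:root{root_path}//element(*, {node_type})"
        f"[jcr:contains(., {quote_literal(term)})]"
    )


# Hypothetical search: every node below /demo that mentions "magnolia".
print(sql2_fulltext("nt:base", "/demo", "magnolia"))
print(xpath_fulltext("nt:base", "/demo", "magnolia"))
```

This sketch deliberately ignores two details: path segments containing special characters need ISO 9075 encoding in XPath, and characters with special meaning in the full-text search expression itself are not escaped. From Java code, statements like these would be passed to the JCR `QueryManager.createQuery` method along with the matching query language constant.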

Security

  • How does access control work with search? If I don't have access to an item, can I still find it?

Language

  • How do you index multilanguage content?
  • What languages can the default indexer analyze?
  • What to do if your language is not indexable by default?

Resolving issues

  • Disabling search indexing
  • Deleting corrupted indexes. SUPPORT-459

Federated / aggregated search

  • How would you aggregate search results from multiple content repositories?
  • From external repositories?

External indexing



 Comments   
Comment by Magnolia International [ 01/Nov/11 ]
  • The Solr module is not an alternative to using Lucene in Jackrabbit.
  • The Solr module is essentially doing "external indexing" as well.
  • It is much more important that people understand how to query the repository, and what to query (the structure of the(ir) data). Configuring/extending Lucene's indexing via Jackrabbit goes much further than that. Don't point people there before they've reached the limits of what they can do with queries.
Comment by Suzanne Deprez [ 21/Nov/11 ]

Started an outline at docuauthor Search page.

Comment by Suzanne Deprez [ 07/Dec/11 ]

The scope of the documentation requested by this issue has changed from the description. The primary scope is now to describe how the search functionality provided with Magnolia CMS can be extended, with and without STK, especially for searching other workspaces.

Comment by Christian Hauser [ 05/Jan/12 ]

Additionally, it would be great to have a use case on how to build an advanced search for web site users, using specific page or article type attributes.

This search should also respect ACLs, and therefore work with login.

Let's say before Summer 2012

Comment by Suzanne Deprez [ 11/Jan/12 ]

Christian, please open a new ticket to cover these. Please clarify the article type attributes for the new ticket. Are you referring to the template selection for a page?

Comment by Christian Hauser [ 11/Apr/12 ]

The documentation of a search integration of web projects should cover:

  • Indexing types
    • incremental (recommended), add, change or remove from index
    • batch
    • crawl as a legacy sidekick
  • Indexing event
    • Directly by the activation workflow
    • As an hourly batch of the last activated content nodes, controlled by the scheduler
  • Search scope, maybe create 3 indexes residing side by side
    • Published only
    • Draft and Published
    • Recent versions
  • Handle data
    • How to handle attributes of elements, add them to main index and/or attribute index for relevance ranking
    • How to handle component (UID) indexing, e.g. search returns:
      • URIs of default parent pages
      • URI#anchor of default parent pages, jumping to the paragraph component
  • Give some insights on existing integrations
    • Lucene
    • Solr
    • Autonomy (are there?)
    • others...
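
The difference between incremental and batch indexing in the list above can be sketched with a toy in-memory index. This is only an illustration in Python under stated assumptions, not how Lucene, Solr, or Magnolia implement it; the class `ToyIndex`, its method names, and the node paths are hypothetical.

```python
import re
from collections import defaultdict


class ToyIndex:
    """In-memory inverted index that supports incremental updates."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> paths containing it
        self.terms_by_path = {}           # path -> its terms, needed for removal

    @staticmethod
    def _terms(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def remove(self, path):
        """Incremental remove: drop one document; unknown paths are ignored."""
        for term in self.terms_by_path.pop(path, set()):
            self.postings[term].discard(path)
            if not self.postings[term]:
                del self.postings[term]

    def add(self, path, text):
        """Incremental add or change: a change is a remove followed by an add."""
        self.remove(path)
        terms = self._terms(text)
        self.terms_by_path[path] = terms
        for term in terms:
            self.postings[term].add(path)

    def rebuild(self, documents):
        """Batch mode: discard the whole index and reindex every document."""
        self.postings.clear()
        self.terms_by_path.clear()
        for path, text in documents.items():
            self.add(path, text)

    def find(self, term):
        """Return the sorted paths whose text contains the term."""
        return sorted(self.postings.get(term.lower(), set()))


index = ToyIndex()
index.add("/demo/news", "Summer launch")   # e.g. triggered on activation
index.add("/demo/news", "Winter launch")   # the page changed and was reactivated
print(index.find("summer"))  # []
print(index.find("winter"))  # ['/demo/news']
```

Incremental updates touch only the affected document, which is why they suit an activation-driven trigger; `rebuild` is the batch equivalent and costs time proportional to all content.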

See also:
http://dev.magnolia-cms.com/~gjoseph/opening-the-door-to-semantic-search-in-magnolia
