[EXTIDX-20] Search singular and plural words in French Created: 23/Aug/16 Updated: 16/Mar/23 Resolved: 16/Mar/23 |
|
| Status: | Closed |
| Project: | External Indexing (closed) |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major |
| Reporter: | Anh Tuan TRUONG | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
When I search for a word on my search page using a full-text query such as the one below:
SELECT * FROM [mgnl:content] AS t WHERE ISDESCENDANTNODE(['/']) AND (CONTAINS(name, 'chaussure'))
it only returns results containing "chaussure"; content containing "chaussures" is not returned (the singular and the plural are treated as two different keywords).
How can I change the configuration so that Magnolia treats the two words above as the same keyword (using a French dictionary)? On the Magnolia wiki I found https://wiki.magnolia-cms.com/display/WIKI/Search+Index+Configuration+File#SearchIndexConfigurationFile-Lucene and https://wiki.magnolia-cms.com/display/DEV/Indexing+configuration+-+optimizations+for+Magnolia+5
Should I change the configuration in index_configuration.xml so that Lucene tokenizes and indexes using the French analyzer? I only want to apply this to specific workspaces (not all).
Thanks so much, |
| Comments |
| Comment by Jan Haderka [ 23/Aug/16 ] |
|
Yes, you need to change the index configuration for the affected workspace, or for the whole repository, to use the French analyzer, and then delete the index to force its regeneration. |
| Comment by Anh Tuan TRUONG [ 23/Aug/16 ] |
|
I added <param name="analyzer" value="org.apache.lucene.analysis.fr.FrenchAnalyzer"/> to the search index configuration file and deleted all index folders under workspace/{workspace_name}. Now I get the following error when starting Magnolia:
2016-08-23 14:41:05,772 ERROR info.magnolia.init.MagnoliaServletContextListener : Oops, Magnolia could not be started
java.lang.VerifyError: class org.apache.jackrabbit.core.query.lucene.JackrabbitAnalyzer overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
GRAVE: Exception lors de l'envoi de l'évènement contexte initialisé (context initialized) à l'instance de classe d'écoute (listener) info.magnolia.init.MagnoliaServletContextListener
java.lang.VerifyError: class org.apache.jackrabbit.core.query.lucene.JackrabbitAnalyzer overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
Also, if I want to change this only for the affected workspaces, where do I make the change?
Thanks so much, |
| Comment by Jan Haderka [ 23/Aug/16 ] |
|
It looks like the version of the analyzer doesn't match the version of Jackrabbit or Lucene that it needs to go with; check the libraries you are pulling in. To change only the affected workspace, edit that workspace's configuration file:
.../webapps/magnoliaAuthor/repositories/magnolia/workspaces/<your workspace>/workspace.xml |
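As a rough illustration of what to look for in that file, here is a minimal sketch of a SearchIndex element in a Jackrabbit workspace.xml. The two parameter values are assumptions for illustration only: the index path and the location of the custom indexing configuration file will differ per installation, and any other parameters already present should be left untouched.

```xml
<!-- Excerpt of a workspace.xml; only the SearchIndex element is shown. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Assumed index location; keep whatever your installation already uses. -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- Assumed location of a custom indexing configuration file (hypothetical path). -->
  <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
</SearchIndex>
```

Because workspace.xml is read per workspace, a change here should only affect that workspace; its index folder still has to be deleted so that the index is rebuilt.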
| Comment by Richard Gange [ 24/Aug/16 ] |
|
Is it such a good idea to use the French analyzer for everything? <param name="analyzer" value="org.apache.lucene.analysis.fr.FrenchAnalyzer"/> It might be better to target only those properties that hold French-language data, like the German-language example here: https://wiki.apache.org/jackrabbit/IndexingConfiguration
<analyzer class="org.apache.lucene.analysis.fr.FrenchAnalyzer">
<property>name</property>
</analyzer>
You can also use regex to target properties. |
| Comment by Jan Haderka [ 24/Aug/16 ] |
|
No, IMO it is not a good idea to use it for everything. That's why I offered, as the first choice, to set it just for the affected workspace. Alternatively, and ideally, you would use an external search engine such as Solr to power all user-related search and avoid any interference with the internal workings of Magnolia. |
| Comment by Richard Gange [ 24/Aug/16 ] |
|
Right, just the workspace, but I am saying we go even finer than that and target only those properties within the workspace that contain French. Configuring the French analyzer at the SearchIndex level like this
<SearchIndex>
...
<param name="analyzer" value="org.apache.lucene.analysis.fr.FrenchAnalyzer"/>
</SearchIndex>
will analyze the entire workspace, all properties, with the FrenchAnalyzer. I'm just not sure that you can do that, and from the error it appears Jackrabbit could be balking at it. Instead, add an entry to your custom indexing_configuration file that targets the properties containing French. So determine the names of all properties that store French text, then target them with the French analyzer like this:
<analyzer class="org.apache.lucene.analysis.fr.FrenchAnalyzer">
<property>name</property>
</analyzer>
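For context, a minimal sketch of how that entry might sit inside a complete indexing configuration file, following the layout of the Jackrabbit IndexingConfiguration wiki examples. The property name "name" is taken from the query in the description; the DOCTYPE and namespace declaration mirror the wiki example and are assumptions that should be checked against the Jackrabbit version in use.

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <analyzers>
    <!-- Only the listed properties are analyzed with the French analyzer;
         all other properties keep the default analyzer. -->
    <analyzer class="org.apache.lucene.analysis.fr.FrenchAnalyzer">
      <property>name</property>
    </analyzer>
  </analyzers>
</configuration>
```

The FrenchAnalyzer class must come from a Lucene analyzers library that matches the Lucene version bundled with Jackrabbit, and the workspace index has to be deleted and regenerated before the change takes effect.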
|
| Comment by Jan Haderka [ 23/Sep/16 ] |
|
rgange yes, with a small distinction: "we" can't do anything. Only the user/client/dialog author knows which of their properties might contain French words, so this is the kind of configuration the user needs to do on the final installation, not one that Magnolia can provide out of the box. |
| Comment by Adam Jones [ 16/Mar/23 ] |
|
Closing due to project being archived. |