[MAGNOLIA-5014] Search _still_ returns nothing when searching for Umlaut! Created: 04/May/13  Updated: 28/Aug/13  Resolved: 26/Aug/13

Status: Closed
Project: Magnolia
Component/s: fckeditor
Affects Version/s: 4.4.9, 4.5.8
Fix Version/s: 4.5.11

Type: Improvement Priority: Major
Reporter: Will Scheidegger Assignee: Jaroslav Simak
Resolution: Fixed Votes: 1
Labels: fckeditor, search, umlaut
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Cloners
is cloned by MGNLUI-1957 Search returns nothing when searching... Open

 Description   

This has been discussed many times, and I really did not expect to see that bug anymore, but...

If you enter text with umlaut characters, FCKEditor will automatically escape them as HTML entities (for example, ü is stored as &uuml;). When you then search for a word containing an umlaut, you're out of luck: the query no longer matches the stored text.

Possible work-arounds:

  1. Turn off escaping of Latin special characters in FCKEditor (FCKConfig.IncludeLatinEntities = false)
  2. Encode your query term
  3. Or, as Boris once proposed: save the content once in the escaped version and once without escaping
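
To illustrate the mismatch and work-around 2, here is a minimal Python sketch. It is not Magnolia or FCKEditor code: `escape_latin` is a hypothetical helper that only mimics entity escaping, and a plain substring test stands in for the real search index.

```python
from html.entities import codepoint2name


def escape_latin(text: str) -> str:
    """Hypothetical stand-in for the editor's escaping: replace each
    non-ASCII character that has a named HTML entity with that entity."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if ord(ch) > 127 and name:
            out.append(f"&{name};")
        else:
            out.append(ch)
    return "".join(out)


# What ends up stored when the editor escapes Latin characters.
stored = escape_latin("Grüße aus Zürich")
query = "Zürich"

print(stored)                         # Gr&uuml;&szlig;e aus Z&uuml;rich
print(query in stored)                # False: the raw query does not match
print(escape_latin(query) in stored)  # True: work-around 2, encode the query
```

The encoded query only matches fields that were escaped in the same way, so it is a partial fix at best.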


 Comments   
Comment by Jan Haderka [ 14/Aug/13 ]

Actually, I don't see a bug here; it is more of an improvement. What we could/should definitely do is set FCKConfig.IncludeLatinEntities = false when magnolia.utf8.enabled=true, but that is pretty much it. You have to choose either to store encoded content or not to encode.

The second option would not really work, as it would not get hits on plain text fields, and the third one is really just a workaround. Another workaround would be to use your own analyser that converts all HTML entities to their UTF-8 representation, but that is still not a full solution, as it doesn't solve the abbreviation of text snippets or the highlighting of search results.
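
As a rough sketch of the analyser idea (in Python for brevity, not an actual Lucene or Jackrabbit analyser), the text could be unescaped before it is indexed. `normalize_for_index` is a hypothetical name, and the offset comparison at the end is only one plausible reason why snippets and highlighting would still be off.

```python
import html


def normalize_for_index(stored_html: str) -> str:
    """Convert HTML entities to their Unicode characters before indexing."""
    return html.unescape(stored_html)


stored = "Gr&uuml;&szlig;e aus Z&uuml;rich"
indexed = normalize_for_index(stored)

print(indexed)              # Grüße aus Zürich
print("Zürich" in indexed)  # True: the raw query now matches

# The stored text is unchanged, so character offsets computed on the
# normalized text do not line up with it.
print(indexed.find("Zürich"))      # 10
print(stored.find("Z&uuml;rich"))  # 21
```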

Comment by Jan Haderka [ 14/Aug/13 ]

Flag name has changed for CKEditor - see http://docs.cksource.com/ckeditor_api/symbols/CKEDITOR.config.html#.entities_latin

Comment by Tom Wespi [ 26/Aug/13 ]

I think it is not a good idea to make this problem depend on magnolia.utf8.enabled.

Normally we don't want UTF-8 URLs, but we do want FCKConfig.IncludeLatinEntities = false to be set.

German, French, Spanish ... have a lot of special characters, but we still don't want users to create URLs like 'héllowörld'.

Comment by Will Scheidegger [ 26/Aug/13 ]

I don't think this is related. We always have UTF-8 turned on, but you still cannot create page node names with special characters and therefore no "URLs like 'héllowörld'".

Comment by Jaroslav Simak [ 26/Aug/13 ]

Hi Tom,

it is possible to override the value of FCKConfig.IncludeLatinEntities by adding the property includeLatinEntities under the config node of the fckEditor module.

Cheers.

Comment by Jan Haderka [ 26/Aug/13 ]

@Tom as Jaroslav mentioned above, you can still set your configuration to whatever you want. The only thing we are changing is aligning the default with the UTF-8 setting in case you choose not to override the FCK config.

@Will afaik you also need to configure your Tomcat to allow UTF-8 in URLs to be able to have such URLs. The UTF-8 property setting in Magnolia is just a declaration that your instance is configured to handle everything UTF-8 has to offer, which includes both URL and content encoding.
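
The default and override behaviour discussed in this thread can be sketched as follows. This is an illustrative Python sketch, not Magnolia's implementation: the function name is hypothetical, and it assumes that entities stay enabled when magnolia.utf8.enabled is false, which the thread does not state.

```python
from typing import Optional


def resolve_include_latin_entities(
    utf8_enabled: bool,
    override: Optional[bool] = None,
) -> bool:
    """Return the effective IncludeLatinEntities value.

    utf8_enabled -- value of magnolia.utf8.enabled
    override     -- the includeLatinEntities property from the module
                    config node, or None if it is not set
    """
    if override is not None:
        # An explicit configuration always wins.
        return override
    # Assumed default: only escape Latin characters when UTF-8 is off.
    return not utf8_enabled


print(resolve_include_latin_entities(utf8_enabled=True))                   # False
print(resolve_include_latin_entities(utf8_enabled=False))                  # True
print(resolve_include_latin_entities(utf8_enabled=False, override=False))  # False
```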

Comment by Will Scheidegger [ 26/Aug/13 ]

@Jan: Thanks. My comment was not a question but rather an answer to Tom's, because I did not see how turning on the UTF-8 setting was related to enabling the user to create URLs with special characters. But I guess I'm missing something.

Comment by Jan Haderka [ 26/Aug/13 ]

@Will if you turn on the UTF-8 setting in Magnolia, Magnolia will let you create pages with UTF-8 characters in their names.

Comment by Tom Wespi [ 26/Aug/13 ]

I would propose this as the default:

FCKConfig.IncludeLatinEntities = false

I don't see the point of option 3 from Boris, i.e. why those entries should be saved both ways.

This default makes full-text search easier for rookies.
