[MAGNOLIA-255] Official support for UTF-8 in Magnolia Created: 20/Dec/04  Updated: 14/Mar/05  Resolved: 12/Mar/05

Status: Closed
Project: Magnolia
Component/s: None
Affects Version/s: 2.01
Fix Version/s: 2.1 Final

Type: Improvement Priority: Major
Reporter: Marc Oesch Assignee: Fabrizio Giustina
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

 Description   

There is a patch for UTF-8, and it would be good (IMHO) to incorporate it officially into a future release.

The following is copied from the magnolia-user mailing list from Dec 20, 2004:

______________
Hi Dmitriy

All is fine except :

>After I fixed Tomcat and MultipartParser, Magnolia admin pages started showing ??? in place of Russian letters. I added
>contentType="text/html;charset=UTF-8"

I would rather use MIMEMappings under config/server. There you can set:
extension - html
mime-type - text/html; charset=UTF-8
This way you don't have to change any admin JSPs; the content type will be set by the ContentType filter.
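
As a rough illustration of the mapping idea above, here is a minimal sketch in plain Java. It is not Magnolia's ContentType filter: the class MimeMappingSketch and its methods are hypothetical, and only the extension/mime-type pair comes from the message. It shows one central extension lookup supplying the charset, so individual JSPs don't need to declare it.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Magnolia.
class MimeMappingSketch {
    private final Map<String, String> mappings = new HashMap<>();

    // Register a content type (optionally with charset) for a file extension.
    void register(String extension, String mimeType) {
        mappings.put(extension.toLowerCase(Locale.ROOT), mimeType);
    }

    // Return the configured content type for a request path, or null if none matches.
    String contentTypeFor(String path) {
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot == path.length() - 1) {
            return null; // no extension present
        }
        return mappings.get(path.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        MimeMappingSketch sketch = new MimeMappingSketch();
        sketch.register("html", "text/html; charset=UTF-8");
        // A filter could pass this value to the response before the page is rendered.
        System.out.println(sketch.contentTypeFor("/admin/index.html"));
    }
}
```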

Regards,

  • Sameer

__________original__________

On Dec 20, 2004, at 4:43 PM, Дегтярев, Дмитрий В. wrote:

Hello everybody,

I just wanted to share some findings on internationalization in Magnolia and Tomcat. Out of the box, Magnolia allows you to enter international characters, but if you look at how this data gets stored you will see that the upper byte gets removed, which means that real Unicode is not working. I ran into this problem when I tried to put some code into a JSP template that pulls Russian strings from a database. All information from the database appeared corrupted (two characters were shown on screen for each letter). The problem was solved after I added

contentType="text/html;charset=UTF-8"

to all my JSP templates, but after that all Russian letters coming from Magnolia appeared as ?, because they are stored incorrectly. I found out that something goes wrong when saving the data. When the browser sends a request to the server, it should indicate the encoding it uses. I suspect that the browser I use (Firefox 1.0) does not do this, or maybe the XMLHttp object doesn't; I don't know. What I do know is that Tomcat doesn't know the request is UTF-8 and defaults to ISO-8859-1. I found no way to make Tomcat default to UTF-8 other than rewriting its code. I also patched com.oreilly.servlet.multipart.MultipartParser, which Magnolia uses to handle dialogs, so that it now defaults to UTF-8.
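
The two symptoms described above (two characters per letter, and ? in place of letters) can be reproduced in plain Java. This is only an illustrative sketch, not Tomcat or Magnolia code; the class and method names are made up for the example, and the sample string is the Russian word for "hello" written with Unicode escapes so the source file encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class CharsetMismatchDemo {
    // UTF-8 bytes read as ISO-8859-1: every Cyrillic letter turns into two characters.
    static String decodeUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Text forced into ISO-8859-1: letters with no mapping are replaced by '?'.
    static String storeAsLatin1(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    // Undo the first kind of damage by re-reading the same bytes as UTF-8.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442";
        String garbled = decodeUtf8AsLatin1(russian);
        System.out.println(russian.length() + " letters become " + garbled.length() + " characters");
        System.out.println("stored as ISO-8859-1: " + storeAsLatin1(russian));
        System.out.println("repairable: " + repair(garbled).equals(russian));
    }
}
```

Note that only the first kind of damage is reversible: once the letters have been replaced by ?, the original text is lost.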

After I fixed Tomcat and MultipartParser, Magnolia admin pages started showing ??? in place of Russian letters. I added

contentType="text/html;charset=UTF-8"

to all admin templates, and now everything works.

If somebody is interested in getting the patches please let me know.

Regards,
Dmitriy Degtiarev
tgcp@mail.ru



 Comments   
Comment by Fabrizio Giustina [ 23/Feb/05 ]

some info about tomcat and UTF-8:
http://marc.theaimsgroup.com/?t=105524444400002&r=1&w=2

I started converting JSP pages to UTF-8 for Magnolia 2.1. I hope to have out-of-the-box UTF-8 support before the release; at the moment I am still seeing problems in text added from Magnolia dialogs.

Comment by Fabrizio Giustina [ 12/Mar/05 ]

Partially working in svn trunk:
the content added using dialogs is now parsed using the proper encoding, which defaults to UTF-8 if not set.

Configuration values (e.g. page titles) added using the admin interface are still wrong: Magnolia currently uses GET for posting values, but the encoding is not considered for GET parameters.

Comment by Fabrizio Giustina [ 12/Mar/05 ]

Work completed on svn trunk.
Magnolia 2.1 now uses UTF-8 out of the box!

To support UTF-8, XmlHttpRequest (admin interface configuration) now uses POST with parameters in the body of the request (since the query string encoding is ISO-8859-1 and can't be changed). Works fine in IE and Firefox; maybe it needs a check with Safari?
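
To illustrate why the query string was the problem, here is a small sketch using only java.net (Java 10 or later for the Charset overloads). It simulates a client that percent-encodes a parameter as UTF-8 and a server that decodes it as ISO-8859-1; it is not the container's actual decoding code, and the class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class QueryStringEncodingDemo {
    // What a UTF-8 client puts into the query string.
    static String encodeAsUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // What a server sees if it assumes ISO-8859-1 for query parameters.
    static String decodeAsLatin1(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1);
    }

    // What the server sees if it decodes with the charset the client actually used.
    static String decodeAsUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String title = "\u041f\u0440\u0438\u0432\u0435\u0442";
        String encoded = encodeAsUtf8(title);
        System.out.println("query value: " + encoded);
        System.out.println("ISO-8859-1 decode matches: " + decodeAsLatin1(encoded).equals(title));
        System.out.println("UTF-8 decode matches: " + decodeAsUtf8(encoded).equals(title));
    }
}
```

With a POST body, the server can be told which encoding to use before the parameters are read, which is why moving the parameters out of the query string avoids the mismatch.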

Comment by Marc Oesch [ 14/Mar/05 ]

Fabrizio, this is very nice, thank you !

Generated at Mon Feb 12 03:15:37 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.