MAGNOLIA-7964

Problems with publishing Unicode characters on JBoss

Details

    • Type: Bug
    • Resolution: Workaround exists
    • Priority: Major
    • Affects version: 5.7.6
    • Environment: JBoss 7.2.4, MySQL 5.5.68-MariaDB

    Description

      Steps to reproduce

      1.  Create a page with a non-ASCII Unicode character such as 🏠 (U+1F3E0) in the title.
      2.  Publish page.
      3.  The following exception appears in server.log:
      2020-11-16 10:55:41,310 INFO  [stdout] (default task-55) Caused by: org.xml.sax.SAXParseException: Character reference "&#55356" is an invalid XML character. 

      The same error occurs when importing this page.
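      For context, the number in the exception matches the first half of a UTF-16 surrogate pair. 🏠 (U+1F3E0) lies outside the Basic Multilingual Plane, so Java stores it as two chars, and a lone surrogate is not a legal XML character. This suggests the serializer wrote each half as its own character reference instead of one reference for the whole code point. A minimal sketch of the arithmetic follows; the class and method names are illustrative only and do not come from Magnolia, JBoss or Xalan.

```java
// Illustrative helper: shows how U+1F3E0 splits into UTF-16 surrogates.
class SurrogateDemo {

    // First (high) UTF-16 code unit of a supplementary code point.
    static int highSurrogate(int codePoint) {
        return Character.highSurrogate(codePoint);
    }

    // Second (low) UTF-16 code unit of a supplementary code point.
    static int lowSurrogate(int codePoint) {
        return Character.lowSurrogate(codePoint);
    }

    // The numeric character reference a serializer should emit:
    // one reference for the full code point, never one per surrogate.
    static String numericReference(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        int house = 0x1F3E0; // the character from the steps to reproduce
        System.out.println(highSurrogate(house));    // prints 55356
        System.out.println(lowSurrogate(house));     // prints 57312
        System.out.println(numericReference(house)); // prints &#127968;
    }
}
```

      The value 55356 (0xD83C) is the same number that appears in the SAXParseException above.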
       
      Development notes

      The problem does not occur on Tomcat, nor with Magnolia 5.7.5. We found the following related Red Hat issue: https://issues.redhat.com/browse/JBEAP-14542?attachmentViewMode=list&_sscc=t 

      Following that, we tried setting the database to 4-byte UTF-8, but this did not solve the problem.

      Workaround
      We found that the issue comes from a bug in the Xalan library shipped with JBoss (XALANJ-2560). We tried to exclude this library, both by excluding the xalan module in the deployment descriptor and by creating a new module with the correct Xalan lib. Neither approach worked for us; perhaps you could get a working approach from the JBoss side.

      Until a working way to exclude Xalan is found, you could work around the issue as follows:

      • Replace the existing Xalan jars in the module directory with corrected versions.
      • Modify module.xml so that it points at the new jars:
        <JBOSS_HOME>/modules/org/apache/xalan/main/module.xml
        <?xml version="1.0" encoding="UTF-8"?>
        <module xmlns="urn:jboss:module:1.5" name="org.apache.xalan">
            <resources>
                <resource-root path="serializer.jar"/>  <!-- new jar -->
                <resource-root path="xalan.jar"/>  <!-- new jar -->
            </resources>
            <dependencies>
                <module name="javax.api"/>
            </dependencies>
        </module>
        

      This way, we got rid of the issue with the Unicode characters.
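      One possible way to check whether a given runtime is still affected is to push a supplementary character through the JAXP serializer and scan the result for character references in the surrogate range. The sketch below is an assumption-laden illustration, not part of Magnolia or JBoss: the class and method names are hypothetical, it uses whatever TransformerFactory the runtime resolves (inside JBoss this may or may not be the xalan module above), and it forces US-ASCII output so the serializer has to escape the character.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical diagnostic, not an official Magnolia or JBoss tool.
class SerializerCheck {

    // Decimal character references such as &#55356; (hex references are not checked).
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    // True if the XML text contains a reference to a lone UTF-16 surrogate,
    // which an XML parser will reject as an invalid character.
    static boolean hasSurrogateReference(String xml) {
        Matcher m = DECIMAL_REF.matcher(xml);
        while (m.find()) {
            int value = Integer.parseInt(m.group(1));
            if (value >= 0xD800 && value <= 0xDFFF) {
                return true;
            }
        }
        return false;
    }

    // Round-trips a small document through the identity transformer.
    // US-ASCII output is an assumption made here to force escaping.
    static String serialize(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // &#127968; is the valid reference for U+1F3E0.
        String result = serialize("<title>&#127968;</title>");
        System.out.println(result);
        System.out.println("surrogate reference found: " + hasSurrogateReference(result));
    }
}
```

      To be meaningful for this issue, the check would need to run inside the deployed application, so that the same serializer classes are loaded as during publishing.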

    People

      Assignee: Unassigned
      Reporter: Diana Racho (diana.racho)