[PAGES-486] UTF-8 characters in page name can cause loading to fail Created: 16/Aug/21  Updated: 26/Aug/22

Status: Open
Project: Magnolia pages module
Component/s: Pages app
Affects Version/s: 6.2.11
Fix Version/s: None

Type: Bug Priority: Neutral
Reporter: Michael Duerig Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[X]* Steps to reproduce, expected, and actual results filled
[X]* Affected version filled
Epic Link: AuthorX Maintenance
Team: AuthorX

Description

Steps to reproduce

  1. Enable UTF-8 encoding by setting magnolia.utf8.enabled=true in magnolia.properties
  2. Create a page with a name containing a UTF-8 encoded ä in NFD form (\x61\xCC\x88, i.e. "a" followed by U+0308 COMBINING DIAERESIS)
  3. Preview the created page
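
For reference, the byte sequence in step 2 can be produced and compared with its NFC counterpart using only the JDK. This is a minimal sketch, not Magnolia code: the class and method names (NfdVersusNfc, utf8Hex) are made up for illustration. It only shows that the two forms of ä are different strings with different UTF-8 bytes, which is presumably why a lookup by name can miss.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical demo class, not part of Magnolia.
final class NfdVersusNfc {

    /** Renders the UTF-8 bytes of a string as space-separated upper-case hex. */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // NFD: base letter "a" followed by U+0308 COMBINING DIAERESIS
        String nfd = "a\u0308";
        // NFC: the single precomposed code point U+00E4
        String nfc = Normalizer.normalize(nfd, Normalizer.Form.NFC);

        System.out.println(utf8Hex(nfd));    // 61 CC 88
        System.out.println(utf8Hex(nfc));    // C3 A4
        System.out.println(nfd.equals(nfc)); // false
    }
}
```

Both strings render as ä, so the difference is easy to miss when typing or pasting a page name.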

Alternatively, there is a PR with UI tests that reproduce the issue.

Expected results

The page previews normally.

Actual results

The page preview results in a 404.

Workaround

Use only page names in NFC form. Rename existing pages whose names are not in NFC form accordingly.
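
To find pages affected by the workaround, a name can be checked against NFC with java.text.Normalizer. The following is a sketch under assumptions: NfcNames, needsRename and toNfc are hypothetical helper names, and how the names are collected and the pages renamed (for example via the JCR API) is left out.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Magnolia.
final class NfcNames {

    /** True if the name is not in NFC form and would need renaming. */
    static boolean needsRename(String nodeName) {
        return !Normalizer.isNormalized(nodeName, Normalizer.Form.NFC);
    }

    /** Returns the NFC form of the name; names already in NFC are returned unchanged in content. */
    static String toNfc(String nodeName) {
        return Normalizer.normalize(nodeName, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String nfdName = "a\u0308"; // ä in NFD form
        System.out.println(needsRename(nfdName));        // true
        System.out.println(needsRename(toNfc(nfdName))); // false
    }
}
```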

Development notes

The pages app can be fixed by normalising the node names in NodeNameHelper#getValidatedName(java.lang.String, java.lang.String).
This doesn't fix the problem when bootstrapping content, though. In that case we deal with YAML (which doesn't specify the encoding) or XML (which does). YAML is handled by us (DataTransporter#importYamlStream), so we could do the normalising there somehow. XML is fed directly into Jackrabbit (JR) (DataTransporter#importXmlStream), which makes normalisation much harder.
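
One possible shape for the YAML case, as a rough sketch only: wrap the incoming stream so its text is NFC-normalised before it is parsed. The assumptions are significant. NfcStreams and its methods are hypothetical and not part of DataTransporter, the input is assumed to be UTF-8 (YAML itself does not say so), and the whole document is normalised, property values included, which may be broader than wanted. Normalising only node names would need a hook inside the import itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper, not part of Magnolia's DataTransporter.
final class NfcStreams {

    /** Reads the stream fully into memory; fine for bootstrap files, not for huge inputs. */
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Assumes UTF-8 input and returns a stream over the NFC-normalised text. */
    static InputStream normalizeUtf8ToNfc(InputStream in) {
        String text = new String(readAll(in), StandardCharsets.UTF_8);
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        return new ByteArrayInputStream(nfc.getBytes(StandardCharsets.UTF_8));
    }
}
```

The same wrapper is not obviously safe for XML, since an XML file can declare an encoding other than UTF-8 and the bytes would have to be decoded accordingly first.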


Generated at Mon Feb 12 06:19:25 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.