[MAGNOLIA-1095] ASCII Backspace Character Breaks XML Export Created: 29/Sep/06  Updated: 23/Jan/13  Resolved: 04/Sep/08

Status: Closed
Project: Magnolia
Component/s: None
Affects Version/s: 3.0 RC2
Fix Version/s: 3.5

Type: Bug Priority: Critical
Reporter: Sean McMains Assignee: Jan Haderka
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Redhat Enterprise, BDB PM


Attachments: Text File exampleBadSource.txt    
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[ ]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
Date of First Response:

 Description   

One of our content editors pasted text into an FCK Editor field from another source. Among the content was a backspace character (ASCII 8, or CTRL-H).

While Magnolia could save and edit this page without difficulty, Jackrabbit was unable to serialize it to XML, which broke publishing, backups, export, etc. for that page. We were eventually able to fix the problem by copying the data from the JCR browser into a text editor, having it zap the non-displayable characters, and then pasting the result back into the repository.

To solve this issue, I'd suggest that stripping non-printable characters from String values (or at least for the FCK Editor controls) before persisting them to the repository might be a good approach.
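As a rough illustration of the suggestion above, here is a minimal sketch that drops every code point outside the XML 1.0 Char production (which excludes ASCII 8 but keeps tab, line feed and carriage return). The class and method names are hypothetical and are not part of Magnolia or Jackrabbit; where such a filter would hook into the save path is left open.

```java
// Hypothetical helper, not an existing Magnolia or Jackrabbit class.
final class XmlSafeText {

    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of value with all code points that XML 1.0 forbids removed.
    static String strip(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Calling XmlSafeText.strip on a String value before it is persisted would remove a backspace (ASCII 8) while leaving ordinary text and line breaks untouched.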



 Comments   
Comment by Sean McMains [ 03/Oct/06 ]

We discovered that the ASCII 8 character was actually introduced by a bug in the FCK Editor implementation, where repeatedly editing a paragraph with a "\b" sequence will eventually cause it to be turned into the non-printable character. Details in MAGNOLIA-1065 <http://jira.magnolia.info/browse/MAGNOLIA-1065>.

Comment by Sean McMains [ 19/Apr/07 ]

We encountered another instance today where a user had somehow slipped an ASCII 26 character into one of their paragraphs. It broke XML exporting, rendering that page unpublishable and breaking our backups. I suggest this issue still needs attention.

Comment by Ryan Gardner [ 01/Mar/08 ]

The attached file contains a small snippet of text. If you copy and paste this exact text into the "source" tab of an FCKEdit dialog, you will be able to reproduce this bug.

Comment by Ryan Gardner [ 01/Mar/08 ]

I can confirm that it also breaks XML output, backups, and activation of the page that contains the bad character. I can also confirm that it is a major hassle to find the exact paragraph that is breaking this: because of where the exception is caught, it doesn't output any information about the context. It just blames it on SAX, lists the character, and moves on.

Providing a more detailed exception when an export gets broken is very important. I've also noticed that the backup isn't very fault-tolerant. I will open another ticket about that issue.
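To illustrate what a more detailed message could look like, here is a minimal sketch that finds the first character XML 1.0 does not allow and reports its code point, offset, and the path of the value it came from. The class name, the message format and the example path are assumptions made for illustration; this is not how the export code actually reports errors.

```java
// Hypothetical diagnostic helper, not an existing Magnolia or Jackrabbit class.
final class InvalidXmlCharLocator {

    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first character XML 1.0 forbids, or -1 if none.
    static int firstInvalidIndex(String value) {
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    // Builds a message naming the offending value, or returns null if the value is clean.
    static String describe(String path, String value) {
        int idx = firstInvalidIndex(value);
        if (idx < 0) {
            return null;
        }
        return String.format("Invalid XML character U+%04X at offset %d in %s",
                value.codePointAt(idx), idx, path);
    }
}
```

For a value "ab\bcd" stored under a made-up path such as /site/page/text, describe would return "Invalid XML character U+0008 at offset 2 in /site/page/text", which is enough to find the paragraph in the JCR browser.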

Comment by Jan Haderka [ 06/Aug/08 ]

Not reproducible with 3.6.1 ... the content of a NodeData value is actually base64-encoded for export when it contains characters that could break XML.
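The idea can be sketched roughly as follows: a value passes through unchanged when every character is legal in XML 1.0, and is otherwise written as the base64 form of its UTF-8 bytes. This is only an illustration with hypothetical names, not the actual Jackrabbit export code; a real exporter would also have to mark the value as encoded so that an import can reverse it, which this sketch omits.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration, not the actual Jackrabbit implementation.
final class ExportValueEncoder {

    // True if the value contains a code point outside the XML 1.0 "Char" production.
    static boolean needsEncoding(String value) {
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            boolean ok = cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (!ok) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    // Returns the value unchanged if it is XML-safe, else base64 of its UTF-8 bytes.
    static String encodeForExport(String value) {
        if (!needsEncoding(value)) {
            return value;
        }
        return Base64.getEncoder()
                .encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }
}
```

Unlike stripping, this keeps the stored content intact, so an export followed by an import can round-trip the original value.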

Comment by Jan Haderka [ 04/Sep/08 ]

This issue has been solved in Jackrabbit since version 1.3.1.
Magnolia 3.5 is delivered with Jackrabbit 1.3.3 and hence contains the fix for the reported issue.
For details see related JR issue: https://issues.apache.org/jira/browse/JCR-674

Generated at Mon Feb 12 03:23:38 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.