[PUBLISHING-75] Add runtime dependency to SAX library to magnolia-core to prevent OOM error during export or publishing of larger binaries from DAM Created: 26/Jan/20  Updated: 25/Apr/22  Resolved: 25/Apr/22

Status: Closed
Project: Publishing
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Critical
Reporter: Christian Menzel Assignee: Rabie Hayoun
Resolution: Obsolete Votes: 8
Labels: VN-Analysis, maintenance
Remaining Estimate: Not Specified
Time Spent: 0.75d
Original Estimate: Not Specified
Environment:

Magnolia 6.1.4
Java 11 and Java 13


Attachments: PNG File image-2020-02-20-08-08-41-464.png     PNG File image_2022_03_09T07_14_35_174Z.png     PNG File image_2022_03_09T07_14_44_809Z.png    
Issue Links:
relation
is related to PUBLISHING-69 Publishing large objects consumes lot... Closed
is related to MAGNOLIA-7702 Export of large files fails on Java 1... Closed
Template:
Patch included:
Yes
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Date of First Response:
Team: Nucleus

 Description   

In a relaunch project for our client ZEG we encountered the problems described in MAGNOLIA-7702 and PUBLISHING-69.

We tracked it down to line 453 in org.apache.jackrabbit.commons.xml.Exporter:
handler.endElement(uri, local, getXMLName(uri, local));
When handling the attribute 'data' of type org.apache.jackrabbit.core.value.BinaryValueImpl, this call causes an infinite loop.

'handler' is of type info.magnolia.importexport.filters.NamespaceFilter,
and its member variable 'contentHandler' is of type com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl,
which is set in info.magnolia.importexport.command.JcrExportCommand.Format#getContentHandler.

The class com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl is from the internal Java 11/13 java.xml module.

After we added a Maven dependency on xalan to our webapp module, an instance of org.apache.xalan.transformer.TransformerIdentityImpl
is assigned in the getContentHandler method instead. With this change we were able to publish and export PDF files (up to 70MB) from DAM that had caused problems before.

Because xalan is quite old and compiled with Java 1.3, we also tried Saxon-HE, which works as well but seems to use more CPU and memory.

<dependency>
  <groupId>xalan</groupId>
  <artifactId>xalan</artifactId>
  <version>2.7.2</version>
  <scope>runtime</scope>
</dependency>

or

<dependency>
  <groupId>net.sf.saxon</groupId>
  <artifactId>Saxon-HE</artifactId>
  <version>9.9.1-6</version>
  <scope>runtime</scope>
</dependency>
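
To check which implementation is actually picked up once one of these dependencies is on the classpath, a small probe like the following may help. It is only a sketch: the class name TransformerProbe and its methods are hypothetical, and it assumes that getContentHandler obtains its handler through the standard JAXP lookup (TransformerFactory.newInstance()), which this ticket does not confirm.

```java
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;

// Hypothetical helper, not part of Magnolia: reports which JAXP
// implementation classes are resolved on the current classpath.
class TransformerProbe {

    // Class name of the TransformerFactory found by the standard JAXP lookup.
    static String factoryClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // Class name of the TransformerHandler (a SAX ContentHandler) created by that factory.
    static String handlerClassName() {
        TransformerFactory factory = TransformerFactory.newInstance();
        if (!factory.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("Factory does not support SAX: " + factory.getClass());
        }
        try {
            TransformerHandler handler = ((SAXTransformerFactory) factory).newTransformerHandler();
            return handler.getClass().getName();
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("Could not create a TransformerHandler", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TransformerFactory: " + factoryClassName());
        System.out.println("TransformerHandler: " + handlerClassName());
    }
}
```

Run inside the webapp's classpath, the output should name a class from com.sun.org.apache.xalan.internal.* when the JDK-internal implementation is in use, and a class from the added library otherwise.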

 



 Comments   
Comment by Pascal Zingg [ 20/Feb/20 ]

We ran into the same problem. Exporting ~100MB of DAM content causes high CPU usage. Magnolia does not really recover afterwards, which is very painful. Since no OOM is triggered, CPU is wasted unnecessarily.

We noticed that a worker got stuck at

com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endElement()

Comment by Christian Menzel [ 20/Feb/20 ]

Hi Pascal, could you please try to add the Saxon-HE dependency (see above) to your webapp pom.xml and see if it helps?

Comment by Pascal Zingg [ 20/Feb/20 ]

Hi Christian! Yes, with either of the two runtime dependencies mentioned above, the export works as desired. We are using Magnolia 6.1.3 with Java 11.

Comment by Thomas Duffey [ 16/Mar/20 ]

We just bumped into this on a fresh 6.1.4 and adding the xalan dependency fixed it.

Comment by Rabie Hayoun [ 14/Mar/22 ]

We need to check who should look into the security issue. dmaslanka 

Generated at Mon Feb 12 10:35:10 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.