[MAGNOLIA-8706] Unable to import exported big YAML file Created: 18/Jan/23  Updated: 09/May/23  Resolved: 27/Mar/23

Status: Closed
Project: Magnolia
Component/s: None
Affects Version/s: 6.2.27
Fix Version/s: 6.3.0, 6.2.31

Type: Bug Priority: Neutral
Reporter: Antony Hutchison Assignee: Roman Kovařík
Resolution: Fixed Votes: 1
Labels: None
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: 3h Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Issue Links:
Relates
relates to PUBLISHING-118 use YAML format for publishing pages Closed
Sub-Tasks:
Key | Summary | Type | Status | Assignee
MAGNOLIA-8823 | PR: prevent big YAML exports | Sub-task | Completed | Roman Kovařík
MAGNOLIA-8824 | Docu | Sub-task | Closed | Roman Kovařík
MAGNOLIA-8832 | Rw | Sub-task | Completed | Adam Siska
MAGNOLIA-8833 | Preint QA + PM | Sub-task | Completed | Sang Ngo Huu
MAGNOLIA-8847 | QA | Sub-task | Completed | Adam Siska
MAGNOLIA-8848 | Docu rv | Documentation Task | Closed | Adrian Brooks
Template:
Acceptance criteria:
Empty
Task DoD:
[X]* Doc/release notes changes? Comment present?
[X]* Downstream builds green?
[X]* Solution information and context easily available?
[X]* Tests
[X]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[X]* Steps to reproduce, expected, and actual results filled
[X]* Affected version filled
Release notes required:
Yes
Documentation update required:
Yes
Date of First Response:
Visible to:
Thomas Duffey
Epic Link: Nucleus Regular Maintenance
Sprint: Nucleus 32
Story Points: 5
Team: Nucleus
Work Started:

 Description   

Steps to reproduce

  1. Click Import in the Pages app.
  2. Upload the YAML file.
  3. An error occurs on save.

Expected results

The file should import correctly.

Actual results

The import fails, and the log contains the following message:

org.yaml.snakeyaml.error.YAMLException: The incoming YAML document exceeds the limit: 3145728 code points.

Workaround

Unknown. Potentially, force Magnolia to use SnakeYAML 1.31 to import the data, then re-export it as XML.

Development notes

It seems a limit of 3,145,728 code points (3 × 1024 × 1024, i.e. roughly 3 MiB of plain ASCII text) was introduced in SnakeYAML 1.32, the YAML library used by Magnolia. Magnolia currently uses version 1.33.
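Before attempting an import, a file can be checked against the limit quoted in the log message. The following is a minimal Java sketch, assuming a single UTF-8 file small enough to read fully into memory; the class and method names are hypothetical (not part of Magnolia or SnakeYAML), and SnakeYAML's own counting may differ slightly from this estimate.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical pre-flight check; not part of Magnolia or SnakeYAML.
class YamlSizeCheck {
    // Limit quoted in the YAMLException: 3 * 1024 * 1024 = 3,145,728 code points.
    static final int CODE_POINT_LIMIT = 3 * 1024 * 1024;

    // Counts Unicode code points, so a surrogate pair (e.g. an emoji) counts once.
    static long codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    // True when the text is larger than the limit.
    static boolean exceedsLimit(String text) {
        return codePoints(text) > CODE_POINT_LIMIT;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java YamlSizeCheck.java <file.yaml>");
            return;
        }
        Path file = Path.of(args[0]);
        // Reads the whole file; acceptable for a one-off check of a few MB.
        String text = Files.readString(file, StandardCharsets.UTF_8);
        long count = codePoints(text);
        System.out.printf("%s: %d code points (limit %d)%n", file, count, CODE_POINT_LIMIT);
        if (exceedsLimit(text)) {
            System.out.println("Likely too large to import as YAML; consider XML instead.");
        }
    }
}
```

With JDK 11 or later it can be run directly as `java YamlSizeCheck.java big-export.yaml` (the file name is a placeholder). Counting code points rather than bytes matters for non-ASCII content: a UTF-8 file of about 3 MB containing many multi-byte characters can still be under the limit.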



 Comments   
Comment by Antony Hutchison [ 18/Jan/23 ]

Workaround: The same YAML file imports successfully if I downgrade SnakeYAML to 1.31 in the Java WAR build via Maven exclusions and dependency management. However, this reintroduces the vulnerabilities present in the old library version, so obviously we wouldn't do this in production.

Comment by Pierre Sandrin [ 27/Jan/23 ]

Thanks, Antony, for reporting this and for finding the cause and a workaround. Another (maybe obvious) workaround is to use XML files instead. I created a support issue and hope this will be fixed soon.

Comment by Antony Hutchison [ 27/Jan/23 ]

Thanks @Pierre for your reply. 

The import data is generated outside of Magnolia by a WordPress migration script and imported as content, so it is hard to switch it to XML. What I've done is force the pom.xml to exclude SnakeYAML 1.33 and include 1.31 instead, which lets me import into a local copy of Magnolia and re-export the content as XML. I haven't tried exporting a large volume of data from Magnolia as YAML, so I don't know for certain whether it would cause the same fault, but I can confirm that XML works as expected.

Rumour has it the limit is configurable in code (I saw a snippet on StackOverflow for Spring), but there is no mechanism to control it via Java properties or similar, presumably because it's designed to prevent DoS attacks [1].

I hope this helps find a fix.

 

[1] https://en.wikipedia.org/wiki/Billion_laughs_attack

Comment by Pierre Sandrin [ 30/Jan/23 ]

Hi Antony

I got a response from Magnolia Support. They now list this as a "known issue":

https://docs.magnolia-cms.com/product-docs/6.2/Developing/YAML.html#_yaml_file_size

Unfortunately, they don't seem to have a fix (one that doesn't introduce vulnerabilities) at the moment, so we will go back to XML files.

Comment by Roman Kovařík [ 24/Mar/23 ]

Hi ahutchison,

As Pierre said, we were not allowed to increase the limit because of the security risk.

In the next Magnolia version, at least, the export action will automatically fall back to XML for exports above the limit.
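The fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Magnolia's actual change (tracked in MAGNOLIA-8823): the two exporter suppliers are hypothetical placeholders, and the sketch renders the YAML, measures it in code points against the same 3,145,728 limit, and switches to XML when the result would be too large to re-import.

```java
import java.util.function.Supplier;

// Hypothetical sketch of "fall back to XML above the limit"; not Magnolia code.
class ExportWithFallback {
    // Same limit as in the YAMLException message: 3 * 1024 * 1024 code points.
    static final int CODE_POINT_LIMIT = 3 * 1024 * 1024;

    // Holds the chosen format and the exported content.
    static final class Result {
        final String format;
        final String content;

        Result(String format, String content) {
            this.format = format;
            this.content = content;
        }
    }

    // Renders YAML first; if it would exceed the import limit, renders XML instead.
    static Result export(Supplier<String> yamlExporter, Supplier<String> xmlExporter) {
        String yaml = yamlExporter.get();
        if (yaml.codePointCount(0, yaml.length()) <= CODE_POINT_LIMIT) {
            return new Result("yaml", yaml);
        }
        // A YAML file this large would be rejected on re-import, so use XML.
        return new Result("xml", xmlExporter.get());
    }

    public static void main(String[] args) {
        Result small = export(() -> "title: Home\n", () -> "<placeholder/>");
        Result big = export(() -> "x".repeat(CODE_POINT_LIMIT + 1), () -> "<placeholder/>");
        System.out.println(small.format + " " + big.format);
    }
}
```

Rendering the whole YAML in memory before measuring it is the simplest approach, but it costs time and memory for large content trees; an implementation could instead estimate the size up front or count code points while streaming the output.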

The YAML format is mainly meant for smaller exports that can be edited manually in a text editor. With big files, this advantage is lost anyway because of the editors' memory limitations. The YAML import is also not very performant in this case, as it needs to load the content into memory to parse YAML tags such as JCR property types (e.g. !weakreference).

 

Best regards

Roman

Comment by Antony Hutchison [ 24/Mar/23 ]

Hi Pierre, Roman,

Thanks for the response. Falling back to XML to avoid unusable exports is a good measure to help combat this. Sadly, there are still vulnerabilities in SnakeYAML 1.33 [1], but Magnolia might not be affected if it uses SafeConstructor. I haven't looked at v2.0 yet, so I'm not sure whether it solves the problem or creates others. For now, I can use a modified Magnolia to import the YAML and export it as XML until I can update the migration scripts that generate the YAML or create a YAML2XML tool.

Thanks, 

Antony

 

[1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-1471

Comment by Roman Kovařík [ 24/Mar/23 ]

> Sadly there are still vulnerabilities in Snakeyaml 1.33 [1] but Magnolia might not be affected if it uses SafeConstructor.

It looks like this was dismissed in MAGNOLIA-8666 with the same reasoning you've described. Anyway, thanks for the heads-up.

 

Generated at Mon Feb 12 04:35:04 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.