[MAGNOLIA-5311] XML import should prevent duplicate node names Created: 13/Sep/13  Updated: 01/Nov/13  Resolved: 31/Oct/13

Status: Closed
Project: Magnolia
Component/s: core
Affects Version/s: 4.5.11, 5.0.4
Fix Version/s: 4.5.13

Type: Bug Priority: Major
Reporter: Joerg von Frantzius Assignee: Peili Liang
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
causality
is causing MAGNOLIA-5433 As a developer it's clear that I cann... Closed
relation
is related to MAGNOLIA-613 Import of XML Data makes problems Closed
is related to MAGNOLIA-5444 CLONE: XML import should prevent dupl... Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[ ]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
Date of First Response:
Epic Link: Support
Sprint: 4.5.13

 Description   

Currently the XML import will allow the XML to contain duplicate node names and import them. Since Magnolia doesn't support duplicate node names in a lot of places (e.g. versioning and activation), this can lead to serious problems. This can happen with XML generated outside of Magnola or hand-written/modified XML.

The XML import logic should either reject the import or rename nodes appropriately while importing.



 Comments   
Comment by Christoph Meier [ 21/Oct/13 ]

To resolve this issue, something in DataTransporter#importXmlStream must be changed.
The "problem" is, that this class is heavily used by the system e.g. during installation.
When i implement a "solution" which is ensuring that an imported node with a name which already was present on some of the existing nodes becomes renamed (e.g. /admin/peter becomes /admin/peter0)..
.. then some error occurs during installation; installation will not finish. The reason for that might be, that "superuser" and "anonymous" gets renamed (so it looks like they are imported 2 times ... which is strange ...)
...
So. i commented out my "fix", then installed, then again changed the code that it contains the fix. Then "it works" (in a way, that i can import "peter" as xml and after import we have peter and peter0). But for sure this cannot be the solution.

Comment by Christoph Meier [ 22/Oct/13 ]

I tried another solution, too.
Putting the node to import first to /$targetParent/$someTmpName/newNode
then we check whether /$targetParent already has a child-node with newNode-name,
f yes: rename /$targetParent/$someTmpName/newNode TO /$targetParent/$someTmpName/uniqueName (e.g. /$targetParent/$someTmpName/newNode0)
then move /$targetParent/$someTmpName/newNode0 TO /$targetParent/newNode0
=> other ways to do that (e.g. loading a Document) or keeping the list of the Nodes before importing (the "1st version" described in this ticket above) seem to be much more memory-consuming

Unfortunately this approach also fails; again during installation occurs an exception:
Error while installing or updating core module. Task 'Bootstrap' failed. (ItemExistsException: Same name siblings are not allowed: node /server/MetaData)
java.lang.RuntimeException: Error importing config.server.filters: Same name siblings are not allowed: node /server/MetaData

Comment by Christoph Meier [ 22/Oct/13 ]

I'm not sure whether this ticket really can be resolved. .
My 2 attempts are commited against branch magnolia-4.5.x-MAGNOLIA-5311. 2nd attempts is "active", 1st attempt is commented but should be understandable.
I have to give away this one; maybe someone else which is more familiar with JCR can try to resolve it. (I'll unassign)

Comment by Peili Liang [ 25/Oct/13 ]

Use alertUtil to report error to user which has no internationalization.

Comment by Christoph Meier [ 28/Oct/13 ]

Review: all fine. (earlier complaint due to nev.ironment-problems of the reviewer)

Comment by Jan Haderka [ 29/Oct/13 ]

Looks great. Just two tiny remarks:

  • NodeUtil.getSameNameSiblingNode() calls itself recursively so it might fail with stack overflow when importing very deep tree. You can avoid deep recursion for example by using something like:
    List<NodeIterator> iters = new ArrayList<NodeIterator>();
    iters.add(node.getNodes());
    while (!iters.isEmpty()) {
      List<NodeIterator> tmp = updateChildren(iters);
      iters.clear();
      iters.addAll(tmp);
    }
    private List<NodeIterator> updateChildren(List<NodeIterator> iters) {
      List<NodeIterator> tmp = new ArrayList<NodeIterator>();
      for (NodeIterator iter : iters) {
        while (iter.hasNext()) {
          Node node = iter.nextNode();
          // do some business logic
          tmp.add(node.getNodes());
        }
      }
      return tmp;
    }
    
  • testImportXmlWithSameNameSiblings() is actually multiple tests. Please split them. Having all together makes it harder to find out what exactly is failing when test starts to fail since the first failure is preventing other checks in that test to be executed.
Comment by Cheng Hu [ 30/Oct/13 ]

There are currently two questions with this issue:

  • The current solution submitted for review will prevent an import if the import will result in a tree where any node has two children of the same name. There is uncertainty about whether this is too strict a limitation or whether some workspaces should be allowed to have trees where a node can have multiple children of the same name.
  • There is a question of what is the best (most memory efficient and time efficient) way to check whether no node has more than one sibling of the same name.
    • The current submitted recursive solution is O(N) in space complexity where N is the maximum depth of the tree, and O(M) in time complexity where M is the number of nodes
    • The iterative solution proposed above is O(2^N) in space complexity where N is the maximum depth of the tree, and O(M) in time complexity where M is the number of nodes
    • There exists a solution that is O(1) in space complexity and O(M) in time complexity where M is the number of nodes, however we must adapt this algorithm from a binary tree to an n-ary tree, and prove that it is correct because it is non-trivial
Comment by Cheng Hu [ 31/Oct/13 ]

Spoke to Jan and agreed to use iteration since that is the solution that experience has shown to be problem-free.

Solution has been changed to iteration and pushed.

Comment by Jan Haderka [ 31/Oct/13 ]

After discussion in the architecture group here is what we agreed upon: Since it is unlikely that Magnolia will support same name siblings in near future, we

  • change the default node type definition to not allow same name siblings in a first place and
  • distribute script to update existing installations with new node type definition
  • add install condition to check that same name siblings are not allowed
Comment by Christopher Zimmermann [ 31/Oct/13 ]

Created story: http://jira.magnolia-cms.com/browse/MAGNOLIA-5433 to implement above decision.

Comment by Christopher Zimmermann [ 01/Nov/13 ]

For 4.5 we use this fix.
For 5 series we will implement the new nodetype, as specified in the linked story.

Generated at Mon Feb 12 04:03:57 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.