[BUILD-311] Remove use of Xerces and replace it with JAXP Created: 31/May/18  Updated: 25/Jun/18  Resolved: 20/Jun/18

Status: Closed
Project: Build
Component/s: None
Affects Version/s: None
Fix Version/s: BOM 5.7

Type: Task Priority: Neutral
Reporter: Antti Hietala Assignee: Michael Mühlebach
Resolution: Done Votes: 0
Labels: security
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
causality
duplicate
is duplicated by MAGNOLIA-5983 Replace outdated xerces 2.8.1 Closed
Template:
Acceptance criteria:
Empty
Task DoR:
Empty
Release notes required:
Yes
Date of First Response:
Epic Link: 5.7 library update
Sprint: Basel 149
Story Points: 2

 Description   

Since Java 1.7, both the XML APIs and their default implementation are part of the JDK. Since then, neither the xml-apis nor the Xerces artifacts should be used.
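
As a rough way to see which implementation is actually in use, the JAXP factories can be asked for their concrete class names. The following is only a sketch: the class name JaxpProviderCheck and the package-prefix check are illustrative assumptions, not part of the Magnolia build.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

public class JaxpProviderCheck {

    /**
     * Assumption for this sketch: the standalone Xerces artifact lives under
     * org.apache.xerces, while the JDK's built-in implementation uses a
     * different (internal) package.
     */
    static boolean isStandaloneXerces(String className) {
        return className.startsWith("org.apache.xerces.");
    }

    public static void main(String[] args) {
        // JAXP picks the implementation at runtime; print what was chosen.
        String dom = DocumentBuilderFactory.newInstance().getClass().getName();
        String sax = SAXParserFactory.newInstance().getClass().getName();

        System.out.println("DOM factory: " + dom);
        System.out.println("SAX factory: " + sax);

        if (isStandaloneXerces(dom) || isStandaloneXerces(sax)) {
            System.out.println("A standalone Xerces artifact is still on the classpath.");
        }
    }
}
```

If a Xerces jar is still on the classpath, it typically registers itself as the JAXP provider, so a check like this can help spot leftover transitive dependencies.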

The goal is to get rid of all Xerces dependencies. In most places we can simply remove the dependency, because Xerces and JAXP provide the same API. Some places (such as main) are a little trickier because Xerces classes are used explicitly, which requires some small code changes (see MAGNOLIA-5983).
Some third-party tools we use, such as NekoHTML in diff, rely on internal Xerces classes. This means we have to remove those libraries as well. For this we had to create our own fork of DaisyDiff.
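
The kind of small code change meant here could look roughly like the sketch below. It is not the actual change from MAGNOLIA-5983: the "before" snippet in the comment and the class name JaxpDomSketch are hypothetical, and enabling secure processing is an assumption suggested by the "security" label.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JaxpDomSketch {

    // Before (hypothetical, tied to the Xerces artifact):
    //   org.apache.xerces.parsers.DOMParser parser = new org.apache.xerces.parsers.DOMParser();
    //   parser.parse(new InputSource(new StringReader(xml)));
    //   Document doc = parser.getDocument();

    /** Parses XML text through the JAXP entry points only. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Assumption: secure processing is wanted for this code path.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped so that callers in this sketch need no checked exceptions.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root><child>text</child></root>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Because the code only references javax.xml.parsers, org.w3c.dom and org.xml.sax, it compiles without any Xerces or xml-apis artifact on the classpath.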



 Comments   
Comment by Michael Mühlebach [ 15/Jun/18 ]

Some findings:

The newest version of Xerces uses some XML APIs which are available with Java 9. The classes it uses are extracted into the xml-apis artifact, which we banned ages ago because it led to conflicts: most of its classes are already part of the JDK (JAXP).

To make matters worse, the Xerces project doesn't maintain its own Maven artifacts, which means it is still unclear whether, or by whom, the artifacts will be published to the Maven Central repository.

The current decision is therefore to drop Xerces in favor of JAXP, which requires only very small adjustments in some modules ... except diff.

Comment by Michael Mühlebach [ 15/Jun/18 ]

Special case diff:

The diff module uses DaisyDiff, which is unfortunately discontinued.
Internally, DaisyDiff uses NekoHTML, which uses Xerces. (NekoHTML has been abandoned for even longer than DaisyDiff.)

NekoHTML is an HTML parser that uses Xerces directly, because Xerces' abstract SAX parsers allow it to hook deeply into the document traversal. This workaround lets NekoHTML use an XML parser to parse HTML, which is usually not possible (HTML is not XML compliant, unless you are thinking of XHTML).

Therefore, the ideal solution would be not to replace Xerces with JAXP, which doesn't support parsing HTML, but to replace NekoHTML together with Xerces with an actual HTML parser such as Jsoup.
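
To illustrate why plain JAXP is not an option for HTML input, the sketch below feeds an ordinary HTML fragment and its XHTML-style equivalent to the JDK's XML parser. The class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class HtmlIsNotXmlSketch {

    /** Returns true if the JAXP parser accepts the markup as well-formed XML. */
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Not well-formed XML, e.g. an unclosed element.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Valid HTML, but <br> is never closed, so it is not XML.
        System.out.println(isWellFormedXml("<p>one<br>two</p>"));  // prints "false"
        // XHTML-style markup is well-formed XML.
        System.out.println(isWellFormedXml("<p>one<br/>two</p>")); // prints "true"
    }
}
```

A dedicated HTML parser tolerates the first form, which is what NekoHTML achieved by hooking into Xerces and what a replacement such as Jsoup would have to provide.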

Generated at Sun Feb 11 23:40:45 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.