[MAGNOLIA-1393] Pdf parsing error when bootstrapping samples Created: 20/Feb/07  Updated: 23/Jan/13  Resolved: 11/Feb/08

Status: Closed
Project: Magnolia
Component/s: build, samples
Affects Version/s: 3.1 M1
Fix Version/s: 3.1 M1

Type: Task Priority: Major
Reporter: Capitaine Harold Assignee: Fabrizio Giustina
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows XP, tomcat 5.0, java sdk 1.4.2.12, eclipse wtp 3.2


Template:
Acceptance criteria:
Empty
Task DoR:
Empty
Date of First Response:

 Description   

I updated from the trunk and ran "mvn clean eclipse:clean eclipse -Dwtpversion=1.0".
Then I started Magnolia from Eclipse and got the stack trace below:
WARN org.apache.jackrabbit.core.query.LazyReader LazyReader.java(read:82) 20.02.2007 15:42:50 exception initializing reader org.apache.jackrabbit.core.query.PdfTextFilter$1: java.io.IOException: Error: Expected hex number, actual=' 2'
java.lang.Throwable: Warning: You did not close the PDF Document
at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)

I don't understand what this means, as I don't have any document open.



 Comments   
Comment by Magnolia International [ 20/Feb/07 ]

trunk != 3.0.2

Comment by Magnolia International [ 20/Feb/07 ]
  • please avoid double posting
  • the stack trace there should not prevent bootstrapping
  • could be that the trunk is currently unstable
Comment by Anthony Ogier [ 20/Feb/07 ]

Actually, that bug really exists: I get the same thing with a fresh checkout and the same mvn command.
It's true that it doesn't prevent bootstrapping, but I don't think it's "normal behaviour", is it?

Comment by Magnolia International [ 20/Feb/07 ]

MAGNOLIA-1158 ?

Comment by Anthony Ogier [ 20/Feb/07 ]

Yep, and the stack trace still shows up, so it doesn't seem to be resolved.

Comment by Magnolia International [ 20/Feb/07 ]

The thing is, the trunk is running with the latest Jackrabbit and Lucene. I'm not saying that's the cause, but I suspect a dependency incompatibility somewhere. I don't think anybody here has built the complete trunk since we branched, so if there is indeed a problem there, avoiding double posts and reporting errors properly would be helpful.
The 3.0 branch can bootstrap, at least as far as my current project goes.

I'll reopen, but then someone needs to investigate and give this ticket a proper and meaningful title.

Comment by Fabrizio Giustina [ 20/Feb/07 ]

I can confirm trunk is working properly, but I get that warning from pdfbox too... not sure whether it could depend on an invalid PDF somewhere in the default bootstrapped content.

Comment by Nicolas Modrzyk [ 21/Feb/07 ]

http://www.pdfbox.org/userguide/faq.html#pdfbox_close_warning

Should this be reported to the Jackrabbit folks?
But there has to be more to it than that: the exception appears only for some PDFs and not others.

Indeed, as Fabrizio said, it's not harmful in any way.
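
For context on the close warning: the stack trace in the description shows it being raised from COSDocument.finalize, i.e. when a document is garbage-collected without having been closed. The usual remedy is to close the document in a finally block so that a parse failure cannot skip the close. A minimal sketch of that pattern follows. StubDocument and extractAndClose are hypothetical stand-ins written for this illustration; they are not PDFBox or Jackrabbit API, and this is not claimed to be what any actual fix looks like.

```java
import java.io.IOException;

public class CloseOnFailureSketch {

    /** Hypothetical stand-in for a parsed PDF document; NOT the PDFBox API. */
    static class StubDocument {
        private final boolean corrupt;
        private boolean closed = false;

        StubDocument(boolean corrupt) {
            this.corrupt = corrupt;
        }

        /** Simulates text extraction; fails for a "corrupt" document. */
        String extractText() throws IOException {
            if (corrupt) {
                throw new IOException("simulated parse error");
            }
            return "sample text";
        }

        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /**
     * Extracts text and always closes the document.
     * Returns an empty string if extraction fails, so that indexing can continue.
     */
    static String extractAndClose(StubDocument doc) {
        try {
            return doc.extractText();
        } catch (IOException e) {
            // A real indexer would log the failure here and carry on.
            return "";
        } finally {
            // Runs on both the success and the failure path.
            doc.close();
        }
    }

    public static void main(String[] args) {
        StubDocument good = new StubDocument(false);
        StubDocument bad = new StubDocument(true);
        System.out.println("good: '" + extractAndClose(good) + "' closed=" + good.isClosed());
        System.out.println("bad:  '" + extractAndClose(bad) + "' closed=" + bad.isClosed());
    }
}
```

With this shape, a document that fails to parse simply contributes no text to the index, and it is still closed before it becomes eligible for garbage collection.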

Comment by Fabrizio Giustina [ 23/Feb/07 ]

patch posted to http://issues.apache.org/jira/browse/JCR-764

Comment by Fabrizio Giustina [ 23/Feb/07 ]

Closing, since this is now assigned to Jackrabbit. At the moment this can only cause an annoying stack trace to be printed during bootstrapping (or during the upload of a PDF document).

Comment by Magnolia International [ 23/Feb/07 ]

"resolution" should have been set to "won't fix" then, maybe?

Comment by Philipp Bracher [ 06/Mar/07 ]

This stack trace has been occurring since 3.0 (one of the PDF files causes PDFBox to fail during indexing). The file is in the samples module. This will disappear with newer Jackrabbit/Lucene/PDFBox versions.

There is no harm and bootstrapping works fine (except indexing of this file).

Comment by Magnolia International [ 16/Apr/07 ]

Reopening: with jackrabbit-1.2.3 and pdfbox-0.7.1, we still get:

WARN org.apache.jackrabbit.core.query.LazyReader LazyReader.java(read:82) 16.04.2007 10:41:16 exception initializing reader org.apache.jackrabbit.core.query.PdfTextFilter$1: java.io.IOException: Error: Expected hex number, actual=' 2'

upon bootstrapping.

Comment by Magnolia International [ 16/Apr/07 ]

nevermind - it's actually fine with jackrabbit 1.2.3 and pdfbox 0.7.1

Generated at Mon Feb 12 03:26:29 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.