[MGNLEESOLR-134] A crawler capable of following PDF links Created: 18/Jun/19  Updated: 19/Jun/19

Status: Open
Project: Solr Search Provider
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Neutral
Reporter: Richard Gange Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)

 Description   

We need a crawler implementation that can follow internal links to non-HTML resources such as PDFs.



 Comments   
Comment by Richard Gange [ 19/Jun/19 ]

Something like this: https://stackoverflow.com/questions/51044793/how-to-implement-a-java-crawler-to-crawl-for-pdf-file-links

Maybe we could have a config option for PDFs (and perhaps images as well).

Comment by Richard Gange [ 19/Jun/19 ]

One other idea: could we make the FILTERS configurable? Maybe that would be enough.
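
A minimal sketch of what a configurable filter could look like, assuming FILTERS is a regex of excluded file extensions checked before a link is followed (the issue does not show the current code, so this is a guess at its shape). The class name `ConfigurableLinkFilter`, the `shouldVisit` method, and the extension list are hypothetical and not taken from the module.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: the excluded extensions come from configuration
 * instead of a hard-coded FILTERS constant.
 */
class ConfigurableLinkFilter {

    // null means "nothing is excluded"
    private final Pattern excluded;

    ConfigurableLinkFilter(List<String> excludedExtensions) {
        if (excludedExtensions.isEmpty()) {
            this.excluded = null;
        } else {
            // Builds a pattern equivalent to .*\.(css|js|png)$ with each extension quoted
            String alternation = excludedExtensions.stream()
                    .map(Pattern::quote)
                    .collect(Collectors.joining("|"));
            this.excluded = Pattern.compile(".*\\.(" + alternation + ")$", Pattern.CASE_INSENSITIVE);
        }
    }

    /** Returns true if the crawler may follow the given URL. */
    boolean shouldVisit(String url) {
        // Ignore query string and fragment when checking the extension
        String path = url.split("[?#]", 2)[0];
        return excluded == null || !excluded.matcher(path).matches();
    }
}
```

With a configured list such as `css`, `js`, `png` (and no `pdf` entry), `shouldVisit("https://example.com/docs/manual.pdf")` returns true, so PDF links are followed, while stylesheet and image links are still skipped. Whether the fetched PDF content is then parsed and indexed is a separate concern that this sketch does not cover.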

Generated at Mon Feb 12 11:00:27 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.