Details
- Task
- Resolution: Obsolete
Description
- Find out how weighting (scoring) from the Lucene index works.
- Provide a list for internal documentation, as a basis for further discussion.
Notes:
Jackrabbit uses the default Lucene algorithm to calculate the score for a jcr:contains clause. Any other query element will usually return a score of 1000.
A quick test showed the following for the query:
//*[jcr:contains(.,'apache')] order by @jcr:score descending
jcr:score | text property
----------------------------------------------------------------------
1000 | "Apache Jackrabbit"
848 | "some test jackrabbit apache, apache is great"
350 | "this is a text that is much larger than the first one and only contains the word apache once."
Another article that is in line with the JCR documentation is:
https://stackoverflow.com/questions/30885219/understanding-apache-lucenes-scoring-algorithm
The scoring calculation is quite complex. The starting point is the basic equation:
score(q,d) = coord(q,d) · queryNorm(q) · ∑_{t in q} ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )
In plain terms: the score depends, among other things, on how often the search term occurs in the document and on the boost factor assigned to it.
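To make the equation more concrete, here is a minimal Java sketch that evaluates the per-term part of the sum for the three test texts above. It is not Jackrabbit or Lucene code. The class name `ClassicScoreSketch`, the simple tokenizer, and the index statistics (3 documents, all containing "apache") are assumptions for illustration. The formulas tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)) and a length norm of 1 / sqrt(numTerms) are assumed to be the defaults of Lucene's classic TF-IDF similarity. coord(q,d) and queryNorm(q) are left out because they are the same for every document in a single-term query.

```java
import java.util.Arrays;
import java.util.Locale;

public class ClassicScoreSketch {

    // Assumed classic default: tf grows with the square root of the term frequency.
    static double tf(long freq) {
        return Math.sqrt(freq);
    }

    // Assumed classic default: rarer terms (low docFreq) get a higher idf.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Assumed classic default: longer fields get a smaller norm (field boost taken as 1).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Hypothetical tokenizer: lowercase, split on anything that is not a letter.
    static String[] tokens(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z]+"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    // tf(t in d) · idf(t)^2 · boost · norm(t,d) for a single query term.
    static double score(String text, String term, int docFreq, int numDocs) {
        String[] tokens = tokens(text);
        long freq = Arrays.stream(tokens).filter(term::equals).count();
        double idf = idf(docFreq, numDocs);
        double boost = 1.0; // no boost assigned in this sketch
        return tf(freq) * idf * idf * boost * lengthNorm(tokens.length);
    }

    public static void main(String[] args) {
        String[] docs = {
            "Apache Jackrabbit",
            "some test jackrabbit apache, apache is great",
            "this is a text that is much larger than the first one and only contains the word apache once."
        };
        for (String doc : docs) {
            // Assumption: the index holds only these three documents.
            double s = score(doc, "apache", docs.length, docs.length);
            System.out.printf(Locale.ROOT, "%.4f | %s%n", s, doc);
        }
    }
}
```

The sketch should rank the three texts in the same order as the test above: the short text first, the text with two occurrences second, the long text last. The raw values are not expected to match the 1000-based numbers in the table, since Jackrabbit presumably rescales the score and Lucene stores the length norm in a lossy, compressed form.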
Issue Links
- is related to MGNLPER-178 DOC: What is the default now (Closed)