  1. Ranker
  2. MGNLRANK-4

[Shaping Ranking] What weighting does Lucene provide

Details

    • Task
    • Resolution: Obsolete
    • Neutral
    • None
    • None
    • None

    Description

      • Discover what weighting the Lucene index provides.
        • Provide a list for internal documentation, as a basis for further discussion.

      Notes:

      Jackrabbit uses the default Lucene algorithm to calculate the score for a jcr:contains clause. Any other query element will usually return a score of 1000.
      A quick test showed the following for the query:

      //*[jcr:contains(.,'apache')] order by @jcr:score descending
      jcr:score | text property
      ----------+------------------------------------------------------------
           1000 | "Apache Jackrabbit"
            848 | "some test jackrabbit apache, apache is great"
            350 | "this is a text that is much larger than the first one and only contains the word apache once."

      Another article that is in line with the JCR documentation is:
      https://stackoverflow.com/questions/30885219/understanding-apache-lucenes-scoring-algorithm

      Scoring calculation is really complex. You have to start from the basic equation:

      score(q,d) = coord(q,d) · queryNorm(q) · ∑ (t in q) ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )

      To translate this into non-geek: the score depends, among other things, on the frequency of the search term and on the boost factor assigned.
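
To make the equation concrete, here is a minimal sketch that applies it to the three test texts from the query above. It assumes the factor definitions documented for Lucene's classic TF-IDF similarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms)). The tokenizer, the function names, and the rescaling to 1000 are hypothetical helpers for illustration only; this is not Jackrabbit's or Lucene's actual code, and it ignores field boosts and Lucene's lossy one-byte norm encoding.

```python
import math
import re


def tokenize(text):
    # Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf(freq):
    # Term-frequency factor: square root of the raw term count.
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    # Inverse document frequency: rarer terms weigh more.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    # Length normalization: shorter fields score higher.
    return 1.0 / math.sqrt(num_terms)


def score(query_terms, doc_tokens, all_docs, boost=1.0):
    # idf per query term, computed over the whole (tiny) corpus.
    num_docs = len(all_docs)
    idfs = {
        t: idf(sum(1 for d in all_docs if t in d), num_docs)
        for t in query_terms
    }
    # queryNorm(q) = 1 / sqrt(sum of squared term weights).
    query_norm = 1.0 / math.sqrt(sum((w * boost) ** 2 for w in idfs.values()))
    # coord(q,d) = fraction of query terms that occur in the document.
    matched = [t for t in query_terms if t in doc_tokens]
    coord = len(matched) / len(query_terms)
    norm = length_norm(len(doc_tokens))
    total = sum(
        tf(doc_tokens.count(t)) * idfs[t] ** 2 * boost * norm for t in matched
    )
    return coord * query_norm * total


texts = [
    "Apache Jackrabbit",
    "some test jackrabbit apache, apache is great",
    "this is a text that is much larger than the first one "
    "and only contains the word apache once.",
]
docs = [tokenize(t) for t in texts]
scores = [score(["apache"], d, docs) for d in docs]

# Rescale so the best hit is 1000, purely to compare with the table above.
rescaled = [round(1000 * s / max(scores)) for s in scores]
for value, text in zip(rescaled, texts):
    print(f"{value:>5} | {text}")
```

Under these assumptions the relative scores come out at roughly 1000, 756, and 324. The ranking matches the observed 1000 / 848 / 350, but the exact values do not, which is expected because the sketch leaves out the norm encoding and whatever analyzer and scaling Jackrabbit applies. It does show why the short text wins: the repeated term in the second text raises tf only by a square root, while the extra words lower the length norm.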

              People

                Unassigned Unassigned
                ldelnevo Laura Delnevo
                AuthorX