I'm trying to find out how MarkLogic calculates its relevance score. MarkLogic Support pointed me to a Knowledge Base article (link in reference) where I saw the formula below (natural log).
log(1/term frequency) * log(1/document frequency)
When I apply this formula to my use case, it always returns a negative value. Could anyone provide the final score calculated using the above formula for the use case below?
DB has 350k documents
Document (first result) has 500 words/terms
Document has 5 term matches
DB has 513 documents that match the search term
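As a sanity check on the arithmetic, the quoted expression can be evaluated literally for these numbers. The sketch below is only an illustration: it assumes natural logs as stated, and it tries three possible readings of "term frequency" and "document frequency" (raw counts, fractions, or a mix), none of which is confirmed here as what MarkLogic actually uses. The function and variable names are hypothetical.

```python
import math


def quoted_score(tf: float, df: float) -> float:
    """Evaluate log(1/tf) * log(1/df) exactly as quoted, with natural logs."""
    return math.log(1.0 / tf) * math.log(1.0 / df)


# Numbers from the use case above.
TOTAL_DOCS = 350_000    # documents in the database
DOC_TERMS = 500         # words/terms in the first result
TERM_MATCHES = 5        # term matches in that document
MATCHING_DOCS = 513     # documents matching the search term

# Reading 1 (assumption): both inputs are raw counts.
# Both logs are negative, so the product is positive.
counts = quoted_score(TERM_MATCHES, MATCHING_DOCS)

# Reading 2 (assumption): both inputs are fractions (5/500 and 513/350000).
# Both logs are positive, so the product is positive.
ratios = quoted_score(TERM_MATCHES / DOC_TERMS, MATCHING_DOCS / TOTAL_DOCS)

# Reading 3 (assumption): tf is a raw count, df is a fraction.
# One log is negative and one is positive, so the product is negative.
mixed = quoted_score(TERM_MATCHES, MATCHING_DOCS / TOTAL_DOCS)

for label, value in (("counts", counts), ("ratios", ratios), ("mixed", mixed)):
    print(f"{label}: {value:.3f}")
```

Under these assumptions the sign depends entirely on how the two inputs are expressed: the product is negative only when one input is greater than 1 and the other is less than 1.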
The formulas for relevance scores are documented in the MarkLogic Search Guide:
It seems that the Knowledge Base article shows the formula for inverse document frequency when discussing logtfidf, which might be a little confusing. The intent was to introduce and explain term frequency normalization and the options that are available to customize the score calculation beyond just the logtfidf or inverse document frequency calculation.
With the term frequency normalization setting you can influence the relevance score; it takes into account the size of the document and the "density" of the terms relative to other documents in the database: