How to select queries for text search benchmark?

22 Views Asked by Tim At 06 March 2024 at 15:33

I have written an exact text search algorithm that I now want to evaluate. I looked at two research papers on the topic, but I could not find exactly how they chose the patterns to search for.

I found the Canterbury Corpus, which is used for benchmarking, and I would appreciate other suggestions. How should I select the patterns/queries to search for? Selecting them by hand seems tedious. Alternatively, I could generate random numbers, interpret each one as an index into the text, and take a substring starting at that index.
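
To make the random-index idea concrete, here is a rough sketch of what I have in mind. The function name `sample_patterns`, the fixed pattern length, the seed, and the toy corpus string are all assumptions for illustration; they are not taken from either paper or from the Canterbury Corpus.

```python
import random


def sample_patterns(text, count, length, seed=0):
    """Draw `count` substrings of `length` characters at random start offsets.

    Hypothetical helper: every returned pattern is guaranteed to occur in
    `text`, so this only produces "hit" queries.
    """
    if length <= 0 or length > len(text):
        raise ValueError("length must be between 1 and len(text)")
    rng = random.Random(seed)  # fixed seed so the benchmark is repeatable
    last_start = len(text) - length  # largest offset that still fits a full pattern
    starts = [rng.randint(0, last_start) for _ in range(count)]
    return [text[s:s + length] for s in starts]


# Toy stand-in for a real corpus file (assumption, for illustration only).
corpus = "the quick brown fox jumps over the lazy dog " * 50
patterns = sample_patterns(corpus, count=5, length=8)
```

With this approach every pattern is known to occur in the text, so patterns that do not occur would have to be generated separately, and the pattern length would probably need to be varied across runs.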

Are there other/better ways to do this?


There is 1 best solution below

Iulia Feroli On 06 March 2024 at 15:37

You could take a look at the BEIR project for benchmarking: https://github.com/beir-cellar/beir

It goes into more detail on the types of tests you can run. I believe one of the most popular datasets for evaluation is MS MARCO (https://microsoft.github.io/msmarco/), which has some clearly defined retrieval tasks you can adapt for your case.

There's a leaderboard available for the best-ranking search engines, so it can give you some idea of where you stand.

Hope this helps!
