I have written an exact text search algorithm that I now want to evaluate. I looked at two research papers on the topic, but I could not find out exactly how they choose the patterns to search for.
I found the Canterbury Corpus, which is used for benchmarking; other suggestions are welcome. How should I select the patterns/queries to search for? Selecting them by hand seems tedious. I could generate random numbers, interpret each one as an index into the text, and take a substring starting at that index.
Are there other/better ways to do this?
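To make the random-index idea concrete, here is a minimal sketch of what I have in mind. The function name `sample_patterns`, its parameters, and the toy corpus are my own placeholders and are not taken from any paper or benchmark suite; it assumes the whole text fits in memory as one string.

```python
import random


def sample_patterns(text, pattern_len, count, seed=0):
    """Draw `count` substrings of length `pattern_len` at uniformly random offsets.

    Returns a list of (start_index, pattern) pairs. Hypothetical helper,
    written only to illustrate the sampling idea.
    """
    if pattern_len <= 0 or pattern_len > len(text):
        raise ValueError("pattern_len must be between 1 and len(text)")
    rng = random.Random(seed)  # fixed seed so benchmark runs are reproducible
    last_start = len(text) - pattern_len
    patterns = []
    for _ in range(count):
        start = rng.randint(0, last_start)  # randint is inclusive on both ends
        patterns.append((start, text[start:start + pattern_len]))
    return patterns


# Toy corpus standing in for a real benchmark file.
corpus = "the quick brown fox jumps over the lazy dog " * 50

# Sample a few patterns at several lengths, since running time often
# depends on pattern length.
for length in (4, 16, 64):
    for start, pattern in sample_patterns(corpus, length, count=3):
        # Every sampled pattern occurs in the corpus by construction.
        assert corpus.find(pattern) != -1
```

One limitation I can see: patterns drawn this way always occur in the text, so this only exercises the "match found" case. Patterns that do not occur would have to be generated separately.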
You could take a look at the BEIR project for benchmarking: https://github.com/beir-cellar/beir
It goes into more detail on the types of tests you can run. I believe one of the most popular evaluation datasets is MS MARCO: https://microsoft.github.io/msmarco/ It has some clearly defined retrieval tasks that you can adapt to your case.
There's also a leaderboard of the top-ranked systems, so it can give you some idea of where you stand.
Hope this helps!