There is a site/resource that offers some general statistical information as well as a search interface. These search operations are costly, so I want to restrict frequent and continuous (i.e. automated) search requests (from people, not from search engines).
I believe there are many existing techniques and frameworks that protect against automated scraping, so I don't have to reinvent the wheel. I'm using Python and Apache through mod_wsgi.
I am aware of mod_evasive (will try to use it), but I'm also interested in any other techniques.
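For context, one way to throttle per client without mod_evasive is a small piece of WSGI middleware. The sketch below is illustrative only: the /search prefix, the SearchThrottle name, and the limits are assumptions, and the in-memory state assumes a single mod_wsgi daemon process.

    import time
    from collections import defaultdict, deque

    class SearchThrottle:
        """WSGI middleware that caps requests to a costly path per client IP.

        State is in-memory, so limits only hold within one mod_wsgi daemon
        process; a shared store (memcached, a database table) would be
        needed to enforce them across processes.
        """

        def __init__(self, app, prefix="/search", max_hits=5, window=60):
            self.app = app
            self.prefix = prefix              # path to protect (hypothetical)
            self.max_hits = max_hits          # allowed requests per window
            self.window = window              # window length in seconds
            self.recent = defaultdict(deque)  # ip -> timestamps of recent hits

        def __call__(self, environ, start_response):
            if environ.get("PATH_INFO", "").startswith(self.prefix):
                ip = environ.get("REMOTE_ADDR", "?")
                now = time.time()
                hits = self.recent[ip]
                # Drop timestamps that have fallen out of the sliding window.
                while hits and now - hits[0] > self.window:
                    hits.popleft()
                if len(hits) >= self.max_hits:
                    start_response("429 Too Many Requests",
                                   [("Content-Type", "text/plain"),
                                    ("Retry-After", str(self.window))])
                    return [b"Too many searches; try again later.\n"]
                hits.append(now)
            return self.app(environ, start_response)

    # application = SearchThrottle(application)  # wrap the existing WSGI app

Note that REMOTE_ADDR will be the proxy's address if Apache sits behind one, in which case a forwarded-for header would have to be consulted instead.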
You could try a robots.txt file. I believe you just put it at the root of your application; the robots.txt documentation has more details. The Disallow directive is what you're looking for. Of course, not all robots respect it, but they all should, and all the big crawlers (Google, Yahoo, etc.) will.
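For example, assuming the costly search interface lives under a /search path (the path is an assumption), the file could look like this:

    User-agent: *
    Disallow: /search

Disallow matches by path prefix, so for compliant crawlers this also covers URLs like /search?q=... under that path.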
You may also be interested in this question about disallowing dynamic URLs.