I'm implementing HTTP scraping for multiple info-hashes. Though I know scraping is not widely used by clients (the only one I know of is Transmission; tell me if you know more), what would be a reasonable limit on how many info-hashes the scraper processes and responds to?
For a UDP tracker it's 74 at a time.
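(As far as I can tell, that figure comes from fitting the scrape request into a single 1500-byte packet: the request has a 16-byte header and each hash is 20 bytes, so (1500 − 16) / 20 ≈ 74; if you also count the IP and UDP headers, the safe number is closer to 72.)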
Currently my idea is 10.
What would be an optimal cache TTL for scrape responses before hitting the DB for fresh statistics?
When scraping, do we have to put full v2 info-hashes in the bencoded files dictionary, or truncated ones?
P.S. Also, is it a good idea to record a peer's actual source port (as seen by the tracker) instead of the one sent in the &port GET parameter, to help some peers behind a NAT?
Accepting and processing a TCP/HTTP connection already costs some resources, so it's reasonable to support as many info-hashes as still adds only marginal cost on top of that. How much that is depends on your tracker implementation. If you keep all the data in efficient in-memory data structures (peer lists are mostly ephemeral anyway) and don't need to query an external database, a lookup should be possible in microseconds.
Benchmark your tracker and find a point where the additional cost per infohash is still small compared to the cost of a connection.
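For a sense of scale, here's a minimal sketch (Python, every name made up for illustration) of what I mean by an efficient in-memory structure: a dict keyed by the 20-byte info-hash, plus a rough micro-benchmark of the per-hash lookup cost:

```python
# Hypothetical in-memory swarm store and scrape lookup; not real tracker code.
import os
import time

# Swarm stats per info-hash: (seeders, completed, leechers).
swarms = {os.urandom(20): (10, 100, 5) for _ in range(100_000)}

def scrape(info_hashes):
    """Look up stats for each requested info-hash, skipping unknown ones."""
    files = {}
    for h in info_hashes:
        stats = swarms.get(h)
        if stats is not None:
            files[h] = {"complete": stats[0], "downloaded": stats[1],
                        "incomplete": stats[2]}
    return files

# Rough micro-benchmark: marginal cost of each extra info-hash in a request.
sample = list(swarms)[:74]
start = time.perf_counter()
for _ in range(10_000):
    scrape(sample)
elapsed = time.perf_counter() - start
print(f"{elapsed / 10_000 / len(sample) * 1e9:.0f} ns per info-hash")
```

On typical hardware a lookup like this lands in the tens-of-nanoseconds range, which is why the connection itself, not the per-hash work, should dominate your limit.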
As I wrote above, you really shouldn't be hitting a DB at all. But if you do anyway, you probably want to adjust the TTL dynamically based on swarm behavior: a new torrent will likely be a lot more dynamic than one that has existed for a few days.
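Something like this sketch is what I have in mind; it assumes you record when each swarm was first seen, and the thresholds are purely illustrative, not recommendations:

```python
# Hedged sketch of an age-based scrape-cache TTL.
import time

def scrape_cache_ttl(first_seen: float) -> int:
    """Return a cache TTL in seconds that grows as the swarm ages."""
    age = time.time() - first_seen
    if age < 3600:      # brand-new torrent: stats churn quickly
        return 30
    if age < 86400:     # first day: still fairly dynamic
        return 120
    return 600          # settled swarm: stats barely move
```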
A tracker should be oblivious to torrents: all it sees is hashes, IPs and ports. So it doesn't know anything about v1 or v2, which means it has to use the truncated hashes.
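To make that concrete, here's a Python sketch of a scrape response keyed by a 20-byte truncated v2 hash (a v2 info-hash is SHA-256, i.e. 32 bytes, truncated to the same 20-byte width as a v1 SHA-1 hash; the bencoder is hand-rolled only to keep the example self-contained):

```python
import hashlib

def bencode(obj) -> bytes:
    """Minimal bencoder for ints, bytes, and dicts with bytes keys."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, dict):
        items = sorted(obj.items())  # bencoded dict keys must be sorted
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(obj)

# Truncate the 32-byte v2 hash to 20 bytes, same as what clients send.
v2_hash = hashlib.sha256(b"example v2 info dict").digest()
key = v2_hash[:20]

response = bencode({b"files": {key: {b"complete": 5, b"downloaded": 50,
                                     b"incomplete": 10}}})
print(response)
```

Note the files value is a dictionary keyed by the binary hash, not an array, so the same code path serves v1 and truncated v2 hashes identically.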
libtorrent supports scraping, so I'd expect any client based on it to support it too.
This doesn't seem relevant to scrapes at all; ports only matter for announces, and scrape responses contain nothing but swarm statistics.