What's the recommended limit for the number of info-hashes allowed in a single scrape?

I'm implementing HTTP scraping for multiple hashes. I know scraping is not widely used by clients (the only one I know of is Transmission, tell me if you know more), but what would be a reasonable limit on the number of info-hashes for the scraper to process and respond to?

For a UDP tracker it's about 74 at a time.

Currently my idea is 10.

What would be the optimal cache TTL for scrape responses before hitting the DB for renewed statistics?

While scraping do we have to give full v2 info hashes in bencoded files array or truncated ones?

P.S. Also is it a good idea to record peer's listening port instead of the one which is being sent in &port GET parameter to help some peers behind a NAT?

Answer by the8472

what would be a reasonable limit on the number of info-hashes for the scraper to process and respond to?

Accepting and processing a TCP/HTTP connection already costs some resources, so it's reasonable to support as many infohashes as you can while each one only adds a marginal cost on top of that. How many that is depends on your tracker implementation. If you keep all the data in efficient in-memory data structures (peer lists are mostly ephemeral anyway) and don't need to query an external database, each lookup should be possible in microseconds.

Benchmark your tracker and find a point where the additional cost per infohash is still small compared to the cost of a connection.
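To illustrate what I mean by in-memory lookups, here is a minimal sketch, not a real tracker. `SwarmStats`, `scrape`, `cost_per_hash` and the `MAX_SCRAPE_HASHES` value are all made-up names and numbers for illustration; the cap is a placeholder you would replace with whatever your own benchmark suggests.

```python
import time
from dataclasses import dataclass

# Hypothetical cap; replace it with the value your own benchmark suggests.
MAX_SCRAPE_HASHES = 64


@dataclass
class SwarmStats:
    complete: int = 0    # seeders
    incomplete: int = 0  # leechers
    downloaded: int = 0  # number of completed downloads


def scrape(swarms, info_hashes):
    """Look up stats for the requested hashes in an in-memory dict.

    Hashes beyond the cap are dropped and unknown hashes are skipped.
    """
    result = {}
    for info_hash in info_hashes[:MAX_SCRAPE_HASHES]:
        stats = swarms.get(info_hash)
        if stats is not None:
            result[info_hash] = {
                "complete": stats.complete,
                "downloaded": stats.downloaded,
                "incomplete": stats.incomplete,
            }
    return result


def cost_per_hash(swarms, info_hashes, rounds=1000):
    """Rough average cost in seconds of one lookup, for comparison
    against the measured cost of accepting one connection."""
    start = time.perf_counter()
    for _ in range(rounds):
        scrape(swarms, info_hashes)
    elapsed = time.perf_counter() - start
    looked_up = min(len(info_hashes), MAX_SCRAPE_HASHES)
    return elapsed / (rounds * max(looked_up, 1))
```

Compare the number from `cost_per_hash` with what a single request costs your HTTP stack; as long as the former stays small relative to the latter, raising the cap is cheap.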

What would be the optimal cache TTL for scrape responses before hitting the DB for renewed statistics?

As I wrote above, you really shouldn't be hitting a DB at all. But if you do anyway, you probably want to adjust the TTL dynamically based on swarm behavior. A new torrent will likely change a lot faster than one that has existed for a few days, so it warrants a shorter TTL.
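One possible way to do that, sketched under assumptions: scale the TTL with the age of the torrent. The function name, the 30 second and 30 minute bounds and the three-day ramp are arbitrary illustrative values, not recommendations; a real tracker might look at announce rate instead of age.

```python
def scrape_cache_ttl(age_seconds, min_ttl=30.0, max_ttl=1800.0,
                     ramp_seconds=3 * 86400):
    """Return a cache TTL in seconds that grows linearly with torrent age.

    Brand-new torrents get min_ttl; torrents older than ramp_seconds
    get max_ttl. All default values are illustrative assumptions.
    """
    # Clamp the age to the [0, ramp_seconds] range as a 0..1 fraction.
    fraction = min(max(age_seconds / ramp_seconds, 0.0), 1.0)
    return min_ttl + fraction * (max_ttl - min_ttl)
```

With these defaults a torrent that is a day and a half old would be cached for roughly 15 minutes.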

While scraping do we have to give full v2 info hashes in bencoded files array or truncated ones?

A tracker should be oblivious to torrents. All it sees is hashes, IPs and ports. So it won't know anything about v1 or v2, which means it has to use the truncated (20-byte) hashes.
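For reference, this is roughly how a client gets the truncated value, as I understand BEP 52: the v2 infohash is the SHA-256 of the bencoded info dictionary, and the first 20 bytes of it are what gets sent to trackers. The function name is made up for this sketch, and the input is assumed to be the already bencoded info dictionary.

```python
import hashlib


def v2_tracker_hash(bencoded_info: bytes) -> bytes:
    """Truncate the SHA-256 v2 infohash to the 20 bytes a tracker sees.

    bencoded_info is assumed to be the raw bencoded info dictionary.
    """
    full_hash = hashlib.sha256(bencoded_info).digest()  # 32 bytes
    return full_hash[:20]                               # same length as a v1 hash
```

The tracker then uses those 20 bytes as the key in the scrape response, exactly as it would for a v1 hash.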

Though I know scraping is not widely used by clients (the only one I know is Transmission, tell me if you know more),

libtorrent supports scraping, so I'd expect any client based on it to support it too.

P.S. Also is it a good idea to record peer's listening port instead of the one which is being sent in &port GET parameter to help some peers behind a NAT?

This doesn't seem relevant at all to scrapes.