Maximize data returned - Strategy to crawl data from a website

22 Views Asked by News Entertainment At 07 January 2024 at 05:48

I want to crawl data from a website. All I can do is search the site by keyword and collect the results. The site returns at most 100 records per request, and paging is capped at page 1250, which means I can get at most 125,000 records for a single keyword, so many records are never returned. According to the API, there are about 5 million records in total. I also noticed that the returned records are always the same for a given keyword and page number; it's not like Google, where searching the same keyword can return different results each time.
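To make the limits concrete, here is a minimal sketch of paging through one keyword while de-duplicating records. The `search(keyword, page)` callable and the `"id"` field on each record are assumptions for illustration; the real site's request format and record schema are not given in the question.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 100   # max records per request (from the question)
MAX_PAGE = 1250   # last page the site will serve (from the question)

Record = Dict[str, str]
# Hypothetical signature: search(keyword, page) -> list of records for that page.
SearchFn = Callable[[str, int], List[Record]]


def max_records_per_keyword() -> int:
    """Upper bound on records reachable through a single keyword."""
    return PAGE_SIZE * MAX_PAGE


def crawl_keyword(search: SearchFn, keyword: str, seen: Dict[str, Record]) -> int:
    """Page through one keyword, store unseen records in `seen`,
    and return how many new records this keyword contributed."""
    new_count = 0
    for page in range(1, MAX_PAGE + 1):
        batch = search(keyword, page)
        if not batch:
            break  # no more results for this keyword
        for rec in batch:
            rec_id = rec["id"]  # assumed unique identifier field
            if rec_id not in seen:
                seen[rec_id] = rec
                new_count += 1
        if len(batch) < PAGE_SIZE:
            break  # a short page means this was the last one
    return new_count
```

With these numbers a single keyword covers at most 100 × 1250 = 125,000 records, roughly 2.5% of the reported 5 million. Because results are stable for a given keyword and page, each (keyword, page) pair only needs to be fetched once, and the number of new records returned per keyword is a direct measure of how useful that keyword was.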

I want to find a strategy for preparing a list of keywords that maximizes the number of records I can collect. Any suggestions?
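One possible approach, offered as a sketch and not a confirmed solution, is to grow the keyword list from the data itself: crawl a few seed keywords, extract words from the records that come back, and queue every word that has not been tried yet. The `crawl(keyword)` callable and the `"id"` and `"text"` record fields are hypothetical; `crawl` is assumed to return all reachable records for a keyword (up to the site's cap).

```python
import re
from collections import deque
from typing import Callable, Dict, Iterable, List, Set

Record = Dict[str, str]
# Hypothetical: crawl(keyword) -> all records the site returns for that keyword.
CrawlFn = Callable[[str], List[Record]]


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of length >= 3 (an arbitrary choice)."""
    return set(re.findall(r"[a-z0-9]{3,}", text.lower()))


def expand_keywords(crawl: CrawlFn, seeds: Iterable[str],
                    max_keywords: int = 1000) -> Dict[str, Record]:
    """Breadth-first keyword expansion: every new record contributes
    its tokens as candidate keywords for later searches."""
    seen: Dict[str, Record] = {}
    tried: Set[str] = set()
    queue = deque(seeds)
    queued: Set[str] = set(queue)
    while queue and len(tried) < max_keywords:
        keyword = queue.popleft()
        if keyword in tried:
            continue
        tried.add(keyword)
        for rec in crawl(keyword):
            if rec["id"] in seen:
                continue  # already collected through another keyword
            seen[rec["id"]] = rec
            # sorted() only keeps the crawl order deterministic
            for token in sorted(tokenize(rec["text"])):
                if token not in tried and token not in queued:
                    queued.add(token)
                    queue.append(token)
    return seen
```

Two caveats apply to this sketch. A keyword that hits the 125,000-record cap is too broad and still loses records, so rarer words (or multi-word queries, if the site supports them) are likely to give better coverage per request. Records that share no words with anything already collected are unreachable from the seeds, so a varied seed list matters.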

crawl data from website


There are 0 best solutions below
