I want to crawl data from a website. All I can do is search the site by keyword and collect the results. The site returns at most 100 records per request, and paging stops at page 1250, which means I can get at most 125,000 records for any one keyword, so many records are never returned. According to the API, the total number of records is about 5 million. I noticed that the set of returned records is always the same for a given keyword and page number; it's not like Google, where searching the same keyword can return different results each time.
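To make the limits concrete, here is a minimal sketch of the arithmetic and of a collection loop that works under them. It is only an illustration: `fetch_page` is a hypothetical stand-in for the site's search call, and the `id` field used to de-duplicate records is an assumption about what a record looks like.

```python
import math
from typing import Callable, Dict, Iterable, List

PAGE_SIZE = 100            # max records per request
MAX_PAGE = 1250            # last page the site will serve
TOTAL_RECORDS = 5_000_000  # approximate total reported by the API

# Hard ceiling on what a single keyword can return.
PER_KEYWORD_CAP = PAGE_SIZE * MAX_PAGE  # 125,000

# Lower bound on keywords needed, reached only if no two keywords overlap.
MIN_KEYWORDS = math.ceil(TOTAL_RECORDS / PER_KEYWORD_CAP)  # 40

Record = Dict[str, object]


def collect(
    keywords: Iterable[str],
    fetch_page: Callable[[str, int], List[Record]],  # hypothetical search call
    max_page: int = MAX_PAGE,
) -> Dict[object, Record]:
    """Crawl every keyword page by page, keeping one copy of each record."""
    seen: Dict[object, Record] = {}
    for keyword in keywords:
        for page in range(1, max_page + 1):
            records = fetch_page(keyword, page)
            if not records:
                break  # keyword exhausted before the page limit
            for record in records:
                # "id" is an assumed unique key; results are deterministic,
                # so the same record always comes back with the same id.
                seen.setdefault(record["id"], record)
    return seen
```

So even in the best case I need at least 40 keywords, and in practice more, because different keywords will return overlapping records.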
I want to find a strategy for preparing a list of keywords that maximizes the number of records I can collect. Any suggestions?
crawl data from website