I am trying to write a program that runs a chemical search on https://echa.europa.eu/ and retrieves the results. The "Search for Chemicals" field is in the middle of the main page. I want to get the result URL for each chemical by providing its CAS number (e.g. 67-56-1). However, the URL I get back does not include the CAS number I provided.
I tried putting a different CAS number (71-23-8) into the "p_p_id" parameter, but it didn't give the expected search result:
https://echa.europa.eu/search-for-chemicals?p_p_id=71-23-8
I also examined the headers of the GET requests sent by Chrome, and they did not include the CAS number either.
Is the website using variables to store the input query? Is there a way or a tool to get the result URL that includes the searched CAS number?
Once I figure this out, I'll be using Python to get the data and save it as an Excel file.
Thanks.
You need to get the JSESSIONID cookie by requesting the main URL once, then send a POST to https://echa.europa.eu/search-for-chemicals. This request also needs some required URL parameters.
Using curl and bash:
Using Python and scraping with BeautifulSoup:
Note that I've set the timestamp parameter (formDate) in case it is actually checked on the server.
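The flow described above (one GET to obtain the session cookie, then a POST carrying the query and a formDate timestamp, then scraping the links from the result page) can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: it uses the standard library (urllib and html.parser) in place of requests and BeautifulSoup, and the values in URL_PARAMS and QUERY_FIELD are hypothetical placeholders, not the site's real names. The real parameter and field names have to be copied from the POST request shown in the browser's network tab.

```python
import time
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urlencode, urljoin
from urllib.request import HTTPCookieProcessor, Request, build_opener

SEARCH_URL = "https://echa.europa.eu/search-for-chemicals"

# Hypothetical placeholders: replace with the query-string parameters and
# form field name seen in the browser's network tab for the real search.
URL_PARAMS = {"p_p_id": "PORTLET_ID_FROM_BROWSER"}
QUERY_FIELD = "searchTerm"    # assumed name of the search box field
FORM_DATE_FIELD = "formDate"  # timestamp field mentioned in the note above


def build_search_request(cas_number, now_ms=None):
    """Build (but do not send) the POST request for one CAS number."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current time in milliseconds
    form = {QUERY_FIELD: cas_number, FORM_DATE_FIELD: str(now_ms)}
    url = SEARCH_URL + "?" + urlencode(URL_PARAMS)
    return Request(url, data=urlencode(form).encode("ascii"), method="POST")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against SEARCH_URL."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(SEARCH_URL, href))


def extract_links(html):
    """Return all absolute link URLs found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def search(cas_number):
    """Get the session cookie, submit the search, return links on the result page."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    opener.open(SEARCH_URL).close()  # first request stores the session cookie
    with opener.open(build_search_request(cas_number)) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_links(html)
```

Calling search("67-56-1") would then return every link on the result page; you would still need to filter that list down to the substance links you care about, for example by matching on the link path.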