I'm trying to log into a website using cURL and scrape certain data from the site. It's a homework project. But the login form has 3 fields whose values change every time I log in.
Is it possible to get around that and log in, or is it simply not possible? If it is possible, could someone please point me in the right direction?
The cURL code I've tried is:
<?php
include("simple_html_dom.php");
$cofile = dirname(__FILE__).'/cookie.txt';
$postfield = array(
    "SM" => "UpPnlLogin|btnLogin",
    "__LASTFOCUS" => "",
    "__EVENTTARGET" => "btnLogin",
    "__EVENTARGUMENT" => "",
    "__VIEWSTATE" => "hly8ipIDyvfEpBj01vjkB/HmrA"
        . "yIw+UuyvBkGc5NHMexWF+PvAVQZYkSrcwJM4rO9aaz"
        . "93ogQuFxowVMDPueJz5DU3obstDtyl7KuLvZXQ+GJ1"
        . "JKRGEtTTRl5vM2RIi7mwL+j3LRqHgl+ZW1wftsnt2q"
        . "nUy7rrxSC6j0eoqabUM/hpS1hveORvLcEbo+5o1J+r"
        . "W0+UYYnZ/cFQcUNhx5538uRaD8PIxq6GxTrT/qI2ef"
        . "DDLJB5qmmANILYPxsVg++dXFmQFD59MvETq+R3Om0g"
        . "==",
    "__VIEWSTATEGENERATOR" => "CADA6983",
    "__EVENTVALIDATION" => "y2iWoj4pBfE6Ij55U/Hf"
        . "Sq/mWPNVk4Hv4Nvg7IDxuN6KElLeNsq4iUIbHMfGQS"
        . "8s6oProuk3wXUrqQWG6VleouPj+M3LLkKYR8XhLzmw"
        . "e4Cck3tqa/YpGmNLZiNOLkbN4/RhPFq+onAiQ2GDc4"
        . "gHlU5aU94WwONQ9ItyzsH4V111bPhKX3gjr9YXhpPg"
        . "9UiyWwkNXohLJSWRM9jGfHrgMg==",
    "txtCustNo" => "username",
    "txtPassword" => "password",
    "__ASYNCPOST" => "true",
    "btnLogin" => "Нэвтрэх"
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cofile);
curl_setopt($ch, CURLOPT_URL, "https://e.khanbank.com/"); // URL that is requested when logging in
curl_setopt($ch, CURLOPT_REFERER, "https://e.khanbank.com/"); // CURLOPT_REFERER
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postfield));
ob_start(); // prevent any output
curl_exec ($ch); // execute the curl command
ob_end_clean(); // stop preventing output
curl_close ($ch);
unset($ch);
$ch = curl_init();
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cofile);
curl_setopt($ch, CURLOPT_URL, "https://e.khanbank.com/pageMain?content=ucMain_Welcome");
$result = curl_exec ($ch);
curl_close ($ch);
echo $result;
?>
You can't hardcode the values: they change for every login, and they're tied to your cookie session. The EVENTVALIDATION value that you get from your browser is tied to your browser's cookie session and is not valid for curl.
I'll write an example with the hhb_curl library.
Start by adding this function somewhere; you'll need it. It makes DOMDocument load HTML as UTF-8, which is not DOMDocument's default character set but is the one khanbank uses:
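The original helper is not shown here, so the following is only a minimal sketch of such a loader. The function name `load_html_utf8` is hypothetical, and the `<?xml encoding="UTF-8">` prefix is a commonly used workaround to make `DOMDocument::loadHTML()` treat the input as UTF-8; it is not necessarily what the original function did.

```php
<?php
// Hypothetical helper (name is an assumption): parse HTML as UTF-8.
function load_html_utf8(string $html): DOMDocument
{
    // Without an encoding hint, loadHTML() may misread UTF-8 text
    // such as Cyrillic, so prepend an XML encoding declaration.
    if (stripos($html, '<?xml encoding=') === false) {
        $html = '<?xml encoding="UTF-8">' . $html;
    }
    $dom = new DOMDocument();
    // '@' silences warnings about malformed real-world markup.
    if (!@$dom->loadHTML($html)) {
        throw new RuntimeException('failed to parse HTML');
    }
    return $dom;
}

$dom = load_html_utf8('<html><body><p>Нэвтрэх</p></body></html>');
echo $dom->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```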
Next, create the hhb_curl handle.
Now, khanbank.com uses a browser whitelist: if you're not using a whitelisted browser, you cannot log in. One example of a whitelisted browser is Google Chrome 75 x64, so impersonate that browser by setting the user agent:
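As a rough illustration with PHP's built-in cURL extension (not hhb_curl), the browser impersonation could look like the sketch below. The function name `browser_like_curl_options` is hypothetical, and the exact User-Agent string is an assumption about what a 64-bit Chrome 75 on Windows sends; whether the site accepts it is untested.

```php
<?php
// Hypothetical helper: cURL options that mimic Chrome 75 (64-bit).
function browser_like_curl_options(): array
{
    return [
        // Assumed Chrome 75 User-Agent string; adjust if the site rejects it.
        CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
            . 'AppleWebKit/537.36 (KHTML, like Gecko) '
            . 'Chrome/75.0.3770.100 Safari/537.36',
        // An empty string enables cURL's in-memory cookie engine, so the
        // session cookie is kept for later requests on the same handle.
        CURLOPT_COOKIEFILE => '',
        // An empty string lets cURL offer every encoding it supports.
        CURLOPT_ENCODING => '',
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// One handle is reused for every request so the cookies stay attached.
$ch = curl_init();
curl_setopt_array($ch, browser_like_curl_options());
```

Reusing a single handle for the login page and the login request matters here, because the parsed form values are only valid for the cookie session that fetched them.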
Next, fetch the login page to get the cookie and the EVENTVALIDATION data.
Now we have the EVENTVALIDATION data in the HTML, and we need to parse it out:
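A minimal sketch of that parsing step, using only DOMDocument and DOMXPath: it collects every `<input>` that has a `name` attribute into an array. The function name `extract_form_fields` and the sample HTML are made up for illustration; the real login page will contain more fields.

```php
<?php
// Hypothetical helper: collect name => value for all named <input> elements.
function extract_form_fields(string $html): array
{
    $dom = new DOMDocument();
    // Encoding hint so UTF-8 values survive; '@' hides markup warnings.
    @$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    $xpath = new DOMXPath($dom);
    $fields = [];
    foreach ($xpath->query('//input[@name]') as $input) {
        // getAttribute() returns '' when the value attribute is absent.
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// Made-up sample standing in for the fetched login page.
$sample = '<form>'
    . '<input type="hidden" name="__VIEWSTATE" value="abc">'
    . '<input type="hidden" name="__EVENTVALIDATION" value="xyz">'
    . '<input type="text" name="txtCustNo">'
    . '</form>';
$post_data = extract_form_fields($sample);
print_r($post_data);
```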
Now $post_data contains the form fields parsed from the page, including the EVENTVALIDATION value. These are tied to this specific cookie session, so you must parse them out of the HTML every time; you cannot hardcode them. There are still some variables missing, though (because they are set with JavaScript, not with HTML), so add those:
Now set the username and password:
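The last two steps can be sketched as one merge. The JavaScript-set field names and values below (`SM`, `__EVENTTARGET`, `__ASYNCPOST`) are copied from the request captured in the question and are assumed to still be what the site expects; the function name `build_login_post` is hypothetical.

```php
<?php
// Hypothetical helper: combine parsed fields, JS-set fields and credentials.
function build_login_post(array $parsedFields, string $user, string $password): array
{
    // Assumption: these values match the request shown in the question.
    $jsFields = [
        'SM' => 'UpPnlLogin|btnLogin',
        '__EVENTTARGET' => 'btnLogin',
        '__ASYNCPOST' => 'true',
    ];
    $credentials = [
        'txtCustNo' => $user,
        'txtPassword' => $password,
    ];
    // Later arrays win on duplicate keys, so credentials override blanks.
    return array_merge($parsedFields, $jsFields, $credentials);
}

$post = build_login_post(['__VIEWSTATE' => 'abc', 'txtCustNo' => ''], 'username', 'password');
// http_build_query() URL-encodes the values for the POST body.
echo http_build_query($post), PHP_EOL;
```

The resulting string is what would be passed to `CURLOPT_POSTFIELDS` on the same handle that fetched the login page.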
Then send the actual login request.
And finally, check for login errors:
which yields:
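The \u0431\u0430\u0439\u043d\u0430 stuff is the error message with its non-ASCII characters shown as \uXXXX escapes; it seems the exception message is not being printed as raw Unicode. The message itself is written in Cyrillic script (the escapes decode to "байна", which is Mongolian rather than Russian).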
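To read such a message, the escapes can be decoded by treating them as a JSON string. This is only a sketch: the code that produced the output is not shown, so the idea that the text went through something like `json_encode()` without `JSON_UNESCAPED_UNICODE` is an assumption.

```php
<?php
// Single quotes keep the backslashes literal, exactly as they were printed.
$escaped = '\u0431\u0430\u0439\u043d\u0430';

// Wrapping the text in double quotes makes it a valid JSON string literal.
$decoded = json_decode('"' . $escaped . '"');
echo $decoded, PHP_EOL; // байна

// By default json_encode() escapes non-ASCII characters again...
echo json_encode($decoded), PHP_EOL; // "\u0431\u0430\u0439\u043d\u0430"
// ...while this flag keeps them readable.
echo json_encode($decoded, JSON_UNESCAPED_UNICODE), PHP_EOL; // "байна"
```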