I'm trying to log into a website using cURL and scrape certain data from it. It's a homework project. But the login form has three fields whose values change every time I log in.

Is it possible to work around that and log in, or is it just not possible? If it is possible, can someone please point me in the right direction?

The cURL code I've tried is:

<?php
include("simple_html_dom.php");

$cofile = dirname(__FILE__).'/cookie.txt';
$postfield = array(
  "SM" => "UpPnlLogin|btnLogin",
  "__LASTFOCUS" => "",
  "__EVENTTARGET" => "btnLogin",
  "__EVENTARGUMENT" => "",
  "__VIEWSTATE" => "hly8ipIDyvfEpBj01vjkB/HmrAyIw+UuyvBkGc5NHMexWF+PvAVQZYkSrcwJM4rO9aaz93ogQuFxowVMDPueJz5DU3obstDtyl7KuLvZXQ+GJ1JKRGEtTTRl5vM2RIi7mwL+j3LRqHgl+ZW1wftsnt2qnUy7rrxSC6j0eoqabUM/hpS1hveORvLcEbo+5o1J+rW0+UYYnZ/cFQcUNhx5538uRaD8PIxq6GxTrT/qI2efDDLJB5qmmANILYPxsVg++dXFmQFD59MvETq+R3Om0g==",
  "__VIEWSTATEGENERATOR" => "CADA6983",
  "__EVENTVALIDATION" => "y2iWoj4pBfE6Ij55U/HfSq/mWPNVk4Hv4Nvg7IDxuN6KElLeNsq4iUIbHMfGQS8s6oProuk3wXUrqQWG6VleouPj+M3LLkKYR8XhLzmwe4Cck3tqa/YpGmNLZiNOLkbN4/RhPFq+onAiQ2GDc4gHlU5aU94WwONQ9ItyzsH4V111bPhKX3gjr9YXhpPg9UiyWwkNXohLJSWRM9jGfHrgMg==",
  "txtCustNo" => "username",
  "txtPassword" => "password",
  "__ASYNCPOST" => "true",
  "btnLogin" => "Нэвтрэх"
);

$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cofile);
curl_setopt($ch, CURLOPT_URL, "https://e.khanbank.com/"); // URL that is requested when logging in
curl_setopt($ch, CURLOPT_REFERER, "https://e.khanbank.com/"); // CURLOPT_REFERER
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postfield));


ob_start();      // prevent any output
curl_exec ($ch); // execute the curl command
ob_end_clean();  // stop preventing output

curl_close ($ch);
unset($ch);

$ch = curl_init();
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cofile);
curl_setopt($ch, CURLOPT_URL, "https://e.khanbank.com/pageMain?content=ucMain_Welcome");

$result = curl_exec ($ch);

curl_close ($ch);

echo $result;

?>

hanshenrik On

You can't hardcode the values: they change on every login, and they're tied to your cookie session. The __EVENTVALIDATION that you copied from your browser belongs to your browser's cookie session and is not valid for cURL.

I'll write an example with the hhb_curl library.

First, add this function somewhere, you'll need it. It makes DOMDocument load HTML as UTF-8, which is not DOMDocument's default but is the character set khanbank uses:

function my_dom_loader(string $html): \DOMDocument
{
    $html = trim($html);
    if (empty($html)) {
        //....
    }
    // Without a charset hint, DOMDocument's HTML parser assumes ISO-8859-1;
    // prepending an XML encoding declaration makes loadHTML() read UTF-8.
    if (false === stripos($html, '<?xml encoding=')) {
        $html = '<?xml encoding="UTF-8">' . $html;
    }
    $ret = new DOMDocument('', 'UTF-8');
    $ret->preserveWhiteSpace = false;
    $ret->formatOutput = true;
    // @ silences libxml warnings about real-world, non-validating HTML
    if (!(@$ret->loadHTML($html, LIBXML_NOBLANKS | LIBXML_NONET | LIBXML_BIGLINES))) {
        throw new \Exception("failed to create DOMDocument from input html!");
    }
    $ret->preserveWhiteSpace = false;
    $ret->formatOutput = true;
    return $ret;
}

Next, create the hhb_curl handle:

<?php
declare (strict_types = 1);
require_once('hhb_.inc.php');
$hc = new hhb_curl('', true);

Now, khanbank.com uses a browser whitelist: if you're not using a whitelisted browser, you cannot log in. One whitelisted browser is Google Chrome 75 x64, so impersonate that browser by setting the user agent:

$hc->setopt(CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36');

Next, fetch the login page to get the session cookie and the __EVENTVALIDATION stuff:

$html = $hc->exec('https://e.khanbank.com/')->getStdOut();

Now we have the __EVENTVALIDATION stuff in $html, and we need to parse it out:

$domd = my_dom_loader($html);
$form = $domd->getElementById("Form1");
if ($form === null) {
    // fail early instead of calling a method on null below
    throw new \RuntimeException("ERROR: COULD NOT FIND LOGIN FORM!");
}
$post_data = array();
foreach ($form->getElementsByTagName("input") as $input) {
    $name = $input->getAttribute("name");
    if ($name !== "") { // inputs without a name are never submitted
        $post_data[$name] = $input->getAttribute("value");
    }
}
assert(isset($post_data['txtCustNo']), "ERROR: COULD NOT FIND USERNAME INPUT!");
assert(isset($post_data['txtPassword']), "ERROR: COULD NOT FIND PASSWORD INPUT!");

Now $post_data contains:

array (
  '__VIEWSTATE' => '9GT5O4HrKQJrWbF7PRSXu9RiMlpkqY5hO+sN9H0OXxmwYjWMfr2uf4yIgpHtk9sp56RWot30dvKeuGF3+eoOhpNu5nsuGBjtrpb8g8AGMaDbQ0nxpEKS3HILkqccMwFfn7y0LThLfjm0Ow84RGosJa+/5iM9YfP/HFM5HnyHKGJkM84nGEh7QZfoGYwMOU9SSb5dKmxfnmrIo/xXUUh4DT8+LOFGCQ2H5+nPFudTonwfgX6AKBNhkRijlfrUY+ns7HMq699AU38bsaxgD67KEw==',
  '__VIEWSTATEGENERATOR' => 'CADA6983',
  '__EVENTVALIDATION' => '4FZipDfTouUXBNMfIqlf/SXhPNyW5SBkcH/JIZB/j8kdaJUlMAQzvodpEq2n6WBRvxs6IBGVASOFouDQbqjygKK8+01KbRa9CpEGRiYGdxSIlt0wbZ2wJZeN6kB2ncn2DSd3C3nymCcz1kGHIdR3Dy5l2OlS6JngVCVoXuhpDzsjDQbrRwHST85XOlXdF6jl8/aQPYkSlZkSRQ5BFzdbnw==',
  'txtCustNo' => '',
  'txtPassword' => '',
  'chkRemUser' => '',
)

These values are tied to this specific cookie session, so you must parse them out of the HTML every time; you cannot hardcode them. Some variables are still missing, though, because they are set by JavaScript rather than in the HTML, so add those:

$post_data['SM'] = 'UpPnlLogin|btnLogin';
$post_data['__LASTFOCUS'] = '';
$post_data['__EVENTARGUMENT'] = '';
$post_data['__EVENTTARGET'] = 'btnLogin';
$post_data['__ASYNCPOST'] = 'true';

Now set the username and password:

$post_data['txtCustNo'] = "username";
$post_data['txtPassword'] = "password";

And finally, send the actual login request:

$html = $hc->setopt_array(array(
    CURLOPT_POST => 1,
    CURLOPT_POSTFIELDS => http_build_query($post_data),
    CURLOPT_URL => 'https://e.khanbank.com/'
))->exec()->getStdOut();
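
For reference, the same fetch, parse, and post flow can be sketched with only PHP's built-in curl and DOM extensions, which is closer to the code in the question. This is a minimal sketch, not a tested client for the site: `build_login_fields()` and `khanbank_login()` are hypothetical helper names, the URL, form id, field names, and user agent are taken from this post, and the site may behave differently today. The key points are that one cURL handle with the cookie engine enabled is used for both requests, and that the tokens are read from the page fetched in that same session.

```php
<?php
/**
 * Collect every named <input> of the login form, then overlay the
 * JavaScript-set fields and the credentials. (Hypothetical helper.)
 */
function build_login_fields(string $html, string $user, string $pass, string $formId = 'Form1'): array
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ silences warnings about non-validating HTML
    $xp = new DOMXPath($dom);
    $fields = [];
    foreach ($xp->query("//form[@id='$formId']//input[@name]") as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    if (!isset($fields['__VIEWSTATE'], $fields['__EVENTVALIDATION'])) {
        throw new RuntimeException('login form tokens not found');
    }
    // Later keys win, so the empty txtCustNo/txtPassword from the HTML are replaced.
    return array_merge($fields, [
        'SM'              => 'UpPnlLogin|btnLogin',
        '__LASTFOCUS'     => '',
        '__EVENTARGUMENT' => '',
        '__EVENTTARGET'   => 'btnLogin',
        '__ASYNCPOST'     => 'true',
        'txtCustNo'       => $user,
        'txtPassword'     => $pass,
    ]);
}

/** Fetch the login page, then post the form on the SAME handle. (Hypothetical helper.) */
function khanbank_login(string $user, string $pass, string $url = 'https://e.khanbank.com/'): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => '', // empty string: in-memory cookie engine, no file
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36',
    ]);
    $html = curl_exec($ch); // GET: receives the session cookie and fresh tokens
    if ($html === false) {
        throw new RuntimeException('GET failed: ' . curl_error($ch));
    }
    curl_setopt_array($ch, [
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query(build_login_fields($html, $user, $pass)),
    ]);
    $response = curl_exec($ch); // POST: same cookies, matching tokens
    if ($response === false) {
        throw new RuntimeException('POST failed: ' . curl_error($ch));
    }
    return $response;
}
```

Calling `khanbank_login('username', 'password')` would return the raw response body, which can then be checked for error messages in the same way as the hhb_curl response.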

And finally-finally, check for login errors:

$domd = my_dom_loader($html);
$xp = new DOMXPath($domd);
$login_errors = array();
//uk-alert uk-alert-warning

foreach ($xp->query("//*[contains(@class,'alert')]") as $login_error) {
    $login_error = trim($login_error->textContent);
    if (!empty($login_error)) {
        $login_errors[] = $login_error;
    }
}
if (!empty($login_errors)) {
    var_dump($login_errors);
    throw new \RuntimeException("login errors: " . json_encode($login_errors, JSON_PRETTY_PRINT));
}
echo "logged in successfully! :)";

Which yields:

$ php wtf4.php
array(1) {
  [0]=>
  string(69) "Нэвтрэх нэр эсвэл нууц үг буруу байна!"
}
PHP Fatal error:  Uncaught RuntimeException: login errors: [
    "\u041d\u044d\u0432\u0442\u0440\u044d\u0445 \u043d\u044d\u0440 \u044d\u0441\u0432\u044d\u043b \u043d\u0443\u0443\u0446 \u04af\u0433 \u0431\u0443\u0440\u0443\u0443 \u0431\u0430\u0439\u043d\u0430!"
] in /cygdrive/c/projects/misc/wtf4.php:63
Stack trace:
#0 {main}
  thrown in /cygdrive/c/projects/misc/wtf4.php on line 63
The login fails because "username" and "password" are not valid login credentials. The weird \u0431\u0430\u0439\u043d\u0430 stuff is there because json_encode() escapes non-ASCII characters by default (passing JSON_UNESCAPED_UNICODE keeps them readable); PHP exception messages themselves handle Unicode fine, as the var_dump output shows. The error message is in Mongolian, written in Cyrillic script.
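
The `\u....` escapes in the exception message above come from `json_encode()`, which escapes non-ASCII characters unless told otherwise. A small sketch of the difference, where `format_login_errors()` is a hypothetical helper name and the sample message is the one from the output above:

```php
<?php
/** Encode login errors as pretty-printed JSON without \uXXXX escapes. (Hypothetical helper.) */
function format_login_errors(array $errors): string
{
    return json_encode($errors, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
}

$login_errors = ['Нэвтрэх нэр эсвэл нууц үг буруу байна!'];

// Default flags: every Cyrillic character becomes a \uXXXX escape.
$escaped = json_encode($login_errors, JSON_PRETTY_PRINT);

// With JSON_UNESCAPED_UNICODE the original text is kept as-is.
$readable = format_login_errors($login_errors);

echo $readable, PHP_EOL;
```

Using `format_login_errors($login_errors)` in the exception message instead of the plain `json_encode()` call should make the thrown error readable in a UTF-8 terminal.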