Attaching DOM nodes from multiple documents in the same crawler is forbidden


I'm building a web crawler, but I get an error that seems to mean I can't use more than one DOM document in it. I think I need to manipulate the DOM somehow, but I have no idea how to do it.

I'm using Symfony DomCrawler and Sunra PhpSimple HtmlDomParser.

Code:


$crawler = $this->crawler; // the Crawler instance stored on the object
$crawler->addHtmlContent(HtmlDomParser::file_get_html($url, false, null, 0));

// Collect the URL of every link on the page
$crawler
    ->filter('a')
    ->each(function (Crawler $node) use ($url): void {
        $href = $node->attr('href') ?? '';
        // Prefix relative links and fragments with the current page URL
        if (str_starts_with($href, '/') || str_starts_with($href, '#')) {
            $href = $url . $href;
        }
        $this->datas = ['url' => $href];

        // Skip deeper links (more than 4 slashes) on the same host
        $isDeepLink = substr_count($href, '/') > 4
            && parse_url($href, PHP_URL_HOST) === parse_url($url, PHP_URL_HOST);
        if (!$isDeepLink) {
            $check = $this->db->db->prepare("SELECT * FROM crawler WHERE url = ?");
            $check->execute([$href]);
            $existing = $check->fetch(PDO::FETCH_ASSOC);
            // fetch() returns false when no row matches, so test that first
            if ($existing === false || $existing['url'] !== $href) {
                $insert = $this->db->db->prepare("INSERT INTO crawler SET url = ?");
                $insert->execute([$href]);
            }
        }

        $this->url = $href;
        usleep(500000); // sleep() takes whole seconds, so 0.5 would be truncated to 0
    });
//echo $url . PHP_EOL;

// Request every URL stored so far
$ins = $this->db->db->prepare("SELECT * FROM crawler");
$ins->execute();
while ($links = $ins->fetch(PDO::FETCH_ASSOC)) {
    $this->request($links['url']);
}

Error: Uncaught InvalidArgumentException: Attaching DOM nodes from multiple documents in the same crawler is forbidden. in...

Please help me solve this error.
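
A likely cause, judging from the code above: `$this->crawler` is one Crawler instance, and `addHtmlContent()` is called on it again for every URL that `request()` visits. Each call parses a new document, and a Symfony Crawler only accepts nodes from a single document, which would explain the exception. Below is a minimal sketch of one possible fix, building a new Crawler per page. It assumes `symfony/dom-crawler` and `symfony/css-selector` are installed through Composer; `extractLinks()` is a hypothetical helper, not part of the original code.

```php
<?php
// Sketch only: assumes Composer's autoloader is available at this path.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper (not from the question): returns the href of every link in one page.
 * A new Crawler is created on each call, so no instance ever holds two documents.
 */
function extractLinks(string $html): array
{
    $crawler = new Crawler();
    $crawler->addHtmlContent($html);

    // each() returns an array of the callback's return values
    return $crawler->filter('a')->each(
        static fn (Crawler $node): ?string => $node->attr('href')
    );
}

// Two separate pages can be processed one after the other without the exception
$first = extractLinks('<html><body><a href="/a">A</a></body></html>');
$second = extractLinks('<html><body><a href="/b">B</a><a href="#c">C</a></body></html>');
```

If the shared `$this->crawler` property has to stay, calling `$this->crawler->clear()` before `addHtmlContent()` should have a similar effect, since it removes the nodes from the previous page.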
