Rust chromiumoxide library won't render iframe element

I'm building a Tauri app that does some web scraping. The problem I'm facing is that when a page contains an iframe, the page load halts completely.

I tried to visit a web page that contains an iframe. I expected the page to load properly, but loading halted instead.

The request for the iframe's src URL shows as "Pending" in DevTools and stays that way until it eventually times out.

I could partially work around this by using request interception to block the iframe's request, but I need the iframe to actually render, so that won't work.

Here is a minimal sample of the problem:

use chromiumoxide::{
    Browser, BrowserConfig, BrowserFetcher, BrowserFetcherOptions,
};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Forward slashes work on every platform; "\\" is only a separator on Windows
    let download_path = std::path::Path::new("./download");
    std::fs::create_dir_all(download_path)?;

    let fetcher_config = BrowserFetcherOptions::builder()
        .with_path(download_path)
        .build()?;

    let fetcher = BrowserFetcher::new(fetcher_config);

    let info = fetcher.fetch().await?;

    let config = BrowserConfig::builder()
        .chrome_executable(&info.executable_path)
        .user_data_dir("./data-dir")
        .with_head()
        .build()?;

    let (mut browser, mut handler) = Browser::launch(config).await?;

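    // Drive the CDP connection in the background; the browser only makes
    // progress while this loop keeps polling the handler.
    // Note: this loop stops for good at the first error the handler yields.
    let handle = tokio::spawn(async move {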
        while let Some(h) = handler.next().await {
            if h.is_err() {
                break;
            }
        }
    });

    let page = browser.new_page("about:blank").await?;

    // CodePen may show a captcha instead if you run this repeatedly,
    // but the captcha also uses an iframe, so the problem still reproduces
    page.goto("https://codepen.io/IanLintner/pen/DqGKQZ")
        .await?;

    // Keep the browser open long enough to watch the page load (or stall)
    tokio::time::sleep(std::time::Duration::from_secs(30)).await;
    _ = browser.close().await;
    _ = browser.wait().await;

    _ = handle.await;

    Ok(())
}
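One possible cause worth ruling out (an assumption, not a confirmed diagnosis): the spawned loop above `break`s on the first `Err` the handler yields. chromiumoxide's handler can yield errors for individual CDP messages it fails to process, for example if the downloaded Chrome sends messages the crate's protocol definitions don't recognise, and pages with iframes generate extra frame and target events. Once the loop exits, no later browser event is handled at all, which could leave the page stuck in the state described above. The std-only sketch below models the two loop policies. The `Iterator` stands in for the async `Handler` stream, and the function names and the event/error sequence are hypothetical, chosen only for illustration.

```rust
// Std-only model of two ways to drive an event stream. In the real code the
// stream is chromiumoxide's async `Handler`; here an `Iterator` of `Result`s
// stands in for it (hypothetical simplification).

/// Policy used in the question: stop at the first error.
/// Returns how many successful events were processed.
fn drive_until_error<T, E>(events: impl IntoIterator<Item = Result<T, E>>) -> usize {
    let mut processed = 0;
    for event in events {
        if event.is_err() {
            break; // every event after this one is never processed
        }
        processed += 1;
    }
    processed
}

/// Alternative policy: report the error and keep draining the stream.
/// Returns (successful events, errors seen).
fn drive_logging_errors<T, E: std::fmt::Debug>(
    events: impl IntoIterator<Item = Result<T, E>>,
) -> (usize, usize) {
    let (mut processed, mut errors) = (0, 0);
    for event in events {
        match event {
            Ok(_) => processed += 1,
            Err(e) => {
                eprintln!("handler error (ignored): {e:?}");
                errors += 1;
            }
        }
    }
    (processed, errors)
}

fn main() {
    // Hypothetical sequence: one message fails mid-load, and the page's
    // load event only arrives after it.
    let events: Vec<Result<&str, &str>> = vec![
        Ok("Page.frameStartedLoading"),
        Err("unrecognised message from iframe target"),
        Ok("Page.loadEventFired"),
    ];

    // prints "break on error: 1 events processed"
    println!("break on error: {} events processed", drive_until_error(events.clone()));

    // prints "log and continue: 2 events processed, 1 errors"
    let (ok, err) = drive_logging_errors(events);
    println!("log and continue: {ok} events processed, {err} errors");
}
```

If this turns out to be the cause, the corresponding change in the code above would be to keep the loop running and report errors instead of breaking, for example replacing `if h.is_err() { break; }` with `if let Err(e) = h { eprintln!("handler error: {e:?}"); }`.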

I also know that the headless_chrome crate doesn't have this issue, but it is far too slow for my purposes (selecting the td elements of a small table takes about 45 seconds).

I really need to get this working with chromiumoxide, if at all possible.