newspaper3k - get articles from HTML instead of URL

I'm using newspaper3k inside a Scrapy parse method. I want to extract links, but I don't want to fetch the website again, since Scrapy has already downloaded the page.

Is it possible to use this:

newspaper.build(..)

with plain HTML so I can call .articles afterwards?

Dmitrii K

I found this solution:

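import httpx

from newspaper import Article

async def get_article(url):
    # AsyncClient is an async context manager, so it needs "async with"
    async with httpx.AsyncClient() as client:
        response = await client.get(url)

    # Hand the already-downloaded HTML to newspaper instead of calling download()
    article = Article(url)
    article.set_html(response.text)
    article.parse()
    return article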