I need to extract articles/news from an HTML file, and the best solution I found is to use newspaper3k in Python. I am getting a blank result; I've tried a lot of solutions, but I am kind of stuck here.
from newspaper import Article

with open("index.html", 'r', encoding='utf-8') as f:
    # Empty URL: the HTML is supplied directly instead of being fetched
    article = Article('', language='en')
    article.download(input_html=f.read())

article.parse()
print(article.title)
Result: '' (an empty string)
It should print the text from an <article> tag inside the HTML file.
Your code looks right.
I'm going to assume the problem is your source. What is in index.html? Can you provide the file, or the URL it was extracted from?
BTW, here is the code sample for reading offline content with newspaper3k. This sample is from my overview document on using newspaper3k.
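A quick way to answer "what is in index.html?" is to look at the file directly. The minimal sketch below uses only Python's standard `html.parser` (it is not part of newspaper3k) to pull out the `<title>` text and whatever text sits inside `<article>` elements. The names `ArticleInspector`, `inspect_html` and `inspect_file` are hypothetical helpers for illustration. If the file has no `<title>` and no text inside `<article>`, that points to the source rather than the newspaper3k code. Also note that newspaper3k exposes the page title as `article.title` and the extracted body as `article.text`.

```python
from html.parser import HTMLParser


class ArticleInspector(HTMLParser):
    """Collects the <title> text and the text inside <article> elements."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.article_parts = []
        self._in_title = False
        self._article_depth = 0  # > 0 while inside (possibly nested) <article> tags

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "article":
            self._article_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "article" and self._article_depth > 0:
            self._article_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace-only text between tags
        if self._in_title:
            self.title_parts.append(text)
        elif self._article_depth > 0:
            self.article_parts.append(text)


def inspect_html(html):
    """Return (title, article_text) found in an HTML string."""
    parser = ArticleInspector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.title_parts), " ".join(parser.article_parts)


def inspect_file(path):
    """Hypothetical helper: run inspect_html on a UTF-8 file such as index.html."""
    with open(path, "r", encoding="utf-8") as f:
        return inspect_html(f.read())
```

For example, `print(inspect_file("index.html"))` shows whether the file really contains a title and article text before you hand it to newspaper3k.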