I'm trying to save data scraped from several articles to a CSV file using the code below, but it isn't working correctly: only the values from the last loop iteration end up in the file.
```python
import csv
import newspaper
import pandas as pd
from newspaper import Article

lista = ['https://www.dawn.com/news/1643189', 'https://www.dawn.com/news/1648926/former-pakistan-captain-inzamamul-haq-suffers-heart-attack-in-lahore']
for list in lista:
    first_article = Article(url="%s" % list, language='de')
    first_article.download()
    first_article.parse()
    txt = first_article.text
    date1 = first_article.publish_date
    authors1 = first_article.authors
    data = [[txt, date1, authors1]]
    df = pd.DataFrame(data, columns=['txt', 'date1', 'authors1'])
    df.to_csv('pagedata.csv')
```
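This is expected: you are overwriting your output. On every pass through the loop, data is replaced by a list holding only the current article, so by the time it is written out only the last article is left.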
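A minimal sketch of one fix, swapping in the standard-library csv module (which the question already imports) in place of pandas: the output file is opened once before the loop, and each article is written as its own row. The helper names scrape_article and write_articles, and the scrape parameter (which lets you substitute a fake scraper when testing offline), are hypothetical and not part of newspaper or the original post.

```python
import csv


def scrape_article(url):
    """Download and parse one article (needs network access and newspaper3k)."""
    # Imported here so the rest of the module works without newspaper installed.
    from newspaper import Article

    article = Article(url=url, language='en')
    article.download()
    article.parse()
    return [article.text, article.publish_date, article.authors]


def write_articles(urls, path, scrape=scrape_article):
    """Write one CSV row per URL; `scrape` is a hypothetical injection point."""
    # Open the file once, before the loop, so each iteration adds a row
    # instead of replacing the whole file.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['txt', 'date1', 'authors1'])
        for url in urls:
            writer.writerow(scrape(url))
```

With the question's list you would call write_articles(lista, 'pagedata.csv'). Opening with 'w' still truncates the file, but only once per run rather than once per article.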
Edit: it seems you want to collate the data from every article into one table.
The key is to have a variable outside the loop that you append to on each iteration.
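A sketch of that collate approach with pandas, assuming the same three columns as the question: rows are appended to a list that lives outside the loop, and the DataFrame is built once at the end. The names scrape_article and collect_articles, and the scrape parameter (useful for passing a fake scraper when testing offline), are illustrative only and not part of any library.

```python
import pandas as pd


def scrape_article(url):
    """Download and parse one article (needs network access and newspaper3k)."""
    # Imported here so collect_articles can be tested without newspaper installed.
    from newspaper import Article

    article = Article(url=url, language='en')
    article.download()
    article.parse()
    return [article.text, article.publish_date, article.authors]


def collect_articles(urls, scrape=scrape_article):
    """Return one DataFrame with a row per URL."""
    data = []  # defined outside the loop, so rows accumulate
    for url in urls:
        data.append(scrape(url))
    # Build the DataFrame once, after every row has been gathered.
    return pd.DataFrame(data, columns=['txt', 'date1', 'authors1'])
```

For example, collect_articles(lista).to_csv('pagedata.csv', index=False) writes one row per URL. Building the DataFrame once after the loop also avoids creating a new DataFrame on every iteration.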
Note that I have also set the language to English (language='en'; these articles are in English, not German) and removed a redundant string-formatting step: url="%s" % url is equivalent to url=url.