I load my variables into a DataFrame in a loop, but only the last iteration's values are stored; all the others are discarded


I am trying to write my data to a CSV file using the code below. For some reason it isn't working correctly: the file only contains the values from the last loop iteration.

import csv
import newspaper
import pandas as pd
from newspaper import Article
    
df = pd.DataFrame(data, columns=['txt','date1','authors1'])
lista = ['https://www.dawn.com/news/1643189','https://www.dawn.com/news/1648926/former-pakistan-captain-inzamamul-haq-suffers-heart-attack-in-lahore']
    
for list in lista:
    
    first_article = Article(url="%s" % list, language='de')
    first_article.download()
    first_article.parse()
    txt = first_article.text
    date1 = first_article.publish_date
    authors1 = first_article.authors
    data = [[txt,date1,authors1]]
    df = pd.DataFrame(data, columns=['txt','date1','authors1'])
    df.to_csv('pagedata.csv')

Answer by 2e0byo

This is by design! You are overwriting your output. Try something like:

import pandas as pd
from newspaper import Article

lista = ['https://www.dawn.com/news/1643189','https://www.dawn.com/news/1648926/former-pakistan-captain-inzamamul-haq-suffers-heart-attack-in-lahore']

# enumerate() numbers the URLs, so each article gets its own output file
for i, url in enumerate(lista):
    first_article = Article(url=url, language='en')
    first_article.download()
    first_article.parse()
    txt = first_article.text
    date1 = first_article.publish_date
    authors1 = first_article.authors
    data = [[txt, date1, authors1]]
    df = pd.DataFrame(data, columns=['txt','date1','authors1'])
    # a different filename per iteration, so nothing is overwritten
    df.to_csv(f"page_{i}_data.csv")

Edit: apparently you want to collect all the data in a single file. Something like:

# create the empty DataFrame once, before the loop
df = pd.DataFrame(columns=['txt','date1','authors1'])
for row in lista:
    first_article = Article(url=row, language='en')
    first_article.download()
    first_article.parse()
    txt = first_article.text
    date1 = first_article.publish_date
    authors1 = first_article.authors
    # append one row at the next free index
    df.loc[len(df.index)] = [txt, date1, authors1]

# write the file once, after every row has been added
df.to_csv("pagedata.csv")

The main thing is having a variable outside the loop you can append to.
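A common variant of the same idea is to collect the rows in a plain Python list and build the DataFrame once at the end. The sketch below illustrates that pattern only: `fetch_article` and `collect_articles` are hypothetical helpers, and `fetch_article` returns placeholder values instead of downloading anything, so it stands in for the `Article` download/parse steps rather than reproducing them.

```python
import pandas as pd


def fetch_article(url):
    """Hypothetical stand-in for the newspaper download/parse steps."""
    # Placeholder values only; real code would build these from Article(url=url).
    return {"txt": f"text of {url}", "date1": None, "authors1": ["unknown"]}


def collect_articles(urls, fetch=fetch_article):
    rows = []  # defined outside the loop, so earlier results are kept
    for url in urls:
        rows.append(fetch(url))  # one dict per article
    # Build the DataFrame once, after the loop has finished
    return pd.DataFrame(rows, columns=["txt", "date1", "authors1"])


lista = [
    "https://www.dawn.com/news/1643189",
    "https://www.dawn.com/news/1648926/former-pakistan-captain-inzamamul-haq-suffers-heart-attack-in-lahore",
]

df = collect_articles(lista)
print(len(df))  # one row per URL
```

Calling `df.to_csv("pagedata.csv")` once on the result then writes every row to the file, instead of only the last one.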

Note that I have corrected the language (these articles are in English, not German) and removed a redundant string format: url="%s" % url is equivalent to url=url.