I am scraping data from Twitter using Twython, and that part works. However, for further data manipulation, I need to save the Twitter data to JSON or any other format that can be opened with pandas.
I want to include every single column from the scraping result, including language, location, retweets and so on. I know how to do this for a few columns, but I could not find any information about how to include all of them.
import json
credentials = {}
credentials['CONSUMER_KEY'] = '...'
credentials['CONSUMER_SECRET'] = '...'
credentials['ACCESS_TOKEN'] = '...'
credentials['ACCESS_SECRET'] = '...'
# Save the credentials object to file
with open("twitter_credentials.json", "w") as file:
    json.dump(credentials, file)
# Import the Twython class
from twython import Twython
import json
# Load credentials from json file
with open("twitter_credentials.json", "r") as file:
    creds = json.load(file)
# Instantiate an object
python_tweets = Twython(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'])
python_tweets.search(q='#python', result_type='popular',count=5)
OUTPUT:
{'statuses': [{'created_at': 'Mon Dec 14 04:05:03 +0000 2020',
'id': 1338334158205169664,
'id_str': '1338334158205169664',
'text': ' Hmmm...this looks right, doesn’t it? We’ll give you a hint - the result is meant to be 36!\n\nCan you find the err… ',
'truncated': True,
'entities': {'hashtags': [],
'symbols': [],
'user_mentions': [],
'urls': [{'url': '',
'expanded_url': '',
'display_url': 'twitter.com/i/web/status/1…',
'indices': [117, 140]}]},
'metadata': {'result_type': 'popular', 'iso_language_code': 'en'},
'source': '<a href=">',
'in_reply_to_status_id': None,
'in_reply_to_status_id_str': None,
'in_reply_to_user_id': None,
'in_reply_to_user_id_str': None,
and so on
My question is: how can I save the data I got from Twitter in JSON format so I can open it later with pandas? I basically just want to open it with pandas somehow.
I have tried the following code:
data= {}
data[python_tweets.search(q='#python', result_type='popular',count=5)]
with open("twitter_new.json", "w") as file:
    json.dump(data, file)
TypeError: unhashable type: 'dict'
data=python_tweets.search(q='#python', result_type='popular',count=5)
df = pd.DataFrame(data)
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
To save the results from search(), you should simply assign them to a variable, data = ..., and save that.
But this JSON has a complex structure: it has two different sub-dictionaries, data['statuses'] and data['search_metadata'], which can't be converted together into one DataFrame. You probably need only the values from data['statuses'] (even without saving them to a file).
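As a rough illustration of the point about data['statuses'], the sketch below builds a DataFrame from that list alone. The sample dictionary is invented to mimic the shape of a search() response (it is not real Twitter data), and using pd.json_normalize to flatten nested fields such as metadata is a suggestion here, not something the original code is known to do.

```python
import pandas as pd

# Hypothetical stand-in for python_tweets.search(...); all values are invented
data = {
    "statuses": [
        {"id": 1, "text": "first tweet", "lang": "en",
         "retweet_count": 3,
         "metadata": {"result_type": "popular", "iso_language_code": "en"}},
        {"id": 2, "text": "second tweet", "lang": "en",
         "retweet_count": 0,
         "metadata": {"result_type": "popular", "iso_language_code": "en"}},
    ],
    "search_metadata": {"count": 5, "query": "%23python"},
}

# One row per tweet, one column per top-level key of each status
df = pd.DataFrame(data["statuses"])

# Same rows, but nested dicts are expanded into dotted column names
flat = pd.json_normalize(data["statuses"])
```

With the real response, df keeps every top-level field of each tweet as a column, while flat additionally spreads nested dictionaries into columns such as metadata.iso_language_code.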
Minimal working code which I used to test it
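A minimal sketch of the save-and-reload idea, under stated assumptions: the sample data dictionary stands in for the real search() response, and the file is written to a temporary directory instead of the working directory so the snippet has no side effects.

```python
import json
import os
import tempfile

# Hypothetical stand-in for: data = python_tweets.search(q='#python', ...)
data = {
    "statuses": [{"id": 1, "text": "hello", "lang": "en"}],
    "search_metadata": {"count": 5},
}

# Temporary location used only for this sketch
path = os.path.join(tempfile.mkdtemp(), "twitter_new.json")

# Save the whole response; the dict is the value being dumped, not a key
with open(path, "w") as file:
    json.dump(data, file)

# Load it back; the result is an ordinary dict again
with open(path, "r") as file:
    loaded = json.load(file)

statuses = loaded["statuses"]  # this list is what pd.DataFrame() accepts
```

The earlier TypeError came from using the search result as a dictionary key; assigning it to a variable and dumping that variable avoids the problem.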
BTW:
Your dictionary data = {} could be useful if you would like to keep many results and save them in separate JSON files or open them in separate DataFrames.
Minimal working code which I used to test it
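One way keeping many results could look, sketched under assumptions: the keys 'python' and 'pandas' and the fake_search helper are hypothetical placeholders for real queries and for python_tweets.search(), and the files go to a temporary directory.

```python
import json
import os
import tempfile


def fake_search(q):
    """Hypothetical placeholder that mimics the shape of a search() response."""
    return {
        "statuses": [{"id": 1, "text": f"tweet about {q}"}],
        "search_metadata": {"query": q},
    }


# Keep many results in one dictionary, keyed by a short name
data = {}
data["python"] = fake_search("#python")
data["pandas"] = fake_search("#pandas")

out_dir = tempfile.mkdtemp()  # temporary folder, only for this sketch
saved = {}

# Write each result to its own JSON file and remember where it went
for name, result in data.items():
    path = os.path.join(out_dir, f"twitter_{name}.json")
    with open(path, "w") as file:
        json.dump(result, file)
    saved[name] = path
```

Each file can later be read with json.load, and its 'statuses' list passed to pd.DataFrame to get one DataFrame per query.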