I am trying to create a Python script with PRAW (the Python Reddit API Wrapper) that will pull the title, submission date/time, number of comments, URL, and comments for any submission that contains a specific keyword.
The script successfully outputs a CSV file with the title, time, number of comments, and URL. However, I cannot get it to grab the comments from each post: this version of the script only returns a CommentForest object (I think that's what it is) for each submission.
import praw
import pandas as pd
reddit = praw.Reddit(
    client_id="xxxxxxxxxxx",
    client_secret="xxxxxxxxxxxxxx",
    password="xxxxxxxxxxxxxx",
    user_agent="xxxxxxxxxxxxxx",
    username="xxxxxxxxxxxxx",
)
# Search a subreddit for submissions containing a specific keyword; use pandas to write the data to a CSV file.
title_list = []
date_list = []
num_comments_list = []
url_list = []
comments_all = []
for submission in reddit.subreddit("ChatGPT").search("plagiarism"):
    title_list.append(submission.title)
    date_list.append(submission.created_utc)
    num_comments_list.append(submission.num_comments)
    url_list.append(submission.url)
    comments_all.append(submission.comments)  # this stores a CommentForest object, not the comment text
df = pd.DataFrame({
    'Title': title_list,
    'Time': date_list,
    '# of comments': num_comments_list,
    'URL': url_list,
    'comments': comments_all,
})
df.to_csv('plagiarism_search.csv', index=False)
And here is a screenshot of the CSV output:
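As far as I can tell, submission.comments is a CommentForest object rather than the comment text itself, so pandas just writes the object's representation into the cell. Here is a small sketch of what I mean, reusing the reddit instance from above (the replace_more(limit=0) call just drops the "load more comments" placeholders rather than fetching them):

# Inspect one search result to see what submission.comments actually is.
submission = next(reddit.subreddit("ChatGPT").search("plagiarism"))
print(type(submission.comments))           # <class 'praw.models.comment_forest.CommentForest'>
submission.comments.replace_more(limit=0)  # discard "load more comments" stubs
for comment in submission.comments.list():
    print(comment.body)                    # the actual comment text I want in the CSV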
I have successfully grabbed all of the comments from an individual post using the script below, but I'm not sure how to combine that with the above method of searching a specific subreddit for a keyword.
url="https://www.reddit.com/r/ChatGPT/comments/13dzapn/is_using_chatgpt_to_rephrase_considered_as/"
submission = reddit.submission(url=url)
file = open("plagiarismcomments.txt", "w", encoding="utf-8")
#print all comments (except stickied) for the submission
submission.comments.replace_more(limit=None)
for comment in submission.comments.list():
if not comment.stickied:
all_comments=comment.body
file.write(all_comments)
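Is something like the sketch below the right way to merge the two? This is just my guess: run the same keyword search, then expand and flatten each submission's comments into a single string for the CSV column (the " | " separator and the rows list are arbitrary choices on my part):

import praw
import pandas as pd

reddit = praw.Reddit(
    client_id="xxxxxxxxxxx",
    client_secret="xxxxxxxxxxxxxx",
    password="xxxxxxxxxxxxxx",
    user_agent="xxxxxxxxxxxxxx",
    username="xxxxxxxxxxxxx",
)

rows = []
for submission in reddit.subreddit("ChatGPT").search("plagiarism"):
    # Expand every "load more comments" stub so .list() sees the full tree.
    submission.comments.replace_more(limit=None)
    # Collect the body of each non-stickied comment, as in my second script.
    bodies = [c.body for c in submission.comments.list() if not c.stickied]
    rows.append({
        'Title': submission.title,
        'Time': submission.created_utc,
        '# of comments': submission.num_comments,
        'URL': submission.url,
        'comments': " | ".join(bodies),  # one string per CSV cell
    })

df = pd.DataFrame(rows)
df.to_csv('plagiarism_search.csv', index=False)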