I have a table lucene_try which looks like this:
id || title || comment
1 || Title 1 || Sentence 1 of Comment 1. Sentence 2 of Comment 1.
2 || Title 1 || Sentence 1 of Comment 2. Sentence 2 of Comment 2.
What I am trying to achieve is for each title, and each comment of the title, break the comment into different sentences and put it in a different table which must look like this in lucene_rs:
id || title || root_comment || sub_comment
1 || Title 1 || Comment 1 || Sentence 1
2 || Title 1 || Comment 1 || Sentence 2
3 || Title 1 || Comment 2 || Sentence 1
4 || Title 1 || Comment 2 || Sentence 2
I have written the code for this, and it works properly when I print it on the console/terminal. It prints Sentence 1 and Sentence 2 of Comment 1 as well as Comment 2. However, when I want to insert this data, it only prints and inserts Sentence 1 and Sentence 2 of Comment 1.
Here is my code:
import pymysql
import pymysql.cursors
import random
from bs4 import BeautifulSoup
import nltk
from nltk import tokenize
import pdb
conn = pymysql.connect(host='localhost', user='root', password='password', db='master_thesis', autocommit=True)
cursor = conn.cursor()
cursor.execute("SELECT * FROM lucene_counter WHERE count > 5 AND count <= 30")
lucene_rs_list = list()
for row in cursor:
lucene_rs_list.append(row[1])
random.shuffle(lucene_rs_list)
final_list = lucene_rs_list[:1]
for i in range(len(final_list)):
current_title = final_list[i]
query = "SELECT title, comment FROM lucene_try WHERE title = %s"
cursor.execute(query, final_list[i])
for row in cursor:
root_comment = BeautifulSoup(row[1], "lxml").text
print("Root Title: ", current_title)
print("Root Comment: ", root_comment)
cleancomment = tokenize.sent_tokenize(root_comment)
for j in range(len(cleancomment)):
# THIS LINE PRINTS EVERYTHING PROPERLY WITH ALL THE COMMENTS AND SUBCOMMENTS IF CURSOR.EXECUTE IS COMMENTED OUT
print("Sub Comment: ", cleancomment[j])
# IF THE CURSOR.EXECUTE IS UNCOMMENTED, IT ONLY DISPLAYS RESULT OF THE FIRST ROOT_COMMENT AND NOT ALL
cursor.execute("""INSERT INTO lucene_rs (title, root_comment, comment) VALUES ("%s", "%s", "%s")""" % (current_title, root_comment, cleancomment[j]))
print("\n")
conn.close()
The problem is that you're using the same cursor to execute the
INSERTquery as you're using to get the results of theSELECTquery. When you execute theINSERT, it no longer contains the results of theSELECT, so the next iteration of the loop stops.Either read all the results of the
SELECTinto a list and then loop over that, or use a different cursor for theINSERT.