This code isn't correctly pulling the information from a table nested within a table cell. How can I fix it?
I'm using Python 3 with python-docx (installed via pip install python-docx).
import os
import csv
from docx import Document

directory_path = "/"
file_pattern = "*.docx"

with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['VS', 'PE', 'Measures of Success'])
    for file in os.listdir(directory_path):
        if file.endswith(".docx"):
            doc = Document(os.path.join(directory_path, file))
            pe_name = ""
            forecasted_returns = ""
            for table in doc.tables:
                for row in table.rows:
                    for cell in row.cells:
                        if "PE" in cell.text:
                            pe_name = cell.text
                            print(pe_name)
                        if "Describe how the success" in cell.text:
                            forecasted_returns = cell.text
                            print(forecasted_returns)
            writer.writerow([file, pe_name, forecasted_returns])

print("Done!")
Your current approach captures only the last occurrence of "PE" and "Describe how the success" in each document, because pe_name and forecasted_returns are overwritten every time a matching cell is found.
Instead, append each occurrence of "PE" and "Describe how the success" to a list, then join the list elements into a single string before writing the row to the CSV file.
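
A minimal sketch of that suggestion, extended to descend into nested tables. In python-docx, doc.tables lists only the top-level tables of the document, and as far as I know cell.text is built from the cell's own paragraphs, so text inside a nested table has to be reached through cell.tables. The helper names iter_cells, collect_matches and export_to_csv, the output_path parameter, and the "; " separator are assumptions made for illustration; they are not part of the original script.

```python
import csv
import os


def iter_cells(tables):
    """Yield every cell in the given tables, descending into nested tables."""
    for table in tables:
        for row in table.rows:
            for cell in row.cells:
                yield cell
                # cell.tables holds the tables nested directly inside this cell
                yield from iter_cells(cell.tables)


def collect_matches(tables, keyword):
    """Return the text of every cell containing keyword, in document order."""
    matches = []
    for cell in iter_cells(tables):
        text = cell.text.strip()
        # Skip repeats: row.cells can return the same merged cell more than once
        if keyword in text and text not in matches:
            matches.append(text)
    return matches


def export_to_csv(directory_path, output_path="output.csv"):
    """Write one CSV row per .docx file found in directory_path."""
    # Imported here so the helpers above can be used without python-docx installed
    from docx import Document

    with open(output_path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["VS", "PE", "Measures of Success"])
        for file in sorted(os.listdir(directory_path)):
            if file.endswith(".docx"):
                doc = Document(os.path.join(directory_path, file))
                pe_names = collect_matches(doc.tables, "PE")
                returns = collect_matches(doc.tables, "Describe how the success")
                # Join all matches so no occurrence is overwritten or lost
                writer.writerow([file, "; ".join(pe_names), "; ".join(returns)])
```

Calling export_to_csv("/") would mirror the directory used in the original script. Note that the "PE" check is a plain substring match, so it also matches any cell whose text merely contains those two letters; a stricter test may be needed depending on the documents.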