I'm struggling to get my Python script to pull information from a Word document


This code isn't correctly pulling the information, which sits in a table nested inside a table cell. How can I fix it?

I'm using Python 3 and python-docx (installed with pip install python-docx).

import os
import csv
from docx import Document

directory_path = "/"
file_pattern = "*.docx"

with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['VS', 'PE', 'Measures of Success'])

    for file in os.listdir(directory_path):
        if file.endswith(".docx"):

            doc = Document(os.path.join(directory_path, file))
        
            pe_name = ""
            forecasted_returns = ""
        
            for table in doc.tables:
                for row in table.rows:
                    for cell in row.cells:
                        if "PE" in cell.text:
                            pe_name = cell.text
                            print(pe_name)
                        if "Describe how the success" in cell.text:
                            forecasted_returns = cell.text
                            print(forecasted_returns)
                writer.writerow([file, pe_name, forecasted_returns])

print("Done!")

There is 1 best solution below

TheHungryCub On

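Your current approach only captures the last occurrence of "PE" and "Describe how the success" in each document, because pe_name and forecasted_returns are overwritten every time a matching cell is found. The writerow call also sits inside the table loop, so a row is written for every table rather than once per file.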

You should append each occurrence of "PE" and "Describe how the success" to a list, and then join the list elements to create a single string before writing to the CSV file.

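import os
import csv
from docx import Document

directory_path = "/"

with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['VS', 'PE', 'Measures of Success'])

    for file in os.listdir(directory_path):
        if file.endswith(".docx"):
            doc = Document(os.path.join(directory_path, file))

            pe_name_list = []  # Store all occurrences of "PE"
            forecasted_returns_list = []  # Store all occurrences of "Describe how the success"

            for table in doc.tables:
                for row in table.rows:
                    for cell in row.cells:
                        if "PE" in cell.text:
                            pe_name_list.append(cell.text)
                        if "Describe how the success" in cell.text:
                            forecasted_returns_list.append(cell.text)

            # Join multiple occurrences into a single string
            pe_name = '\n'.join(pe_name_list)
            forecasted_returns = '\n'.join(forecasted_returns_list)

            # Write one row per document, after all of its tables have been scanned
            writer.writerow([file, pe_name, forecasted_returns])

print("Done!")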
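
Since the question mentions a nested table: the loop above only visits cells of the top-level tables in doc.tables. As far as I know, python-docx builds cell.text from the paragraphs directly in the cell and exposes tables nested inside a cell through cell.tables, so the nested content may never be seen. Below is a minimal sketch of a recursive walk that also descends into nested tables. The helper names iter_cells and collect_matches are my own, not part of python-docx, and I haven't tested this against your documents.

```python
def iter_cells(tables):
    """Yield every cell in the given tables, descending into nested tables."""
    for table in tables:
        for row in table.rows:
            for cell in row.cells:
                yield cell
                # cell.tables lists the tables nested directly inside this cell
                yield from iter_cells(cell.tables)


def collect_matches(tables, needle):
    """Return the text of every cell, at any nesting depth, that contains needle."""
    return [cell.text for cell in iter_cells(tables) if needle in cell.text]
```

In the script above, the three nested for loops could then be replaced with pe_name_list = collect_matches(doc.tables, "PE") and forecasted_returns_list = collect_matches(doc.tables, "Describe how the success").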