Can't extract text from a .docx file with Python because of strange "blocks"

I'm trying to extract the text from a Word document with python-docx, but the code below doesn't return it. When I inspected the document, I saw that the text sits inside some strange "blocks", and I couldn't find any information on how to work with them. How can I read this text?

from docx import Document

doc = Document('XXXXXXXXXXXXXXXX.docx')
for para in doc.paragraphs:
    print(para.text)

There is 1 best solution below

Sreehari Rajeev On BEST ANSWER

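Try another library such as docx2txt (the python-docx2txt project) or textract. python-docx's doc.paragraphs only returns paragraphs that sit directly in the document body, so text inside tables, text boxes, or content controls (which may be the "blocks" you are seeing) is not included.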
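
If installing another library is not an option, a .docx file is a zip archive whose main text lives in word/document.xml, so the text can also be read with the standard library alone. The following is only a minimal sketch, assuming the "blocks" are text boxes or content controls in the main document part; extract_all_text and _collect are hypothetical helper names, not part of python-docx, and headers, footers, and footnotes (stored in other parts of the archive) are not read.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"

P_TAG = f"{{{W_NS}}}p"                # paragraph
T_TAG = f"{{{W_NS}}}t"                # text run content
FALLBACK_TAG = f"{{{MC_NS}}}Fallback"  # legacy copy of a text box


def _collect(elem, lines):
    """Return elem's own text; append the text of nested paragraphs to lines."""
    parts = []
    for child in elem:
        if child.tag == FALLBACK_TAG:
            # Assumption: the fallback repeats what mc:Choice already holds,
            # so skipping it avoids duplicated text-box content.
            continue
        if child.tag == T_TAG:
            parts.append(child.text or "")
        elif child.tag == P_TAG:
            text = _collect(child, lines)
            if text:
                lines.append(text)
        else:
            # Descend into runs, tables, text boxes, content controls, etc.
            parts.append(_collect(child, lines))
    return "".join(parts)


def extract_all_text(docx_path):
    """Return the non-empty paragraphs of the main document part as a list."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    _collect(root, lines)
    return lines


if __name__ == "__main__":
    # Same placeholder file name as in the question.
    for line in extract_all_text("XXXXXXXXXXXXXXXX.docx"):
        print(line)
```

Because a text box is stored inside the paragraph it is anchored to, this sketch emits the text-box paragraphs just before the text of that anchoring paragraph, so the order can differ slightly from what Word displays.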