I am trying to extract tables and equations as text from objects embedded in an RTF file. The data is only extracted if I first double-click the embedded object in Word (or another application) so that the table or equation becomes editable, and then save the file. It doesn't matter whether I save it as RTF or DOCX; the data is only converted if I do this. Does anyone know a way to get around this in Python, for example by activating the objects programmatically the way a double-click does, or by saving the document in a way that makes the objects readable?
If I convert the RTF to text with the pypandoc module without doing the step above, the output text file is empty. If I instead convert to DOCX with comtypes first and then try to extract the table data with the docx module, doc.tables is also empty. In both cases the tables and equations are not converted unless I do the double-click step first. By contrast, if I add some ordinary text to the document, that text is converted without this step.
import pypandoc
docfile = 'doc21.docx'
# Convert to plain text; table8.txt comes out without the table or equation data
output = pypandoc.convert_file(docfile, 'plain', outputfile="table8.txt")
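One check that might help narrow this down: a .docx file is a zip archive, and embedded OLE objects are normally stored as separate binary parts under word/embeddings/ rather than as table markup in the main document. The sketch below only lists those parts; it does not extract their contents. The function name list_embedded_objects is my own for illustration, and I am assuming the DOCX produced by comtypes keeps the objects in that folder.

```python
import zipfile

def list_embedded_objects(docx_path):
    """Return the names of embedded-object parts inside a .docx archive."""
    with zipfile.ZipFile(docx_path) as archive:
        # Assumption: embedded OLE objects live under word/embeddings/
        return [name for name in archive.namelist()
                if name.startswith("word/embeddings/")]
```

Calling list_embedded_objects('doc21.docx') on the unedited file and getting a non-empty list would suggest that the tables are still stored as opaque embedded objects, which would be consistent with both doc.tables and the pypandoc output being empty.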
The table looks like this