Consider the following article: https://arxiv.org/pdf/2106.13823.pdf
It is an academic paper formatted in two columns.
I want to extract text from a two-column PDF in the natural reading order: first down the left column, then down the right. Does PyPDF2 do this by default?




end of page 5, start of page 6:
Conclusion: this document is not suitable for conversion to plain text, because it contains formulas that cannot be converted to text correctly.
Code used (based on: https://stackoverflow.com/a/63518022/724039 ):
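A minimal sketch of this kind of check, assuming the pypdf package (the successor to PyPDF2) and a local copy of the paper. It is not the code from the linked answer; the helper names `extract_page_texts`, `page_boundary` and `show_boundary` are hypothetical and only illustrate how to look at the end of one page and the start of the next to see whether the column order survived.

```python
def extract_page_texts(pdf_path):
    """Return the extracted text of every page, in page order."""
    # Assumption: pypdf is installed (pip install pypdf).
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() returns the text in the order pypdf finds it in the
    # page's content stream, which is not guaranteed to be column order.
    return [page.extract_text() or "" for page in reader.pages]


def page_boundary(page_texts, page_number, n_lines=5):
    """Return (last lines of page_number, first lines of the next page).

    page_number is 1-based, so page_number=5 compares page 5 with page 6.
    """
    end_lines = page_texts[page_number - 1].splitlines()[-n_lines:]
    start_lines = page_texts[page_number].splitlines()[:n_lines]
    return end_lines, start_lines


def show_boundary(pdf_path, page_number, n_lines=5):
    """Print the end of one page and the start of the next."""
    tail, head = page_boundary(extract_page_texts(pdf_path), page_number, n_lines)
    print(f"end of page {page_number}:", *tail, sep="\n")
    print(f"start of page {page_number + 1}:", *head, sep="\n")
```

With the paper saved locally under a name of your choice (for example `2106.13823.pdf`, an assumed file name), `show_boundary("2106.13823.pdf", 5)` would print the last lines of page 5 followed by the first lines of page 6.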
Better results can be obtained with LibreOffice Writer (version 24.2.0.3).