I am working on a project to convert multiple PDF files into basic HTML to put onto a site. I want to extract the text and the font sizes from the PDF to parse directly into HTML tags.
I have tried using pdfplumber, but I am having trouble matching the font sizes to the text. Is there a simple way to do this with pdfplumber, or is there another library that can achieve it?
You can use pdfminer.six (the Python 3 compatible version of pdfminer). Its layout objects expose the font size of each character alongside the text.
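As a rough starting point, here is a minimal sketch of how that could look with pdfminer.six. It walks the layout tree with `extract_pages`, reads `LTChar.size` for every character in a line, and takes the most common size as the size of that line. The `size_to_tag` thresholds, the `body_size` default of 12 and the helper names are my own assumptions for illustration, not part of the library, and none of this has been tested against your PDF, so expect to tune it.

```python
from collections import Counter
from html import escape


def extract_lines_with_sizes(pdf_path):
    """Yield (text, font_size) for each non-empty text line in the PDF."""
    # Imported here so the helpers below can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page_layout in extract_pages(pdf_path):
        for element in page_layout:
            if not isinstance(element, LTTextContainer):
                continue  # skip figures, curves, images, etc.
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                # LTChar carries the font size; LTAnno (spaces, newlines) does not.
                sizes = [round(ch.size, 1) for ch in line if isinstance(ch, LTChar)]
                text = line.get_text().strip()
                if text and sizes:
                    # Use the most common character size as the size of the line.
                    yield text, Counter(sizes).most_common(1)[0][0]


def size_to_tag(size, body_size=12.0):
    """Hypothetical mapping from font size to an HTML tag; adjust the thresholds."""
    if size >= body_size * 1.5:
        return "h1"
    if size >= body_size * 1.2:
        return "h2"
    return "p"


def lines_to_html(lines, body_size=12.0):
    """Wrap each (text, size) pair in the tag chosen by size_to_tag."""
    parts = []
    for text, size in lines:
        tag = size_to_tag(size, body_size)
        parts.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(parts)
```

You would call it as `lines_to_html(extract_lines_with_sizes("example.pdf"))`, where `example.pdf` is a placeholder path. Working per line rather than per character avoids most of the trouble with lining sizes up against text, at the cost of ignoring size changes inside a single line.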
I am having trouble coming up with code that works on a PDF on my PC and will also work on your PDF, which I haven't seen. I will include code if I can take a look at your PDF.