Convert PDF to HTML using python and pdfkit


On this site, Adobe writes about converting PDF to HTML using pdfkit.

They use the pdfkit.from_pdf(...) method.

This script uses the ‘pdfkit’ library to convert the PDF file to HTML. To use this script, you will need to install the ‘pdfkit’ library...

When I try to use this method, I get the following error:

Traceback (most recent call last):
  File "C:\TestPdfToHtml\script.py", line 7, in <module>
    html_file = pdfkit.from_pdf(pdf_file, "my_html_file.html")
                ^^^^^^^^^^^^^^^
AttributeError: module 'pdfkit' has no attribute 'from_pdf'. Did you mean: 'from_url'?

How can I resolve this problem?

Below is the full script

import pdfkit
# Read the PDF file
pdf_file = open('test2.pdf', 'rb')
# Convert the PDF to HTML
html_file = pdfkit.from_pdf(pdf_file, "my_html_file.html")
# Close the PDF file
pdf_file.close()

There is 1 best solution below

PforPython On

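pdfkit does not provide a pdfkit.from_pdf function, which is why Python raises the AttributeError. The library is a wrapper around wkhtmltopdf, and its conversion functions (from_url, from_file, from_string) all take HTML as input and write a PDF. You can try pdfkit.from_file(), but note that it expects an HTML file as input and produces a PDF, so it converts in the opposite direction from what you want:

pdfkit.from_file("input.html", "output.pdf")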
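Before relying on a function name taken from a tutorial, it can help to check whether the installed module actually exposes it. The following is a minimal sketch using only the standard library; has_function is a hypothetical helper name, not part of pdfkit.

```python
import importlib


def has_function(module_name, function_name):
    """Return True if the module imports and exposes a callable with that name.

    has_function is a hypothetical helper written for illustration only.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not installed (or failed to import) in this environment.
        return False
    # getattr with a default avoids the AttributeError seen in the traceback.
    return callable(getattr(module, function_name, None))


if __name__ == "__main__":
    # Compare the name from the tutorial with the names suggested in this thread.
    for name in ("from_pdf", "from_file", "from_url"):
        print(name, has_function("pdfkit", name))
```

Running this in the same environment as the failing script shows which of the names exist before any conversion is attempted.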

Hope this helps.
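If the goal really is PDF to HTML, a different library is needed. Below is a rough sketch that swaps in PyMuPDF (imported as fitz), which is not the library used in the question and is assumed to be installed separately. The function names wrap_pages_as_html and pdf_to_html are made up for this example, and the output quality depends heavily on the PDF.

```python
import html


def wrap_pages_as_html(page_fragments, title="Converted PDF"):
    """Join per-page HTML fragments into one complete HTML document."""
    body = "\n".join(page_fragments)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


def pdf_to_html(pdf_path, html_path):
    """Extract each page as HTML with PyMuPDF and write a single HTML file."""
    # Assumption: PyMuPDF is installed (pip install pymupdf). It is imported
    # here so that the rest of the module works without it.
    import fitz

    doc = fitz.open(pdf_path)
    try:
        # get_text("html") returns an HTML fragment for one page.
        fragments = [page.get_text("html") for page in doc]
    finally:
        doc.close()

    with open(html_path, "w", encoding="utf-8") as out:
        out.write(wrap_pages_as_html(fragments, title=pdf_path))
    return html_path
```

With the file names from the question, the call would look like pdf_to_html("test2.pdf", "my_html_file.html").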