How do I modify this Python script to convert a .pdf file to a .txt file?

Asked by bestofthebeast on 02 October 2023 at 07:09 (31 views)

I copied some code from the web into a Python file, put that file in a folder together with the PDF, and ran it from the Command Prompt. However, the script only prints the extracted text to the console, and the cmd window is too small to hold all of it; some PDFs are as long as 300 pages. I need the text saved to a .txt file instead.
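If the goal is only to get the printed text into a file, Command Prompt redirection already does that: `python script.py > output.txt`, where `script.py` stands for whatever the file is called. The sketch below does the same thing from inside Python, assuming Python 3 and the standard-library `contextlib.redirect_stdout`. `run_to_file` is a hypothetical helper name, not part of any library.

```python
import contextlib


def run_to_file(func, txt_path, *args):
    # Call func(*args) and send everything it prints into txt_path
    # instead of the console. UTF-8 is used so OCR output containing
    # accented letters or symbols can be written without encoding errors.
    with open(txt_path, "w", encoding="utf-8") as out:
        with contextlib.redirect_stdout(out):
            func(*args)
```

With the functions from the script in this question, `run_to_file(print_pages, "1.txt", "1.pdf")` would take the place of `print_pages('1.pdf')`, leaving the OCR functions unchanged.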

Here is the code:

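import pdf2image     # renders PDF pages as PIL images (needs Poppler installed)
import pytesseract   # Python wrapper around the Tesseract OCR engine


def pdf_to_img(pdf_file):
    # Convert every page of the PDF into a PIL image.
    return pdf2image.convert_from_path(pdf_file)


def ocr_core(image):
    # Run Tesseract OCR on one page image and return the recognized text.
    return pytesseract.image_to_string(image, lang='eng')


def print_pages(pdf_file):
    # OCR each page in order and print its text to the console.
    images = pdf_to_img(pdf_file)
    for img in images:
        print(ocr_core(img))


print_pages('1.pdf')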
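A sketch of how the script above could write straight to a .txt file instead of printing, assuming the same pdf2image and pytesseract setup. `pdf_to_txt`, its `to_images`/`to_text` parameters, and the `=== Page N ===` separator line are hypothetical choices for illustration; the injectable converters exist only so the function can be tried without a real PDF.

```python
def pdf_to_txt(pdf_file, txt_file, to_images=None, to_text=None):
    # Defaults: the same pdf2image / pytesseract calls as the original script.
    # They are imported lazily so other converters can be passed in instead.
    if to_images is None:
        import pdf2image
        to_images = pdf2image.convert_from_path
    if to_text is None:
        import pytesseract

        def to_text(img):
            return pytesseract.image_to_string(img, lang="eng")

    with open(txt_file, "w", encoding="utf-8") as out:
        for page_number, img in enumerate(to_images(pdf_file), start=1):
            # Hypothetical separator line so page boundaries stay visible.
            out.write(f"=== Page {page_number} ===\n")
            # Write each page as soon as it is recognized.
            out.write(to_text(img))
```

Calling `pdf_to_txt('1.pdf', '1.txt')` would replace `print_pages('1.pdf')`. Note that `convert_from_path` renders every page into memory before OCR starts; for very long PDFs, its `first_page` and `last_page` arguments can be used to convert the document one chunk at a time.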

I changed the name of the PDF file.

I searched for YouTube tutorials without much success; I was hoping to find one titled something like "How to convert pdf to txt with tesseract".
