How to extract specific text from a PDF using Python?

116 Views Asked by anonymous At 02 February 2023 at 19:56

These are the items that need to be extracted from the PDF:

This is the link to the PDF.

Could anyone solve this problem using Python, with comments to help me understand the solution?

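import pdf2image
import pytesseract

# Render each page of the PDF as a PIL image (pdf2image needs poppler installed).
pages = pdf2image.convert_from_path('/content/SRW1012022Y0002378_220216102321.PDF')

# Run OCR on every page and print the recognized text.
for page_number, page in enumerate(pages, start=1):
    detected_text = pytesseract.image_to_string(page)
    print(f'--- Page {page_number} ---')
    print(detected_text)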

I tried the code snippet above and I can extract all the text from the PDF, but I can't pick out the specific pieces of text I need in order to apply further logic to them.
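One possible way to go from the full OCR text to specific values is to search it with regular expressions. The sketch below is only an illustration under stated assumptions: the field labels (`Invoice No`, `Date`), the `extract_fields` helper, and the sample text are all hypothetical, since the actual items to extract are not shown here. The patterns would need to be adapted to the labels that really appear in the PDF.

```python
import re

# Hypothetical field labels: replace these with the labels printed in your PDF.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s*No\.?\s*[:\-]?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*[:\-]?\s*(\d{2}[/-]\d{2}[/-]\d{4})", re.IGNORECASE),
}


def extract_fields(text, patterns=FIELD_PATTERNS):
    """Return {field name: first captured value, or None if nothing matched}."""
    results = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        results[name] = match.group(1).strip() if match else None
    return results


# Made-up sample standing in for the OCR output (detected_text) of one page.
sample_text = """ACME Shipping
Invoice No: SRW-0001
Date: 16/02/2022
Total: 100.00
"""

fields = extract_fields(sample_text)
print(fields)
```

In the OCR loop, `extract_fields(detected_text)` could be called once per page instead of printing the raw text. OCR output is often noisy (extra spaces, misread characters), so patterns that are somewhat tolerant of whitespace and punctuation, as above, tend to be more robust than exact string matches.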
