These are the items that need to be extracted from the PDF:
This is the link to the PDF.
Could anyone show how to solve this in Python, with comments to help me understand?
import pdf2image
from PIL import Image
import pytesseract

# Render each page of the PDF as a PIL image.
pages = pdf2image.convert_from_path('/content/SRW1012022Y0002378_220216102321.PDF')

# Run OCR on each page and print the recognized text.
for page_number, page in enumerate(pages):
    detected_text = pytesseract.image_to_string(page)
    print(detected_text)
I tried the code above and it extracts all the text from the PDF, but I can't pick out the specific pieces of text I need in order to apply further logic to them.
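One possible direction, as a rough sketch only: once the OCR text is available as a string, regular expressions can pull out the value that follows a known label. The field names, label patterns, and sample text below are all made up for illustration, since the actual items depend on what is printed in the PDF; `extract_fields` is a hypothetical helper, not part of pytesseract.

```python
import re

# Hypothetical labels: replace these with the labels that appear in your PDF.
FIELD_PATTERNS = {
    "invoice_number": r"Invoice\s*No\.?\s*[:\-]?\s*(\S+)",
    "date": r"Date\s*[:\-]?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})",
}


def extract_fields(text, patterns=FIELD_PATTERNS):
    """Return a dict mapping each field name to its first match, or None."""
    results = {}
    for name, pattern in patterns.items():
        # Search the whole OCR text, ignoring case differences in the label.
        match = re.search(pattern, text, flags=re.IGNORECASE)
        # group(1) is the captured value that follows the label.
        results[name] = match.group(1).strip() if match else None
    return results


# Made-up sample standing in for one page of pytesseract output.
sample_text = "Invoice No: AB-1234\nDate: 16/02/2022\nTotal: 100.00"
fields = extract_fields(sample_text)
print(fields)
```

In the loop above, `extract_fields(detected_text)` could be called on each page instead of printing the raw text. OCR output is often noisy, so the patterns would likely need loosening (extra whitespace, misread characters) after inspecting the real text.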
