This is the image
from which I am trying to extract data (the text is a mix of Hindi and English) using pytesseract. I then wish to reconstruct it into an Excel file with the same formatting.
Firstly, the problem is that I can pass only one language (Hindi in this case) to pytesseract, so I get a result like this:
My code looks like this:
import cv2
import pytesseract

path = "./1.jpg"
text = []
# use pytesseract to read the text from the image
img = cv2.imread(path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
text.append(pytesseract.image_to_string(img, lang="hin"))
# print(text)
# export the recognized text
with open("temp.txt", "w", encoding="utf-8") as f:
    f.writelines(text)
I have two questions:
- How do I read both the English and Hindi text from this image, i.e. how do I pass both languages as parameters to the image_to_string() method?
- Once I perform this OCR, how do I reconstruct the table and export that dataframe to an Excel file?
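For context on what I have tried so far: Tesseract's documentation says multiple languages can be joined with `+` (e.g. `lang="hin+eng"`, provided both traineddata files are installed), and `pytesseract.image_to_data` returns one entry per word with its bounding box, which could be grouped into table rows by vertical position. A minimal sketch of that row-grouping idea (the sample word boxes and the `row_tol` threshold are made up for illustration, not real OCR output):

```python
import pandas as pd

# In the real pipeline this would come from:
#   data = pytesseract.image_to_data(img, lang="hin+eng",
#                                    output_type=pytesseract.Output.DICT)
# Here we simulate a few word boxes (text + top-left coordinates).
ocr_words = [
    {"text": "नाम", "left": 10, "top": 12},
    {"text": "Name", "left": 120, "top": 14},
    {"text": "राम", "left": 10, "top": 52},
    {"text": "Ram", "left": 120, "top": 55},
]

def group_into_rows(words, row_tol=10):
    """Group word boxes into table rows: words whose 'top' values are
    within row_tol pixels of a row's first word belong to that row."""
    rows = []
    for w in sorted(words, key=lambda w: w["top"]):
        if rows and abs(rows[-1][0]["top"] - w["top"]) <= row_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    # sort each row left-to-right and keep only the text
    return [[w["text"] for w in sorted(r, key=lambda w: w["left"])]
            for r in rows]

table = group_into_rows(ocr_words)
# first grouped row becomes the header, the rest become data rows
df = pd.DataFrame(table[1:], columns=table[0])
# df.to_excel("reconstructed.xlsx", index=False)  # needs openpyxl installed
```

This obviously will not preserve merged cells or styling; it only recovers the row/column grid, after which `df.to_excel` writes the workbook.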