I'm trying to reconstruct a table from an image with Hindi + English text using pytesseract


This is the image from which I am trying to extract the data, which is in Hindi + English, using pytesseract. I then want to reconstruct the table and export it to an Excel file with the same formatting.

Firstly, the problem is that I can pass only one language (Hindi in this case) to the pytesseract model, so I get a result like the one shown in the attached image. My code looks like this:

file = "./1.jpg"
text = []
# use pytesseract to read the text from the image 
img = cv2.imread(file)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
text.append(pytesseract.image_to_string(img, lang="hin"))
# print(text)
# export
with open('temp.txt', 'w', encoding='utf-8') as file:
    file.writelines(text)

I have two questions:

  1. How do I read both English and Hindi text from this image, i.e. how do I pass both English and Hindi as parameters to the image_to_string() method? (My guess at this is in the first sketch below.)
  2. Once I perform the OCR, how do I reconstruct the table and export the resulting DataFrame to an Excel file? (A rough attempt is sketched below as well.)
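For the first question, my best guess from the Tesseract documentation is that multiple languages can be combined with a "+" in the lang argument, assuming the traineddata files for both hin and eng are installed, but I'm not sure whether this keeps the Hindi text intact:

# sketch: run both language models in a single call
# (assumes hin.traineddata and eng.traineddata are available to Tesseract)
text.append(pytesseract.image_to_string(img, lang="hin+eng"))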
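For the second question, the only approach I can think of so far is pytesseract's image_to_data(), which returns word-level bounding boxes, and then grouping the words into rows and columns by their coordinates before writing the result with pandas. This is only a rough sketch under the assumption that the table's columns can be separated by the words' left coordinates; the bin widths (40 and 120 pixels) are guesses that would need tuning for my image:

import pandas as pd

# word-level OCR results as a DataFrame (needs pandas installed)
data = pytesseract.image_to_data(
    img, lang="hin+eng", output_type=pytesseract.Output.DATAFRAME
)
data = data.dropna(subset=["text"])        # drop empty detections
data["text"] = data["text"].astype(str)    # some cells may be parsed as numbers

# guess row/column indices from the word coordinates (assumed bin sizes)
data["row"] = data["top"] // 40
data["col"] = data["left"] // 120

# join words that fall into the same cell and pivot into a table
table = (
    data.groupby(["row", "col"])["text"]
    .apply(" ".join)
    .unstack(fill_value="")
)
table.to_excel("output.xlsx", index=False, header=False)  # needs openpyxl

I doubt fixed-width binning is robust for merged cells or uneven columns, so I'd appreciate pointers to a better way of rebuilding the table structure.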