How to handle English characters in source data when translating from Hindi to English using Python?


I am working on a task where I need to translate Hindi data from a PDF into English. The data also contains English characters, and some of it is entirely in English.

Currently, I convert each PDF page to an image, run OCR on it, and then translate the OCR output to English. However, the English characters in the data come out as numbers, and the numbers themselves are not extracted correctly. I am using pytesseract for OCR and Google Translate through deep_translator for translation. Any suggestions?
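
One direction that might be worth trying, sketched here under assumptions rather than as a tested fix: run the OCR with both language models loaded (pytesseract accepts `lang="hin+eng"` when both traineddata files are installed), and then send only the Devanagari runs to the translator so that Latin text and digits pass through untouched. The helper name `translate_hindi_only` and the `translate` callable are hypothetical names chosen for illustration; the sketch uses only the standard library so it can be run without OCR or network access.

```python
import re

# Assumption: Hindi text falls in the Devanagari block (U+0900 to U+097F).
# A run is one or more Devanagari words separated only by whitespace.
DEVANAGARI_RUN = re.compile(r"[\u0900-\u097F]+(?:\s+[\u0900-\u097F]+)*")

# Map Devanagari digits to ASCII digits so numbers are never sent for translation.
DEVANAGARI_DIGITS = str.maketrans("०१२३४५६७८९", "0123456789")


def translate_hindi_only(text, translate):
    """Translate only the Devanagari runs in `text`.

    `translate` is any callable that takes a Hindi string and returns English
    (hypothetical parameter; a real translator can be plugged in here).
    Latin characters, ASCII digits, and punctuation are left as they are.
    """
    text = text.translate(DEVANAGARI_DIGITS)
    return DEVANAGARI_RUN.sub(lambda m: translate(m.group(0)), text)


if __name__ == "__main__":
    # Stand-in translator that just marks what would be translated.
    def fake_translate(s):
        return "<" + s + ">"

    sample = "Invoice 123 नमस्ते दुनिया ABC ४५"
    print(translate_hindi_only(sample, fake_translate))
```

With the real libraries, the `translate` argument could be `GoogleTranslator(source="hi", target="en").translate` from deep_translator, and the input text could come from `pytesseract.image_to_string(image, lang="hin+eng")`. Whether this resolves the letters-to-numbers problem depends on where it is introduced (OCR or translation), which I have not confirmed.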
