I am working on a task where I need to translate Hindi data from a PDF into English. Besides the Hindi text, the data contains some English characters, and some of it is entirely in English.
Currently, I convert the PDF into images, run OCR on them, and then translate the result into English. However, the English characters in the data come out as numbers, and the numbers themselves are not extracted correctly. I am using pytesseract for OCR and Google Translate through deep_translator for translation. Any suggestions?
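The pipeline described above looks roughly like the following minimal sketch. Several details are assumptions rather than the author's actual setup. pdf2image (which needs Poppler) is assumed to render the pages, since the question does not say how the PDF becomes images. The `"hin+eng"` Tesseract language string assumes both the Hindi and English traineddata files are installed. The `contains_devanagari` helper is a hypothetical addition that sends only lines containing Hindi script to the translator.

```python
import re

# Devanagari Unicode block (U+0900-U+097F), used here to detect Hindi text.
DEVANAGARI = re.compile(r"[\u0900-\u097F]")


def contains_devanagari(text: str) -> bool:
    """Return True if the text contains at least one Devanagari character."""
    return bool(DEVANAGARI.search(text))


def pdf_to_english(pdf_path: str, ocr_lang: str = "hin+eng") -> list[str]:
    """OCR every page of a PDF and translate the Hindi lines to English.

    Assumptions: pdf2image (with Poppler) renders the pages, and the
    Tesseract 'hin' and 'eng' language data are both installed.
    """
    # Imported lazily so the helper above works without these packages.
    from pdf2image import convert_from_path
    import pytesseract
    from deep_translator import GoogleTranslator

    translator = GoogleTranslator(source="auto", target="en")
    output = []
    for page in convert_from_path(pdf_path, dpi=300):
        # Recognize Hindi and English together on the same page image.
        text = pytesseract.image_to_string(page, lang=ocr_lang)
        for line in text.splitlines():
            if not line.strip():
                continue
            # Translate only lines with Hindi script; lines that are already
            # English (or contain only digits/punctuation) are kept as OCR'd.
            if contains_devanagari(line):
                output.append(translator.translate(line))
            else:
                output.append(line)
    return output
```

For example, `pdf_to_english("report.pdf")` (a hypothetical file name) would return one string per non-empty OCR'd line. Passing only one language to Tesseract (for example `lang="hin"`) makes it read every glyph as that script, which is one plausible source of Latin letters being misread. This is a guess about the setup, not something the question confirms.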