Config for pytesseract (Urdu language)

Asked by Samee Arif on 08 August 2021 at 21:03

I am having trouble with pytesseract. The following line of code gives poor results for Urdu text:

text = pytesseract.image_to_string(img, lang="urd")

What configuration should I use to improve accuracy for Urdu, and what kind of pre-processing can I apply to the image?

I am using images like this one: TestFile

For the attached image, the output should be:

بعد نجی ٹی وی سے گفتگو کرتے ہوئے وزیر خارجہ شاہ محمود قریشی نے بتایا کہ ملاقات

But the output I am getting is:

٦ری‏ وی سے کلوکرتے ہونے وز خارمہ اہ مود رٹ نے نال لات
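To compare settings objectively rather than by eye, the gap between the expected and the actual output could be measured as a character error rate (edit distance divided by the length of the expected text). The following is only a rough sketch in plain Python; the function names `levenshtein` and `character_error_rate` are my own and not part of pytesseract.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute each cost 1)."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(expected: str, actual: str) -> float:
    """Edit distance normalised by the length of the expected text."""
    if not expected:
        raise ValueError("expected text must not be empty")
    return levenshtein(expected, actual) / len(expected)


if __name__ == "__main__":
    # Strings copied from the expected and actual outputs above.
    expected = "بعد نجی ٹی وی سے گفتگو کرتے ہوئے وزیر خارجہ شاہ محمود قریشی نے بتایا کہ ملاقات"
    actual = "٦ری وی سے کلوکرتے ہونے وز خارمہ اہ مود رٹ نے نال لات"
    print(f"CER: {character_error_rate(expected, actual):.2f}")
```

A lower value means the OCR output is closer to the reference, so the same image can be re-run with different settings and the scores compared.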

The images use these fonts: Pak Nastaleeq, Alvi Nastaleeq, Jameel Noori Nastaleeq, Nafees Nastaleeq.
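For reference, this is the kind of pre-processing and configuration I have in mind, as an untested sketch rather than a known fix. It assumes `opencv-python` is installed alongside pytesseract and that the `urd` language data is available to Tesseract. The helper names (`build_config`, `preprocess`, `ocr_urdu`), the 2x upscaling factor, and the choice of `--psm 7` (treat the image as a single text line) with `--oem 1` (LSTM engine) are my own assumptions, and I do not know whether they help with Nastaleeq fonts.

```python
def build_config(psm: int = 7, oem: int = 1) -> str:
    """Build the Tesseract option string passed via pytesseract's `config` argument."""
    return f"--oem {oem} --psm {psm}"


def preprocess(path: str, scale: float = 2.0):
    """Load an image, convert it to grayscale, upscale it, and binarize it (Otsu)."""
    import cv2  # assumption: opencv-python is installed

    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Upscaling small text sometimes helps; the factor here is a guess.
    gray = cv2.resize(gray, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
    # Otsu's method picks the black/white threshold automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def ocr_urdu(path: str, psm: int = 7) -> str:
    """Run Tesseract's Urdu model on the pre-processed image."""
    import pytesseract

    return pytesseract.image_to_string(
        preprocess(path), lang="urd", config=build_config(psm=psm)
    )


# Example call (the file name is hypothetical):
# text = ocr_urdu("test.png")
```

Trying `psm=6` (a single uniform block of text) instead of 7 would be the obvious variation for images with more than one line.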
