Tesseract.js doesn't recognize Arabic language


I'm using the tesseract.js OCR library to read the text in an image and write it to the console or to a text file. It works fine with English words and characters, but when I try to read Arabic text from an image, it doesn't work. This is the image I'm trying to read:

enter image description here

And this is my code:

Head tag:

<script src='https://unpkg.com/[email protected]/dist/tesseract.min.js'></script>

Body tag:

<script>
  Tesseract.recognize(
    'image.png',
    'ara',
    { logger: m => console.log(m) }
  ).then(({ data: { text } }) => {

  })
</script>

There is 1 solution below


You need to handle the result inside the .then() block.

Here's a working example:

<!DOCTYPE html>
<meta charset="utf-8">
<title> OCR TEST </title>
<script src="https://unpkg.com/[email protected]/dist/tesseract.min.js"></script>
<output id="result">Processing...</output>
<script>
  const output = document.querySelector('output#result');
  Tesseract.recognize(
    'https://i.imgur.com/mdzmK4w.png', 'ara'
  ).then(result => {
    output.value = result.data.text;
  }).catch(err => {
    output.value = 'Processing Failed';
    console.log(err);
  });
</script>

It can take some time to process, especially on the first load, because the WASM module and the OCR training data are fetched in the background first. You can watch this happen in the Network tab of your browser's dev tools.
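If you want the page to show that it is still working during that wait, one option is to reuse the logger option from the question. This is only a rough sketch: formatProgress is a made-up helper name, and it assumes each logger message carries a status string and a progress value between 0 and 1, which may differ between tesseract.js versions. It reuses the output#result element and the image URL from the example above.

```javascript
// Hypothetical helper: turn a logger message into a short status line.
// Assumes the message has `status` (string) and `progress` (0..1);
// missing fields fall back to 'working' and 0.
function formatProgress(m) {
  const pct = Math.round((Number(m.progress) || 0) * 100);
  return `${m.status || 'working'}: ${pct}%`;
}

// Only runs in a page where the Tesseract global has been loaded.
if (typeof Tesseract !== 'undefined') {
  const output = document.querySelector('output#result');
  Tesseract.recognize('https://i.imgur.com/mdzmK4w.png', 'ara', {
    // Show progress while the WASM module and training data load.
    logger: m => { output.value = formatProgress(m); }
  }).then(result => {
    // Replace the progress text with the recognized text.
    output.value = result.data.text;
  }).catch(err => {
    output.value = 'Processing Failed';
    console.log(err);
  });
}
```

The progress text is overwritten by the recognized text once the promise resolves, so the .then() handler still does the real work.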