Hebrew PDF parsing returns symbols instead of the correct text

The PDF's text encoding is unknown, and after decoding I get symbols instead of the correct text.

The PDF text is in Hebrew.

I didn't create the PDF; I received it password-protected, so I used the "Save as PDF" print option to get a copy without the password before parsing it.

I'm trying to parse the PDF to JSON. Right now I'm using Node with the pdf2json package, but I can switch to Python or any other language that would help more (Node or Python preferred).

The PDF is in Hebrew and I don't know the encoding. When I decode the encoded text, I get symbols like 팀턀픀ꀀ✀.
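
A minimal Python sketch of that decoding step, assuming (as in the sample further down) that each text run comes back percent-encoded as UTF-8. It only inspects the result: the "symbols" are Hangul syllables, a Yi syllable, a Dingbats character and private-use code points, and every one of them has 00 as its low byte. That pattern suggests the useful value sits in the high byte rather than the PDF using some exotic encoding. `ENCODED` and `code_points` are names made up for this illustration.

```python
from urllib.parse import unquote

# Sample run copied from the question (percent-encoded UTF-8).
ENCODED = "%ED%8C%80%ED%84%80%ED%94%80%EE%88%80%EA%80%80%E2%9C%80%EE%84%80%00"


def code_points(encoded: str) -> list[int]:
    """Percent-decode as UTF-8 and return the code point of every character."""
    return [ord(ch) for ch in unquote(encoded, encoding="utf-8")]


if __name__ == "__main__":
    cps = code_points(ENCODED)
    # Show each character as U+XXXX instead of rendering the "symbols".
    print([f"U+{cp:04X}" for cp in cps])
    # Check whether every code point has a zero low byte.
    print(all(cp & 0xFF == 0 for cp in cps))
```

On the sample this prints `['U+D300', 'U+D100', 'U+D500', 'U+E200', 'U+A000', 'U+2700', 'U+E100', 'U+0000']` followed by `True`.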

How can I resolve it?

To be clearer, this is an example of the encoded text:

'%ED%8C%80%ED%84%80%ED%94%80%EE%88%80%EA%80%80%E2%9C%80%EE%84%80%00'

Decoded text:

팀턀픀ꀀ✀ 

Thanks!