Hebrew PDF parsing returns symbols instead of the correct text

117 Views Asked by Yoni Kohn At 08 August 2023 at 19:32

PDF encoding is unknown, and I'm getting symbols instead of the correct text after decoding.

The PDF text is in Hebrew.

I didn't write the PDF; I received it password-protected, so I used the print option "Save as PDF" to get a copy without the password before parsing.

I'm trying to parse the PDF to JSON. I'm currently using Node with the pdf2json package, but I can switch to Python or any other language if it would work better (Node or Python preferred).

The PDF is in Hebrew and I don't know the encoding. When I tried to decode the encoded text, I got symbols like 팀턀픀ꀀ✀.

How can I resolve it?

To be clearer - this is an example of encoded text:

'%ED%8C%80%ED%84%80%ED%94%80%EE%88%80%EA%80%80%E2%9C%80%EE%84%80%00'

Decoded text:

팀턀픀ꀀ✀
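
For anyone who wants to reproduce this, here is a minimal Python sketch that percent-decodes the sample string above as UTF-8 and lists the resulting code points, since some of them may not render visibly. It only inspects the output and does not recover the Hebrew text. The variable names are my own and are not part of pdf2json.

```python
from urllib.parse import unquote

# Sample string copied from the example above.
encoded = "%ED%8C%80%ED%84%80%ED%94%80%EE%88%80%EA%80%80%E2%9C%80%EE%84%80%00"

# Percent-decode the bytes and interpret them as UTF-8.
decoded = unquote(encoded, encoding="utf-8", errors="strict")

# List the code points rather than relying on how the glyphs render.
code_points = [f"U+{ord(ch):04X}" for ch in decoded]
print(code_points)
# ['U+D300', 'U+D100', 'U+D500', 'U+E200', 'U+A000', 'U+2700', 'U+E100', 'U+0000']

# Check whether any decoded character falls in the Hebrew block (U+0590..U+05FF).
has_hebrew = any(0x0590 <= ord(ch) <= 0x05FF for ch in decoded)
print(has_hebrew)  # False

# Observation only: every code point here has a zero low byte.
low_bytes_zero = all(ord(ch) & 0xFF == 0 for ch in decoded)
print(low_bytes_zero)  # True
```

None of the decoded characters are Hebrew letters, and every code point has the form 0xNN00. That might mean the bytes are being shifted or paired incorrectly somewhere during extraction, but this is only an observation from this one sample, not a confirmed cause.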

Thanks!
