I want to use PDFBox to extract text from Persian PDF files, but it returns "?" for all the Persian characters (Latin words in the same document are returned correctly).
How can I fix this? Any advice?
Sadly, the provided file contains the Persian text as vector graphics, not as text set in fonts, so it cannot be extracted. You'll have to use OCR for it.
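If you have many files and want to find out which ones need OCR, a rough check on the string you already got back from PDFBox might help. This is only a sketch: the class and method names are hypothetical, it does not call PDFBox itself, and it assumes that Persian letters are reported under Java's ARABIC Unicode script.

```java
/** Hypothetical helper: inspects already-extracted text, does not touch the PDF. */
public final class ExtractionCheck {
    private ExtractionCheck() {}

    /** Counts code points in the Arabic script, which includes the Persian letters. */
    static long countArabicScript(String text) {
        return text.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.ARABIC)
                .count();
    }

    /** Counts literal '?' characters in the extracted text. */
    static long countQuestionMarks(String text) {
        return text.chars().filter(c -> c == '?').count();
    }

    /**
     * Crude heuristic: the text has '?' characters but not a single
     * Arabic-script letter, so the Persian content probably did not come through.
     */
    static boolean looksUnextractable(String text) {
        return countArabicScript(text) == 0 && countQuestionMarks(text) > 0;
    }
}
```

Note that a purely Latin document containing a real question mark also triggers `looksUnextractable`, so treat a hit as a reason to look at the file, not as proof.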
See also the text extraction FAQ: