Choose encoding for pdftohtml


How can I force pdftohtml output to be UTF-8?

$ pdftohtml -enc utf8 my.pdf 
Error: Couldn't find unicodeMap file for the 'utf-8' encoding

And -listenc doesn't seem to be a valid option.

I think it is using ISO-8859-1 by default, although for some reason Vim reads the file and displays the special characters fine, even though :set enc? reports utf-8.

Answer by Reynaldo Aceves (best answer)

Write the encoding name as UTF-8 (uppercase, with the hyphen) rather than utf8, i.e. pdftohtml -enc UTF-8 file.pdf. For example:

$ pdftohtml -enc UTF-8 my.pdf
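
To check that the result really is UTF-8, one option is to run the generated file through iconv, which fails on byte sequences that are not valid UTF-8. This is only a minimal sketch: it assumes iconv is installed, the helper name is_utf8 is made up for illustration, and my.html is a hypothetical output name (the files pdftohtml actually writes depend on its options).

```shell
#!/bin/sh
# is_utf8 is a hypothetical helper, not part of pdftohtml.
# It succeeds only if the file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Assumed output file name; adjust to whatever pdftohtml produced.
out="${1:-my.html}"

if [ -f "$out" ]; then
    if is_utf8 "$out"; then
        echo "$out: valid UTF-8"
    else
        echo "$out: not valid UTF-8"
    fi
fi
```

As a side note on the Vim observation in the question: :set enc? shows Vim's internal encoding, while :set fenc? shows the encoding Vim detected for the current file, so the latter is usually the more useful check.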