I would like to compress scanned text (monochrome or with only a few colours) and store it in PDF (or perhaps DjVu) files. I remember getting very good results on Windows with Acrobat, using "ZRLE"-compressed monochrome TIFFs embedded in a PDF. As far as I remember, the algorithm was lossless. Now I am looking for a way to get comparable results on Linux. The output should be compact and free of compression artifacts: I do not mind losing colours, but I do not want e.g. JPEG compression, which would produce noisy results for text scans. I need this for batch conversion, so I was thinking of the ImageMagick convert command. Which output format should I use to get good results and still be able to embed the images in PDF files (for example using pdflatex)? Or is it generally better to use DjVu files?
Efficient image compression for PDF embedding on Linux
2.6k Views Asked by highsciguy At

DjVu is not a bad choice, but if you want to stay with PDF for better compatibility, you may want to look into lossless JBIG2 compression.
Quote from Wikipedia:
Overall, the algorithm used by JBIG2 to compress text is very similar to the JB2 compression scheme used in the DjVu file format for coding binary images.
jbig2enc, an encoder for images using JBIG2 compression, was originally written for Google Books by Adam Langley:
https://github.com/agl/jbig2enc
I forked it to include the latest improvements by Rubypdf and others:
https://github.com/DingoDog/jbig2enc
I also built several binaries of jbig2enc for Puppy Linux (they may also work on other distributions):
http://dokupuppylinux.info/programs:encoders
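
For batch conversion, a rough sketch of how jbig2enc might be driven from a script is shown below. It follows the invocation given in the jbig2enc README (`jbig2 -s -p` followed by the bundled `pdf.py` wrapper), but several things are assumptions: that `jbig2` and `pdf.py` are on your PATH (newer versions of the project name the wrapper `jbig2topdf.py`), that the scans are TIFF files, and that the helper names `build_jbig2_command`, `build_pdf_command` and `convert` are purely illustrative. Note also that symbol mode (`-s`) may merge similar-looking glyphs, so check the output if strict losslessness matters.

```python
import subprocess
from pathlib import Path


def build_jbig2_command(pages, basename="output"):
    """Return the argv for jbig2enc: symbol mode (-s), PDF-ready data (-p).

    -b sets the root name of the files that jbig2 writes in symbol mode.
    """
    return ["jbig2", "-s", "-p", "-b", basename, *map(str, pages)]


def build_pdf_command(basename="output", wrapper="pdf.py"):
    """Return the argv for the wrapper script that assembles the PDF.

    Assumption: the wrapper is executable and on PATH; depending on the
    jbig2enc version it may be called jbig2topdf.py instead.
    """
    return [wrapper, basename]


def convert(scan_dir, out_pdf, pattern="*.tif"):
    """Compress every matching scan in scan_dir into a single PDF."""
    pages = sorted(Path(scan_dir).glob(pattern))
    if not pages:
        raise FileNotFoundError(f"no files matching {pattern} in {scan_dir}")
    # jbig2 writes its intermediate files into the current directory.
    subprocess.run(build_jbig2_command(pages), check=True)
    # The wrapper script writes the finished PDF to standard output.
    with open(out_pdf, "wb") as fh:
        subprocess.run(build_pdf_command(), stdout=fh, check=True)
```

Calling `convert("scans", "book.pdf")` would then process every `.tif` file in a (hypothetical) `scans` directory. The resulting PDF can be included from pdflatex like any other PDF graphic.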