How can I get the .tiff data from pdfium?


I have tried FPDFImageObj_GetImageDataDecoded and FPDFImageObj_GetImageDataRaw, but both failed.

I am only using pdfium to get the data of PDF pages, but I can't get correct TIFF or JBIG2 data from the FPDFImageObj_GetImageDataDecoded API. Can anyone help? Thank you very much.

Answer by mara004

There is a feature request discussing this: https://crbug.com/pdfium/1930 (disclaimer: I'm the reporter)

TL;DR: The functions you mention do provide the main data stream, but for some filters, complementary data is needed to actually reconstruct the image, and pdfium does not provide it.

  • For CCITTFaxDecode (the compression that TIFF can also use), pdfium's public API does not expose the CCITT group (the K decode parameter), but this is needed to reconstruct the TIFF header, which the PDF format strips. I think the BlackIs1 info would also be needed; possibly more.
  • JBIG2Decode may optionally use a separate JBIG2Globals stream, which again pdfium does not provide. I filed a separate bug about this: https://crbug.com/pdfium/1927. However, I guess the raw JBIG2 data might not be very useful except for re-insertion into a PDF. IIRC, the way pikepdf handles JBIG2 extraction to files is to just decode the data and re-encode it in some other format. From a programmatic POV that's not ideal, but I guess the context is that standalone JBIG2 isn't really supported by end-user apps.
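To illustrate what "reconstructing the TIFF header" means for the CCITT case, here is a minimal sketch in C. It does not use pdfium at all: it assumes you already have the image width, height, K, BlackIs1 and the raw stream length from somewhere else (which is exactly the part pdfium's public API does not fully cover). The names tiff_ccitt_header and TIFF_CCITT_HEADER_SIZE are made up for this example. The BlackIs1 to PhotometricInterpretation mapping is my assumption and ignores a /Decode array or image masks, which can invert it; EncodedByteAlign and EndOfLine are not handled either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8 bytes TIFF header + 2 bytes entry count + 9 entries * 12 bytes + 4 bytes next-IFD offset */
#define TIFF_CCITT_HEADER_SIZE 122

/* Write little-endian integers and return the advanced pointer. */
static uint8_t *put_u16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
    return p + 2;
}

static uint8_t *put_u32(uint8_t *p, uint32_t v) {
    p = put_u16(p, (uint16_t)(v & 0xFFFF));
    return put_u16(p, (uint16_t)(v >> 16));
}

/* Write one 12-byte IFD entry with a single inline value (type 3 = SHORT, 4 = LONG). */
static uint8_t *put_entry(uint8_t *p, uint16_t tag, uint16_t type, uint32_t value) {
    p = put_u16(p, tag);
    p = put_u16(p, type);
    p = put_u32(p, 1);        /* value count */
    return put_u32(p, value); /* for little-endian SHORTs the upper bytes stay zero */
}

/* Hypothetical helper: fills `out` with a single-strip TIFF header for a raw
 * CCITT stream of `data_size` bytes. The caller writes the header followed by
 * the unmodified stream. Returns the number of header bytes written. */
size_t tiff_ccitt_header(uint8_t out[TIFF_CCITT_HEADER_SIZE],
                         uint32_t width, uint32_t height,
                         int k, int black_is_1, uint32_t data_size)
{
    assert(out != NULL && width > 0 && height > 0);

    uint8_t *p = out;
    *p++ = 'I';                                    /* "II" = little-endian byte order */
    *p++ = 'I';
    p = put_u16(p, 42);                            /* TIFF magic number */
    p = put_u32(p, 8);                             /* offset of the first IFD */
    p = put_u16(p, 9);                             /* number of IFD entries */
    p = put_entry(p, 256, 4, width);               /* ImageWidth */
    p = put_entry(p, 257, 4, height);              /* ImageLength */
    p = put_entry(p, 258, 3, 1);                   /* BitsPerSample */
    p = put_entry(p, 259, 3, k < 0 ? 4u : 3u);     /* Compression: K < 0 is Group 4, else Group 3 */
    p = put_entry(p, 262, 3, black_is_1 ? 1u : 0u);/* PhotometricInterpretation (assumed mapping) */
    p = put_entry(p, 273, 4, TIFF_CCITT_HEADER_SIZE); /* StripOffsets: data follows the header */
    p = put_entry(p, 278, 4, height);              /* RowsPerStrip: one strip for the whole image */
    p = put_entry(p, 279, 4, data_size);           /* StripByteCounts */
    if (k < 0)
        p = put_entry(p, 293, 4, 0);               /* T6Options: defaults */
    else
        p = put_entry(p, 292, 4, k > 0 ? 1u : 0u); /* T4Options: bit 0 marks 2-D coding */
    p = put_u32(p, 0);                             /* no further IFD */
    return (size_t)(p - out);
}
```

Writing these 122 bytes followed by the bytes from FPDFImageObj_GetImageDataRaw() to a file should give a standalone TIFF, provided the parameters are right. That is the catch: without K (and BlackIs1) you can only guess them.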

Concerning FPDFImageObj_GetImageDataDecoded(), note that it does not fully decode images; it only applies "simple" filters (see https://crbug.com/pdfium/1203#c7), so the function name is a bit misleading.

For the plain pixel data, you can use FPDFImageObj_GetBitmap(), FPDFBitmap_GetBuffer() & co, but note that FPDF_BITMAP is limited in supported pixel formats and bit depth (e.g. no CMYK, B/W, >8bpc RGB(A), ...).
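If you go the bitmap route, keep in mind that the buffer is not tightly packed RGB: the color formats are stored in blue, green, red order, and rows can be padded (the row pitch is what FPDFBitmap_GetStride() returns). Below is a minimal, pdfium-independent sketch of repacking such a buffer. The function name bgr_to_packed_rgb is made up for this example, and it assumes you pass 3 bytes per pixel for FPDFBitmap_BGR and 4 for FPDFBitmap_BGRx / FPDFBitmap_BGRA (the fourth byte is simply dropped here). Gray bitmaps are not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy a BGR-ordered bitmap with padded rows into a
 * tightly packed RGB buffer. `dst` must hold width * height * 3 bytes. */
void bgr_to_packed_rgb(const uint8_t *src, int width, int height, int stride,
                       int bytes_per_pixel, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    assert(bytes_per_pixel == 3 || bytes_per_pixel == 4);
    assert(width >= 0 && height >= 0 && stride >= width * bytes_per_pixel);

    for (int y = 0; y < height; y++) {
        /* Use the stride, not width * bytes_per_pixel, to find each row. */
        const uint8_t *row = src + (size_t)y * (size_t)stride;
        for (int x = 0; x < width; x++) {
            const uint8_t *px = row + (size_t)x * (size_t)bytes_per_pixel;
            *dst++ = px[2]; /* red */
            *dst++ = px[1]; /* green */
            *dst++ = px[0]; /* blue */
        }
    }
}
```

In real code, src, stride, width and height would come from FPDFBitmap_GetBuffer(), FPDFBitmap_GetStride(), FPDFBitmap_GetWidth() and FPDFBitmap_GetHeight(), and bytes_per_pixel would be derived from FPDFBitmap_GetFormat().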