How to use pdf2image on GCP? Getting the error "Cannot handle URI"

I am trying to use pdf2image to convert PDF files into images. I set the PDF path to something like 'gs://path/to/my/pdf/file', where the PDF files are stored in a subdirectory of a bucket in GCP Cloud Storage. But when I call the function convert_from_path, it raises the error below. How can I convert PDFs to image files in the GCP environment, specifically on Colab Enterprise in Vertex AI?

ValueError: 

During handling of the above exception, another exception occurred:

PDFPageCountError                         Traceback (most recent call last)
3 frames
<ipython-input-25-47fc37d40a49> in <cell line: 1>()
----> 1 convert_pdf_to_images(credentials)

<ipython-input-23-e796b0105771> in convert_pdf_to_images(credentials)
     22 
     23     # Use pdf2image to convert PDF to images and save them
---> 24     images = pdf2image.convert_from_path(pdf_path)
     25 
     26     for i, image in enumerate(images):

/usr/local/lib/python3.10/dist-packages/pdf2image/pdf2image.py in convert_from_path(pdf_path, dpi, output_folder, first_page, last_page, fmt, jpegopt, thread_count, userpw, ownerpw, use_cropbox, strict, transparent, single_file, output_file, poppler_path, grayscale, size, paths_only, use_pdftocairo, timeout, hide_annotations)
    125         poppler_path = poppler_path.as_posix()
    126 
--> 127     page_count = pdfinfo_from_path(
    128         pdf_path, userpw, ownerpw, poppler_path=poppler_path
    129     )["Pages"]

/usr/local/lib/python3.10/dist-packages/pdf2image/pdf2image.py in pdfinfo_from_path(pdf_path, userpw, ownerpw, poppler_path, rawdates, timeout, first_page, last_page)
    609         )
    610     except ValueError:
--> 611         raise PDFPageCountError(
    612             f"Unable to get page count.\n{err.decode('utf8', 'ignore')}"
    613         )

PDFPageCountError: Unable to get page count.
Internal Error: Cannot handle URI 'gs://path/to/my/pdf/file'

I also tried changing the PDF path to look like a normal local path in Colab, which usually looks like '/content/path/to/file/pdf', but then the error says the file cannot be opened at that path.
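
For context, the traceback shows that convert_from_path passes the path to poppler's pdfinfo, which reports that it cannot handle the gs:// URI; a '/content/...' path fails as well if no file has actually been downloaded to that location. One possible workaround, as a minimal sketch only: download the object's bytes with the google-cloud-storage client and pass them to pdf2image.convert_from_bytes. The helper names `split_gcs_uri` and `convert_gcs_pdf_to_images` are hypothetical, and the sketch assumes that google-cloud-storage, pdf2image, and poppler are installed in the runtime and that the credentials can read the bucket.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket_name, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a gs:// URI: {uri!r}")
    bucket_name, _, object_name = uri[len(prefix):].partition("/")
    if not bucket_name or not object_name:
        raise ValueError(f"URI must include a bucket and an object: {uri!r}")
    return bucket_name, object_name


def convert_gcs_pdf_to_images(uri, credentials=None, dpi=200):
    """Download a PDF from Cloud Storage and convert it to PIL images.

    Hypothetical helper: assumes google-cloud-storage, pdf2image and
    poppler are available, and that the credentials can read the object.
    """
    # Imported here so the URI helper above works without these packages.
    from google.cloud import storage
    from pdf2image import convert_from_bytes

    bucket_name, object_name = split_gcs_uri(uri)
    # With credentials=None the client falls back to the default credentials.
    client = storage.Client(credentials=credentials)
    pdf_bytes = client.bucket(bucket_name).blob(object_name).download_as_bytes()
    # convert_from_bytes takes the PDF content directly, so no gs:// path
    # is ever handed to poppler.
    return convert_from_bytes(pdf_bytes, dpi=dpi)
```

It could then be called as `images = convert_gcs_pdf_to_images('gs://path/to/my/pdf/file', credentials)`, and each returned image saved with something like `image.save(f"page_{i}.png")`. An alternative with the same idea is to download the object to a local file first and keep using convert_from_path on that local path.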
