I am trying to use pdf2image to convert PDF files into images. The PDFs are stored in a subdirectory of a GCP Cloud Storage bucket, so I set the PDF path to something like 'gs://path/to/my/pdf/file'. When I call convert_from_path with that path, it raises the error below. How can I convert PDFs to images in a GCP environment, specifically in Colab Enterprise on Vertex AI?
ValueError:
During handling of the above exception, another exception occurred:
PDFPageCountError Traceback (most recent call last)
3 frames
<ipython-input-25-47fc37d40a49> in <cell line: 1>()
----> 1 convert_pdf_to_images(credentials)
<ipython-input-23-e796b0105771> in convert_pdf_to_images(credentials)
22
23 # Use pdf2image to convert PDF to images and save them
---> 24 images = pdf2image.convert_from_path(pdf_path)
25
26 for i, image in enumerate(images):
/usr/local/lib/python3.10/dist-packages/pdf2image/pdf2image.py in convert_from_path(pdf_path, dpi, output_folder, first_page, last_page, fmt, jpegopt, thread_count, userpw, ownerpw, use_cropbox, strict, transparent, single_file, output_file, poppler_path, grayscale, size, paths_only, use_pdftocairo, timeout, hide_annotations)
125 poppler_path = poppler_path.as_posix()
126
--> 127 page_count = pdfinfo_from_path(
128 pdf_path, userpw, ownerpw, poppler_path=poppler_path
129 )["Pages"]
/usr/local/lib/python3.10/dist-packages/pdf2image/pdf2image.py in pdfinfo_from_path(pdf_path, userpw, ownerpw, poppler_path, rawdates, timeout, first_page, last_page)
609 )
610 except ValueError:
--> 611 raise PDFPageCountError(
612 f"Unable to get page count.\n{err.decode('utf8', 'ignore')}"
613 )
PDFPageCountError: Unable to get page count.
Internal Error: Cannot handle URI 'gs://path/to/my/pdf/file'
I also tried changing the PDF path to a regular local path like the ones normally used in Colab, for example '/content/path/to/file/pdf', but then the error says the file cannot be opened at that path.
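The traceback suggests that poppler (which pdf2image calls under the hood) cannot read a 'gs://' URI directly, so one direction that might work is to download the object into memory first and pass the bytes to pdf2image.convert_from_bytes. Below is a minimal, untested-in-Colab-Enterprise sketch of that idea. It assumes the google-cloud-storage package and poppler are installed in the runtime, and that the notebook's credentials can read the bucket. The helper names split_gs_uri and convert_gcs_pdf_to_images are hypothetical, made up for this illustration.

```python
from urllib.parse import urlparse


def split_gs_uri(uri):
    """Split 'gs://bucket/path/to/file.pdf' into (bucket_name, object_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"Not a gs:// object URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def convert_gcs_pdf_to_images(uri, credentials=None):
    """Download a PDF from Cloud Storage and convert it to PIL images.

    Assumption: google-cloud-storage, pdf2image, and poppler are available.
    """
    # Imported lazily so split_gs_uri can be used without these packages.
    from google.cloud import storage
    import pdf2image

    bucket_name, object_name = split_gs_uri(uri)

    # With credentials=None the client falls back to the environment's
    # default credentials.
    client = storage.Client(credentials=credentials)
    pdf_bytes = client.bucket(bucket_name).blob(object_name).download_as_bytes()

    # convert_from_bytes takes the raw PDF content, so no local path or
    # gs:// URI is ever handed to poppler.
    return pdf2image.convert_from_bytes(pdf_bytes)
```

With something like this, the call in my notebook would become images = convert_gcs_pdf_to_images('gs://path/to/my/pdf/file', credentials), and each returned image could then be saved with image.save(...). Downloading to a local file with blob.download_to_filename and then calling convert_from_path on that local path would be an alternative for large PDFs.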