I am retrieving pages from a PDF using convert_from_path (pdf2image), and this is the error I am getting:
<ipython-input-45-4ebf020b9136> in <cell line: 1>()
1 for pdf in list_of_pdfs:
----> 2 images = convert_from_path(pdf,first_page= 1,last_page=2)
/usr/local/lib/python3.10/dist-packages/pdf2image/pdf2image.py in convert_from_path(pdf_path, dpi, output_folder, first_page, last_page, fmt, jpegopt, thread_count, userpw, ownerpw, use_cropbox, strict, transparent, single_file, output_file, poppler_path, grayscale, size, paths_only, use_pdftocairo, timeout, hide_annotations)
266 )
267 else:
--> 268 images += parse_buffer_func(data)
269 finally:
270 if auto_temp_dir:
/usr/local/lib/python3.10/dist-packages/pdf2image/parsers.py in parse_buffer_to_ppm(data)
26 size_x, size_y = tuple(size.split(b" "))
27 file_size = len(code) + len(size) + len(rgb) + 3 + int(size_x) * int(size_y) * 3
---> 28 images.append(Image.open(BytesIO(data[index : index + file_size])))
29 index += file_size
30
/usr/local/lib/python3.10/dist-packages/PIL/Image.py in open(fp, mode, formats)
3281 raise
3282 return None
-> 3283
3284 im = _open_core(fp, filename, prefix, formats)
3285
UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7820221cd290>
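The traceback shows how parse_buffer_to_ppm slices poppler's output. It assumes each PPM image occupies len(code) + len(size) + len(rgb) + 3 + width * height * 3 bytes, and it hands that slice to Image.open. The stdlib-only sketch below re-implements just that arithmetic. It is not pdf2image's actual source. It shows one way this error can arise: if anything other than a well-formed P6 image sits in the buffer, the computed slice no longer starts with the P6 magic bytes, and PIL has nothing it can identify. split_ppm_stream and make_ppm are hypothetical helpers written for illustration.

```python
from typing import List


def split_ppm_stream(data: bytes) -> List[bytes]:
    """Cut a buffer of concatenated binary PPM (P6) images into per-image
    slices, using the size arithmetic shown in the traceback."""
    chunks = []
    index = 0
    while index < len(data):
        # Header fields: magic, "WIDTH HEIGHT", maxval, each ending in a newline.
        code, size, maxval = data[index:index + 40].split(b"\n")[0:3]
        width, height = size.split(b" ")
        # 3 newlines, plus 3 bytes (R, G, B) per pixel when maxval is 255.
        file_size = len(code) + len(size) + len(maxval) + 3 + int(width) * int(height) * 3
        chunks.append(data[index:index + file_size])
        index += file_size
    return chunks


def make_ppm(width: int, height: int) -> bytes:
    """Hypothetical helper: build a tiny all-black P6 image (maxval 255)."""
    return b"P6\n%d %d\n255\n" % (width, height) + bytes(width * height * 3)


# Two well-formed images: every slice starts with the P6 magic bytes.
good = make_ppm(2, 1) + make_ppm(3, 2)
print([chunk[:2] for chunk in split_ppm_stream(good)])  # [b'P6', b'P6']

# One stray byte between the images shifts the second slice, so it starts
# with b"XP" instead of b"P6"; this is the kind of input Image.open rejects.
bad = make_ppm(2, 1) + b"X" + make_ppm(3, 2)
print([chunk[:2] for chunk in split_ppm_stream(bad)])  # [b'P6', b'XP']
```

This only illustrates the failure mode. It does not show what put unexpected bytes into the buffer in this particular case.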
Here is the code I am using:
import io
from io import BytesIO
from PIL import Image
from pdf2image import convert_from_path

pdf_list = ['path_to_pdf.pdf', 'path_to_pdf2.pdf']
for pdf in pdf_list:
    # Convert pages 1-2 of each PDF to PIL images
    images = convert_from_path(pdf, first_page=1, last_page=2)
This code was working perfectly fine a few days ago, and I can't figure out what has changed to make it fail now.
You can figure it out through exception handling, like this.
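Here is a minimal sketch of that approach. It is an illustration, not necessarily the code this answer had in mind. Each conversion is wrapped in try/except, the file that raised is recorded, and the loop keeps going. convert_each is a hypothetical helper. Passing the converter in as an argument is an assumption made so the helper can run without poppler installed. Unlike the loop in the question, which reassigns images on every iteration, it also keeps the pages of every PDF that converts successfully.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def convert_each(
    paths: Iterable[str], convert: Callable[..., Any], **kwargs: Any
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Call convert(path, **kwargs) for every path, keeping results and
    exceptions separately so a single bad file is easy to spot."""
    results: Dict[str, Any] = {}
    failures: Dict[str, Exception] = {}
    for path in paths:
        try:
            results[path] = convert(path, **kwargs)
        except Exception as exc:  # e.g. PIL's UnidentifiedImageError
            failures[path] = exc
            print(f"{path}: {type(exc).__name__}: {exc}")
    return results, failures


# Usage with pdf2image (pdf_list holds placeholder paths, as in the question):
# from pdf2image import convert_from_path
# pages, failures = convert_each(pdf_list, convert_from_path, first_page=1, last_page=2)
```

If only some files end up in failures, those PDFs are the place to look. If every file fails, the cause is more likely in the environment than in the PDFs themselves.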