Extract PDF Content Including Images For RAG


I am trying to build a PDF content extraction and chunking system for RAG in my application. I need to include the images from the PDF as URLs so that the LLM can use those images in its response. Most of the solutions I have seen only extract the text content from the PDF. Is there any way to extract both images and text from a PDF?

Nick Magnanini - preprocess.co On

PyMuPDF lets you do that: in addition to the text, it can extract the images embedded in a PDF and detect tables.
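As a rough illustration, here is a minimal sketch of how that could look for the RAG use case in the question. It assumes PyMuPDF is installed (pip install pymupdf) and that the extracted images are written to a directory that something else serves over HTTP. The function names (image_filename, build_chunk, extract_pdf), the file naming scheme, the base_url parameter, and the one-chunk-per-page layout are all my own assumptions for the example, not part of PyMuPDF.

```python
from __future__ import annotations

from pathlib import Path


def image_filename(page_number: int, index: int, ext: str) -> str:
    """Hypothetical naming scheme: 1-based page number plus image index on that page."""
    return f"page{page_number:04d}_img{index:02d}.{ext}"


def build_chunk(page_number: int, text: str, image_urls: list[str]) -> dict:
    """Bundle one page's text with markdown image links so the LLM can reuse the URLs."""
    links = "\n".join(f"![page {page_number} image]({url})" for url in image_urls)
    content = text.strip() + ("\n\n" + links if links else "")
    return {"page": page_number, "content": content, "images": image_urls}


def extract_pdf(pdf_path: str, image_dir: str, base_url: str) -> list[dict]:
    """Return one chunk per page; embedded images are saved under image_dir.

    base_url is assumed to be the public URL under which image_dir is served.
    """
    import fitz  # PyMuPDF, imported here so the helpers above work without it

    out_dir = Path(image_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            page_number = page.number + 1  # page.number is 0-based
            text = page.get_text()  # plain text of the page
            urls = []
            for index, img in enumerate(page.get_images(full=True)):
                xref = img[0]  # first item of each entry is the image xref
                info = doc.extract_image(xref)  # dict with "image" bytes and "ext"
                name = image_filename(page_number, index, info["ext"])
                (out_dir / name).write_bytes(info["image"])
                urls.append(f"{base_url.rstrip('/')}/{name}")
            chunks.append(build_chunk(page_number, text, urls))
    return chunks
```

You would call it as something like extract_pdf("report.pdf", "static/pdf_images", "https://example.com/pdf_images") and then split or embed each chunk's "content" however your pipeline already does. One chunk per page is only the simplest starting point; if an image is referenced on several pages it will be written more than once, so you may want to deduplicate by xref. For tables, recent PyMuPDF versions also provide page.find_tables(), which could be added to the same loop.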