Can I convert a streamlit uploaded pdf file into a langchain document?

252 Views Asked by Tushar Singh At 19 January 2024 at 19:12

I am building a Streamlit app where a user can upload a PDF file and the app generates questions based on it. The problem is that the uploaded file is an UploadedFile object, while the LangChain PDF loader expects the path of a locally stored file, which it then loads and splits into small chunks of documents using a text splitter. Is there a way to convert the upload into a LangChain document before processing, or is there another suitable method?

InsertCheesyLine On 22 January 2024 at 11:36

Let us say you have a Streamlit app with st.file_uploader:

import streamlit as st

uploaded_file = st.file_uploader("Upload file")

Once a file is uploaded, uploaded_file holds the file data. You cannot pass it directly to PyPDFLoader, because it is a file-like object (a BytesIO subclass) and the loader expects a file path.

We need to save this file locally

if uploaded_file is not None:  # nothing to save until the user uploads a file
    with open(uploaded_file.name, mode='wb') as w:
        w.write(uploaded_file.getvalue())

and then pass its file path to the loader

from langchain_community.document_loaders import PyPDFLoader

if uploaded_file:  # proceed only once a file has been uploaded
    loader = PyPDFLoader(uploaded_file.name)
    pages = loader.load_and_split()
    print(pages[0])

pages should now be a list of LangChain Document objects.
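If you would rather not write into the app's working directory under a user-supplied file name, one possible variation is to go through a temporary file instead. The sketch below is only an illustration under a few assumptions: the helper names save_upload_to_temp and load_pdf_documents are hypothetical (they are not part of Streamlit or LangChain), and the LangChain import is done lazily inside the second function so the first helper can be used on its own.

```python
import os
import tempfile


def save_upload_to_temp(data: bytes, suffix: str = ".pdf") -> str:
    """Write uploaded bytes to a temporary file and return its path.

    Hypothetical helper, not part of Streamlit or LangChain.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so a path-based loader can open it afterwards.
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        return tmp.name


def load_pdf_documents(data: bytes):
    """Load PDF bytes into a list of LangChain Documents via a temp file.

    Hypothetical helper; assumes langchain_community and pypdf are installed.
    """
    # Imported here so save_upload_to_temp works without LangChain installed.
    from langchain_community.document_loaders import PyPDFLoader

    path = save_upload_to_temp(data)
    try:
        return PyPDFLoader(path).load()
    finally:
        os.remove(path)  # remove the temporary copy once loading is done
```

In the Streamlit app this could then be called as pages = load_pdf_documents(uploaded_file.getvalue()) inside the same "if uploaded_file:" check. Note that the Documents' source metadata would point at the temporary path rather than the original file name.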
