I am trying to load an LLM from my laptop's local disk, but it is not working. When I load it with the following approach, it works as expected and I get a response to my query:
def load_llm():
    # Load the locally downloaded model here
    llm = CTransformers(
        model="TheBloke/Llama-2-7B-Chat-GGML",
        model_type="llama",
        config={'max_new_tokens': 3000,
                'temperature': 0.01,
                'context_length': 3000}
    )
    return llm
If I change the method as below, I don't get any response:
def load_llm():
    # Local CTransformers model
    MODEL_BIN_PATH = 'models/llama-2-7b-chat.ggmlv3.q8_0.bin'
    MODEL_TYPE = 'llama'
    MAX_NEW_TOKENS = 3000
    TEMPERATURE = 0.01
    CONTEXT_LENGTH = 3000
    llm = CTransformers(model=MODEL_BIN_PATH,
                        model_type=MODEL_TYPE,
                        config={'max_new_tokens': MAX_NEW_TOKENS,
                                'temperature': TEMPERATURE,
                                'context_length': CONTEXT_LENGTH})
    return llm
I want to make sure the model is loaded from my local disk instead of being fetched over the Internet.
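For what it's worth, a quick sanity check (assuming the same relative path as in load_llm) is to confirm the file resolves from the working directory and to force huggingface_hub offline so nothing can be fetched:

import os

MODEL_BIN_PATH = 'models/llama-2-7b-chat.ggmlv3.q8_0.bin'
# If this prints False, the relative path does not resolve from the
# directory the script is started in
print(os.path.exists(MODEL_BIN_PATH))

# huggingface_hub respects this variable; with it set, nothing is downloaded
os.environ['HF_HUB_OFFLINE'] = '1'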
Below are my import statements.
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain import PromptTemplate
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import CTransformers
from langchain.chains import RetrievalQA
import chainlit as cl
Any leads would be appreciated.
I don't usually use CTransformers, but I know the latest format is GGUF and GGML has been discontinued, so if you are on the latest version of CTransformers it likely won't run GGML models anymore.
I don't know if this helps, but if you don't mind changing your approach, I have always used the llama-cpp-python bindings and they have always worked for me for running models locally.
To do this, download the GGUF version of the model you want from TheBloke.
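For the model in the question, TheBloke publishes a GGUF counterpart (TheBloke/Llama-2-7B-Chat-GGUF); one way to fetch a single quantized file into a local folder is via huggingface_hub (the Q8_0 filename below is an assumption, chosen to match the q8_0 GGML file in the question):

from huggingface_hub import hf_hub_download

# Downloads one GGUF file into ./models and returns its local path
model_path = hf_hub_download(
    repo_id='TheBloke/Llama-2-7B-Chat-GGUF',
    filename='llama-2-7b-chat.Q8_0.gguf',
    local_dir='models',
)
print(model_path)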
Then run pip install llama-cpp-python (it is possible it will ask for PyTorch to be installed already). Once it is installed you can run any GGUF model, for example:
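A minimal sketch, assuming the GGUF file from above sits in models/:

from llama_cpp import Llama

llm = Llama(
    model_path='models/llama-2-7b-chat.Q8_0.gguf',  # local file, no network access
    n_ctx=3000,
)
output = llm(
    'Q: What did you load from disk? A: ',
    max_tokens=256,
    temperature=0.01,
)
print(output['choices'][0]['text'])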
Full documentation at: https://github.com/abetlen/llama-cpp-python
It also has a LangChain integration.
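So you could keep the rest of your RetrievalQA setup and only swap the LLM; a sketch using the community wrapper (same local path assumed as above):

from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path='models/llama-2-7b-chat.Q8_0.gguf',
    temperature=0.01,
    max_tokens=3000,
    n_ctx=3000,
)
print(llm.invoke('Hello from a local model'))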
That said, just switching to a GGUF model may by itself solve your problem with CTransformers.
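If you go that route, your load_llm stays almost identical, only pointing at the local GGUF file (path assumed from the download step above):

def load_llm():
    # Same call as before, but with a local GGUF file instead of GGML
    llm = CTransformers(
        model='models/llama-2-7b-chat.Q8_0.gguf',
        model_type='llama',
        config={'max_new_tokens': 3000,
                'temperature': 0.01,
                'context_length': 3000}
    )
    return llm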