How to create an embeddings model in LangChain


I want to pass the hidden states of Llama-2 as an embedding model to my call FAISS.from_documents(<filepath>, <embedding_model>). Currently, I have loaded the Llama-2 model and can get hidden states for a string:

import transformers

model_id = "meta-llama/Llama-2-7b-chat-hf"

# auth_token is my Hugging Face access token
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    output_hidden_states=True,  # needed so the forward pass returns hidden_states
    use_auth_token=auth_token,
)


# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Input data to test the code
input_text = "Hello World!"


encoded_input = tokenizer(input_text, return_tensors='pt')

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,  # quantization config defined earlier
    device_map='auto',
    use_auth_token=auth_token,
)
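(bnb_config above is a quantization config I create earlier; roughly along these lines, though the exact settings may vary:)

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit quantization so the 7B model fits on a single consumer GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```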


outputs = model(**encoded_input)
hidden_states = outputs.hidden_states


print(len(hidden_states))  # 33 for Llama-2: 1 (embeddings) + 32 (layers)
print(hidden_states[0].shape)  # Shape of the embeddings
print(hidden_states[2])

Print outputs:

33
torch.Size([1, 4, 4096])
tensor([[[ 0.0373, -0.5762, -0.0180,  ...,  0.0962, -0.1099,  0.3767],
         [ 0.0676,  0.0400, -0.0033,  ...,  0.0655,  0.0278, -0.0079],
         [-0.0160,  0.0157,  0.0478,  ..., -0.0224, -0.0341,  0.0093],
         [ 0.0229, -0.0104,  0.0217,  ..., -0.0080, -0.0012, -0.0342]]],
       dtype=torch.float16, grad_fn=<ToCopyBackward0>)
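Each layer in hidden_states has shape [batch, seq_len, hidden_size], so there is one 4096-dim vector per token, not per text. To get a single embedding vector per input, one common approach is to mean-pool over the token axis. A minimal sketch of that pooling step, with a NumPy array standing in for the torch tensor (the math is the same):

```python
import numpy as np

# Stand-in for a hidden-states layer: [batch=1, seq_len=4, hidden_size=8]
last_hidden = np.random.rand(1, 4, 8)

# Mean-pool over the token (seq_len) axis -> one vector per input text
sentence_embedding = last_hidden.mean(axis=1)

print(sentence_embedding.shape)  # (1, 8)
```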

Now, I want to build the embeddings of my documents with Llama-2:

from langchain.vectorstores import FAISS

# <clean> is the file-path
FAISS.from_documents(clean, model)

This fails with:

AttributeError: 'LlamaForCausalLM' object has no attribute 'embed_documents'

How can I solve this, and how can I use the Llama-2 hidden states for embeddings?
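From the error, FAISS.from_documents evidently expects an object exposing embed_documents and embed_query methods (LangChain's Embeddings interface), not the raw transformers model. A sketch of the shape such a wrapper would take; the _embed body here is a deterministic placeholder, an assumption for illustration, where the real tokenize-forward-pool pipeline would go:

```python
from typing import List


class LlamaHiddenStateEmbeddings:
    """Duck-typed LangChain-style embeddings wrapper (sketch).

    In a real version, _embed would tokenize the text, run the Llama
    model, and mean-pool a chosen hidden-states layer into one vector.
    Here it returns a deterministic dummy vector so the interface
    itself can be demonstrated without loading the model.
    """

    def __init__(self, dim: int = 4096):
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        # Placeholder: pseudo-embedding built from character codes.
        padded = text.ljust(self.dim)[: self.dim]
        return [float((ord(c) + i) % 97) for i, c in enumerate(padded)]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Called by FAISS.from_documents on each document's text
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        # Called at retrieval time on the query string
        return self._embed(text)
```

Note that the first argument to FAISS.from_documents must also be a list of LangChain Document objects (e.g. produced by a document loader), not a file path.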
