Can someone please advise me on the hardware requirements for using sentence-transformers/all-MiniLM-L6-v2 for a semantic similarity use case? I downloaded the model locally and am using it to generate embeddings, then using util.pytorch_cos_sim to calculate similarity scores between two sentences. Everything was working well on my Mac Pro (2.4 GHz 8-core Intel Core i9, 32 GB memory), but after I moved the model to containers with 1 CPU core and 4 GB RAM (on my company-provided network), the code takes at least 15-20 times longer to generate the cosine similarity score.
Has anyone faced a similar situation? Kindly advise. Thank you in advance for the help!
N.B.: I am also sharing the sample code for reference.
from sentence_transformers import SentenceTransformer, util

sentences = ["What happens when my account is debited", "What is a debit"]

# Model instantiation (model files are stored locally alongside the code)
sent_sim_model = SentenceTransformer('./all-MiniLM-L6-v2')

# Encode each sentence into a dense embedding tensor
embedding_0 = sent_sim_model.encode(sentences[0], convert_to_tensor=True)
embedding_1 = sent_sim_model.encode(sentences[1], convert_to_tensor=True)

# Calculate the cosine similarity score between the two embeddings
print(util.pytorch_cos_sim(embedding_0, embedding_1).tolist()[0][0])
I have been running the model successfully on my local system for quite some time now (after storing it locally in the same directory as the code), but once I moved the model and the above code to a Docker container, the response time (which used to be 2-3 seconds on my local system) went up to more than a minute. Since each container I am using has 1 CPU core and 4 GB RAM, I would like to know whether this limited hardware could be the cause of the slowdown for the above code.
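In case it helps with the diagnosis, below is a rough sketch of how the timing could be broken down (splitting model load time from encode time is only for illustration; my actual measurement wraps the whole call):

import time
from sentence_transformers import SentenceTransformer, util

start = time.perf_counter()
# Loading the model weights from the local directory
sent_sim_model = SentenceTransformer('./all-MiniLM-L6-v2')
print(f"Model load: {time.perf_counter() - start:.2f} s")

sentences = ["What happens when my account is debited", "What is a debit"]

start = time.perf_counter()
# Encode both sentences in a single batch rather than one encode() call each
embeddings = sent_sim_model.encode(sentences, convert_to_tensor=True)
print(f"Encoding: {time.perf_counter() - start:.2f} s")

print(util.pytorch_cos_sim(embeddings[0], embeddings[1]).tolist()[0][0])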
I can't add a comment, so I'm giving a full reply.
I built a tiny Docker REST API with Flask and deployed it to https://fly.io/ with under 2 GB of RAM, and I get pretty good results. The image was built with Nixpacks. My workflow is:
- run it locally
- deploy to Fly or Railway and test it using Postman

It takes a few seconds to show the results; a minimal sketch of the kind of app I mean is below.
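Here is a minimal sketch, only for illustration (the endpoint name, port, and request format are assumptions for the example, not exactly what I deployed):

from flask import Flask, request, jsonify
from sentence_transformers import SentenceTransformer, util

app = Flask(__name__)

# Load the model once at startup so each request only pays for encoding
model = SentenceTransformer('./all-MiniLM-L6-v2')

@app.route('/similarity', methods=['POST'])
def similarity():
    # Expects JSON like {"sentences": ["first sentence", "second sentence"]}
    sentences = request.get_json()['sentences']
    embeddings = model.encode(sentences, convert_to_tensor=True)
    score = util.pytorch_cos_sim(embeddings[0], embeddings[1]).item()
    return jsonify({'score': score})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

The key point is that the model is loaded once when the container starts, so a POST from Postman (or curl) only pays the encoding and cosine-similarity cost per request.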