I'm trying to compute cosine similarity over 1000 random questions and 1000 random answers with bert-base-uncased, then find the 5 most similar answers for a question, and finally compute Top-1 and Top-5 accuracy against the real answer. But I always get 0.0 accuracy, and the retrieved answers are not similar.
import numpy as np
import torch
from sklearn.metrics.pairwise import cosine_similarity
from transformers import BertModel, BertTokenizer

sample_1000_quest = train_ds['questions'].sample(1000)
sample_1000_answer = train_ds['answers'].sample(1000)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer_bert = BertTokenizer.from_pretrained('bert-base-uncased')
model_bert = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True).eval().to(device)

selected_question = sample_1000_quest.iloc[1]
selected_question_idx = sample_1000_quest.index.get_loc(30574)  # position of label 30574 in the sample

# Embed the selected question (mean of the last hidden state over all tokens).
encoded_question = tokenizer_bert(selected_question, return_tensors='pt', padding=True, truncation=True)
with torch.no_grad():
    outputs = model_bert(**encoded_question.to(device))
question_embedding = outputs.last_hidden_state.mean(dim=1).cpu()

# Embed every sampled answer the same way.
answer_embeddings = []
for answer in sample_1000_answer:
    encoded_answer = tokenizer_bert(answer, return_tensors='pt', padding=True, truncation=True)
    with torch.no_grad():
        outputs = model_bert(**encoded_answer.to(device))
    answer_embeddings.append(outputs.last_hidden_state.mean(dim=1).cpu())

# Cosine similarity between the question and every answer.
similarities = []
for answer_embedding in answer_embeddings:
    similarity = cosine_similarity(question_embedding, answer_embedding)
    similarities.append(similarity.item())

most_similar_indices = np.argsort(similarities)[-5:][::-1]
ground_truth_idx = train_ds['answers'].iloc[selected_question_idx]

top1_idx = most_similar_indices[0]
top1_accuracy = 1 if top1_idx == ground_truth_idx else 0
top5_accuracy = 1 if ground_truth_idx in most_similar_indices else 0

print("Selected Question:", selected_question)
print("Most similar 5 answers:")
for i, idx in enumerate(most_similar_indices):
    print(f"{i+1}. {sample_1000_answer.iloc[idx]}")
print("Top-1 Accuracy:", top1_accuracy)
print("Top-5 Accuracy:", top5_accuracy)
Output:
Selected Question: bir sunum oluşturmak için beş adım yazın. (Write five steps to create a presentation.)
Most similar 5 answers:
1. doğum günü gülüm bütün yaz aldığım en güzel hediyeydi. (my birthday rose was the most beautiful gift I got all summer.)
2. bu deneyin amacı ilkeleri anlamaktır. (the aim of this experiment is to understand the principles.)
3. bir satış elemanı sunum yapıyor. (a salesperson is giving a presentation.)
4. hangi konuda yardıma ihtiyacın olduğunu söyle. (tell me what you need help with.)
5. konuşmanın içeriği, projede bir sonraki adım için onay almakla ilgilidir. (the talk is about getting approval for the next step in the project.)
Top-1 Accuracy: 0
Top-5 Accuracy: 0
bert-base-uncased is a model that was mainly pre-trained on English, so its embeddings carry little useful signal for Turkish text. You could try a model pre-trained on Turkish instead, such as dbmdz/bert-base-turkish-cased.

Also, given a relatively large candidate set, counting a prediction as correct only when the true answer lands among the top 5 of 1000 is a harsh cut-off: almost every prediction will score 0. It would be better to rate how far the predicted answer is from the expected one, rather than scoring it as just 0 or 1. Alternatively, you could add a more lenient measure such as Top-10 or Top-25 accuracy and see whether that gives higher values.
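To make both suggestions concrete, here is a minimal sketch, not your exact pipeline: the embed helper and random_state=0 are illustrative, and it assumes train_ds is a pandas DataFrame whose 'questions' and 'answers' columns are aligned so that row i's answer is the ground truth for row i's question. It samples rows once (sampling questions and answers independently would leave most true answers out of the candidate pool), swaps in dbmdz/bert-base-turkish-cased, and reports Top-k accuracy for several k:

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-turkish-cased").eval().to(device)

def embed(texts, batch_size=32):
    # Mean-pool the last hidden state over non-padding tokens only.
    chunks = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            enc = tokenizer(texts[i:i + batch_size], return_tensors="pt",
                            padding=True, truncation=True).to(device)
            hidden = model(**enc).last_hidden_state             # (B, T, H)
            mask = enc["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
            chunks.append((hidden * mask).sum(1) / mask.sum(1))
    return torch.cat(chunks)

sample = train_ds.sample(1000, random_state=0)  # one sample keeps Q/A pairs aligned
q_emb = embed(sample['questions'].tolist())
a_emb = embed(sample['answers'].tolist())

# All-pairs cosine similarity: row i holds question i against every candidate answer.
sims = F.normalize(q_emb, dim=1) @ F.normalize(a_emb, dim=1).T  # (1000, 1000)

# Question i's true answer sits at column i, so Top-k accuracy is how often
# column i appears among the k highest-scoring columns of row i.
targets = torch.arange(len(sample), device=device).unsqueeze(1)
for k in (1, 5, 10, 25):
    topk = sims.topk(k, dim=1).indices
    hits = (topk == targets).any(dim=1)
    print(f"Top-{k} accuracy: {hits.float().mean().item():.3f}")

The masked mean replaces last_hidden_state.mean(dim=1) because, once inputs are batched and padded, a plain mean averages the padding positions into the embedding. Evaluating over all 1000 questions instead of a single one also makes the Top-k numbers far less noisy.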