I have the following R function to measure the semantic similarity between the abstracts of two papers:
library(tm)   # Corpus, VectorSource, TermDocumentMatrix
library(lsa)  # lsa, cosine

cosine_similarity <- function(abstract1, abstract2) {
  # Combine the two abstracts into a character vector
  docs <- c(abstract1, abstract2)
  # Create a Corpus object from the abstract text and build a term-document matrix
  corpus <- Corpus(VectorSource(docs))
  tdm <- TermDocumentMatrix(corpus)
  # Apply Latent Semantic Analysis (LSA) to the matrix
  lsa <- lsa(tdm)
  # Extract the LSA weights for each term
  weights <- as.matrix(lsa$tk)
  # Calculate cosine similarity between the two abstracts
  cosine_sim <- cosine(weights[, 1], weights[, 2])
  return(cosine_sim)
}
But when I pass two identical abstracts to the function, the output is not 1. I would expect a cosine similarity of 1 when comparing two identical pieces of text. What is wrong with the code, and how should I fix it?
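
To make the expectation concrete, here is a minimal base-R sketch of the behaviour I have in mind, assuming the cosine is taken between one term-count vector per abstract. The helpers `tokenise`, `doc_vectors`, `cosine_sim`, and `expected_similarity` are hypothetical names of my own, and the tokenisation is a deliberately naive simplification; none of this uses `tm` or `lsa`, and it skips the LSA step entirely.

```r
# Hypothetical helper: naive lowercase tokenisation on non-alphanumeric characters.
tokenise <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  tokens[tokens != ""]
}

# Hypothetical helper: one term-count column per abstract over a shared vocabulary.
doc_vectors <- function(abstract1, abstract2) {
  t1 <- tokenise(abstract1)
  t2 <- tokenise(abstract2)
  vocab <- union(t1, t2)
  cbind(
    doc1 = as.numeric(table(factor(t1, levels = vocab))),
    doc2 = as.numeric(table(factor(t2, levels = vocab)))
  )
}

# Hypothetical helper: cosine of the angle between two numeric vectors.
cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Similarity between the two document columns (not between term dimensions).
expected_similarity <- function(abstract1, abstract2) {
  v <- doc_vectors(abstract1, abstract2)
  cosine_sim(v[, "doc1"], v[, "doc2"])
}

abstract <- "Latent semantic analysis maps documents into a low rank space"
sim_same <- expected_similarity(abstract, abstract)
sim_diff <- expected_similarity(abstract, "Completely unrelated words appear here")

# Identical texts should give 1 (up to floating-point error).
stopifnot(isTRUE(all.equal(sim_same, 1)))
```

With this sketch, identical inputs give a similarity of 1 up to floating-point error, and two texts with no words in common give 0, which is the behaviour I was hoping to get from the LSA-based function as well.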