Semantic similarity function in R

I have the following function to measure the semantic similarity between the abstracts of two papers:

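library(tm)   # Corpus, VectorSource, TermDocumentMatrix
library(lsa)  # lsa, cosine

cosine_similarity <- function(abstract1, abstract2) {
  
  # Combine the two abstracts into a character vector
  docs <- c(abstract1, abstract2)
  
  # Create a Corpus object from the abstract text
  corpus <- Corpus(VectorSource(docs))
  
  # Build a term-document matrix (rows are terms, columns are abstracts)
  tdm <- TermDocumentMatrix(corpus)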
  
  # Apply Latent Semantic Analysis (LSA) to the matrix
  lsa <- lsa(tdm)
  
  # Extract the LSA weights for each term
  weights <- as.matrix(lsa$tk)
  
  # Calculate cosine similarity between the two abstracts
  cosine_sim <- cosine(weights[, 1], weights[, 2])
  
  return(cosine_sim)
}

But when I pass two identical abstracts to the function, the output is not equal to 1. I would expect a similarity of 1 when comparing two identical pieces of text. What is the problem in the code, and how should I fix it?
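One possible cause, assuming the lsa package's usual layout in which `tk` holds the term vectors (one row per term, one column per latent dimension) and `dk` holds the document vectors: `weights[, 1]` and `weights[, 2]` would then be two latent dimensions, not the two abstracts, so their cosine has no reason to be 1. As a sanity check, here is a minimal base R sketch that compares the two abstracts directly on their term counts. It is plain bag-of-words cosine, not LSA, and `tokenize` and `abstract_cosine` are hypothetical helper names introduced only for this illustration.

```r
# Hypothetical helper: split text into lowercase alphanumeric tokens
tokenize <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z0-9]+"))
  tokens[nzchar(tokens)]  # drop empty strings produced by leading separators
}

# Hypothetical helper: cosine similarity between the term-count vectors
# of two texts (one vector per abstract, not per latent dimension)
abstract_cosine <- function(abstract1, abstract2) {
  tokens1 <- tokenize(abstract1)
  tokens2 <- tokenize(abstract2)

  # Shared vocabulary so both vectors have the same length and term order
  vocab <- sort(unique(c(tokens1, tokens2)))

  # Term counts for each abstract over the shared vocabulary
  v1 <- as.numeric(table(factor(tokens1, levels = vocab)))
  v2 <- as.numeric(table(factor(tokens2, levels = vocab)))

  # Cosine similarity: dot product divided by the product of the norms
  sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
}

text <- "Latent semantic analysis compares documents by their terms."
same <- abstract_cosine(text, text)
different <- abstract_cosine("cat dog", "fish bird")
```

With identical inputs the two count vectors are equal, so `same` comes out as 1 up to floating-point rounding, while texts with no words in common give 0. An input with no tokens at all would produce `NaN`, since the norm is zero. If you want to stay with LSA, the analogous change would be to compare the rows of the document matrix rather than the columns of `lsa$tk`, though with only two documents the latent space has very little to work with.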
