txtai embeddings misses sentences idea

48 Views Asked by AudioBubble At 25 March 2024 at 19:17

Suppose we have the following small dataset. We want to compute text embeddings for it and check whether the model can accurately match a query to the sentence with the most similar idea. The data is as follows:

data = [
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, " +
  "forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends " +
  "in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
]

I would like to compute the embeddings with the sentence-transformers/nli-mpnet-base-v2 model. For some queries it finds the correct text, but for others it fails. For instance, if I search with this query:

query = "climate change in world"

Then it returns the following result:

climate change in world -> Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg

This is not an exact match, but it is logical: the ice shelf collapsed because of global warming, right? However, if I search for a different query, I get:

temperature is increasing around world -> Beijing mobilises invasion craft along coast as Taiwan tensions escalate

This is nonsense, right? How can I improve the results? Here is the code:

from txtai.embeddings import Embeddings

# Load the sentence-transformers model used to embed the texts
embeddings = Embeddings(path='sentence-transformers/nli-mpnet-base-v2')

data = [
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, " +
  "forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends " +
  "in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
]

# Build the index over the sentences
embeddings.index(data)

query = "temperature is increasing around world"
# for query in ("feel good story", "climate change", "public health story", "war",
#               "wildlife", "asia", "lucky", "dishonest junk"):
# search returns (id, score) tuples; take the id of the single best match
uid = embeddings.search(query, 1)[0][0]
print(f'{query:20} -> {data[uid]}')
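
The code above keeps only the id of the top match and discards its similarity score, so a weak best match looks the same as a strong one. A minimal sketch of one way to inspect this, assuming the (id, score) tuples that embeddings.search returns for an index built from plain strings. The helper name describe_matches, the 0.3 cutoff, and the sample scores are all hypothetical and only for illustration; they are not part of txtai:

```python
def describe_matches(results, data, min_score=0.3):
    """Turn (id, score) pairs into readable lines, flagging weak matches.

    min_score is an arbitrary illustrative cutoff, not a txtai default.
    """
    lines = []
    for uid, score in results:
        flag = "" if score >= min_score else "  (weak match)"
        lines.append(f"{score:.3f}  {data[uid]}{flag}")
    return lines


# Made-up texts and scores, for illustration only. Real values would come
# from something like embeddings.search(query, 3) on the indexed data.
sample_data = ["ice shelf collapsed", "invasion craft along coast"]
sample_results = [(1, 0.21), (0, 0.19)]

for line in describe_matches(sample_results, sample_data):
    print(line)
```

With the real index, calling describe_matches(embeddings.search(query, 3), data) would show whether the Beijing sentence wins by a clear margin or whether all candidates score about equally low, which would suggest that none of the six sentences is a close match for the query.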
