How do I generate embeddings for dicts (not text) for Vertex AI Search?


I am trying to generate and store vector embeddings in my GCS bucket so that they can be accessed by Vertex AI Search to find the most similar items.

This official tutorial mentions that the first step is to generate an embedding, and that this can be done by generating text embeddings.

If we look at the referenced code, with Python one would do the following:

from vertexai.language_models import TextEmbeddingModel


def text_embedding() -> list:
    """Text embedding with a Large Language Model."""
    model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")
    embeddings = model.get_embeddings(["What is life?"])
    for embedding in embeddings:
        vector = embedding.values
        print(f"Length of Embedding Vector: {len(vector)}")
    return vector

Now there is another tutorial where they generate and store vector embeddings from Spanner to Vector Search, also using the textembedding-gecko model. Obviously, data in Spanner is not stored as plain text; it has a row/column or key/value (dict) structure.

The code above is geared towards generating text embeddings, so this format is not supported. How do I therefore go from a dict to an embedding?
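
For concreteness, a hypothetical example of the kind of record I mean (the field names are made up for illustration):

item = {
    "product_id": "p1",
    "title": "Red running shoes",
    "description": "Lightweight running shoes with a mesh upper",
    "price": 59.99,
}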

Other comments:

  • In the case of product images, as the initial tutorial shows, one would indeed expect the object to have multiple attributes, not just text.
  • In the future, I would also like to explore overweighting and underweighting some attributes if possible.

There is 1 answer below.

Answer from Gang Chen:

The Vertex AI text embeddings API (e.g. textembedding-gecko@003) takes a snippet of text as input and generates an array of floating point numbers (the embedding). The REST API supports the following input request (the SDKs are the same):

{
  "instances": [
    { "content": "TEXT"}
  ],
  "parameters": { 
    "autoTruncate": AUTO_TRUNCATE 
  }
}
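
For reference, here is a minimal sketch of sending that request body from Python; the project, location, and model values are placeholders, and authentication is assumed to come from Application Default Credentials:

import requests
import google.auth
from google.auth.transport.requests import Request

# Placeholder values for illustration.
PROJECT_ID = "my-project"
LOCATION = "us-central1"
MODEL = "textembedding-gecko@003"

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(Request())

url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{LOCATION}/publishers/google/models/{MODEL}:predict"
)
body = {
    "instances": [{"content": "What is life?"}],
    "parameters": {"autoTruncate": True},
}
resp = requests.post(
    url,
    json=body,
    headers={"Authorization": f"Bearer {credentials.token}"},
)

# Each prediction holds the embedding values for one input instance.
values = resp.json()["predictions"][0]["embeddings"]["values"]
print(len(values))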

So, it consumes an array of raw text. If the source texts come from a database table (as in the example you are referring to), you should create embedding instances per column (one embedding per selected row; think of it as a specific feature of a data instance). It should NOT take an entire row as embedding input, since the embedding API will not understand the relationship between the columns. This also applies to Spanner's ML.PREDICT, since it uses the same Vertex AI embedding APIs under the cover.
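
As a minimal sketch of that idea, assuming the rows are exported from the database as dicts (the column names here are hypothetical):

from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")

# Hypothetical rows; in practice these would come from a Spanner query.
rows = [
    {"product_id": "p1", "description": "Red running shoes with a mesh upper"},
    {"product_id": "p2", "description": "Waterproof leather hiking boots"},
]

# One embedding per row, computed from a single text column.
texts = [row["description"] for row in rows]
embeddings = model.get_embeddings(texts)

# Keep the row ID next to its vector so it can be written out
# (e.g. as JSON to GCS) for Vector Search indexing.
records = [
    {"id": row["product_id"], "embedding": emb.values}
    for row, emb in zip(rows, embeddings)
]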

To your second question, I guess you are referring to multiple attributes as input, with both text and images? In that case, the multimodal embeddings API can take an input object with text, image, and video. It is designed for multimodal input rather than for multiple attributes. As shown in the multimodal embedding request:

{
  "instances": [
    {
      "text": "TEXT",
      "image": {
        "bytesBase64Encoded": "B64_ENCODED_IMG"
      }
    }
  ]
}

This sends an embedding request with image and text data to the multimodal embedding API via POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/multimodalembedding@001:predict.
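
A minimal SDK sketch of the same call, assuming a local image file (the path and text are placeholders):

from vertexai.vision_models import Image, MultiModalEmbeddingModel

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

# Placeholder inputs; the image path and text are made up.
image = Image.load_from_file("product.png")
embeddings = model.get_embeddings(
    image=image,
    contextual_text="Red running shoes with a mesh upper",
)

# The response carries separate vectors for the image and the text.
print(len(embeddings.image_embedding), len(embeddings.text_embedding))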