I am using the new Google Generative AI API. I have a corpus of documents and I want to find some outliers. When creating the embeddings, what is the most appropriate task_type to use? My code resembles the following:
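```python
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko-multilingual@001")
text_input = TextEmbeddingInput(text=text, task_type="CLUSTERING")  # text: one document from the corpus
embeddings = model.get_embeddings([text_input])
```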
The available task types are:
- RETRIEVAL_QUERY: Specifies the given text is a query in a search or retrieval setting.
- RETRIEVAL_DOCUMENT: Specifies the given text is a document in a search or retrieval setting.
- SEMANTIC_SIMILARITY: Specifies the given text will be used for Semantic Textual Similarity (STS).
- CLASSIFICATION: Specifies that the embeddings will be used for classification.
- CLUSTERING: Specifies that the embeddings will be used for clustering.
I am not sure which of these is most appropriate for an anomaly (outlier) detection use case.
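Whichever task_type is chosen, the outlier detection itself happens after embedding. The sketch below is one simple technique among several, swapped in for illustration and not part of the Vertex AI SDK: it scores each document by its cosine distance from the corpus centroid and flags scores that sit far above the mean. It assumes the vectors have already been pulled out of the SDK results as plain lists of floats (for example via `.values` on each returned embedding) and that no vector is all zeros. The function names `cosine_distance`, `centroid_outlier_scores` and `flag_outliers`, plus the z-score threshold, are hypothetical choices.

```python
import math
from statistics import mean, stdev


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def centroid_outlier_scores(vectors):
    """Score each vector by its cosine distance from the mean vector of the corpus."""
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [cosine_distance(v, centroid) for v in vectors]


def flag_outliers(vectors, z_threshold=2.0):
    """Return indices whose score lies more than z_threshold sample standard
    deviations above the mean score (hypothetical default threshold)."""
    scores = centroid_outlier_scores(vectors)
    mu, sigma = mean(scores), stdev(scores)
    if sigma == 0:
        return []  # all documents are equally far from the centroid
    return [i for i, s in enumerate(scores) if (s - mu) / sigma > z_threshold]


# Toy 3-d vectors standing in for real embeddings: five similar, one very different.
docs = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1],
        [0.95, 0.1, 0.05], [1.0, 0.15, 0.0], [0.0, 0.0, 1.0]]
print(flag_outliers(docs, z_threshold=1.5))  # prints [5]
```

With very small corpora the largest attainable z-score is bounded (about 2.04 for six items), so the threshold needs tuning to the corpus size; density-based methods such as a local outlier factor are a common alternative when the corpus has several distinct clusters rather than one centre.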