I wanted to use the script below to generate embeddings. It worked fine on a small amount of data, but after I loaded a CSV with 300,000 records, the embedding step has been running for 40 minutes and still hasn't finished.
The script:
import os
import numpy as np
import openai
import pandas as pd
from dotenv import load_dotenv
from langchain.embeddings import OpenAIEmbeddings
from openai.embeddings_utils import get_embedding

load_dotenv('.env')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
openai.api_key = OPENAI_API_KEY
model = OpenAIEmbeddings()  # not used further below

dataset = pd.read_csv('keywords.csv', encoding='ISO-8859-1')
# one API call per keyword -- this is the slow part with 300,000 rows
dataset['embedding'] = dataset['keyword'].apply(
    lambda x: get_embedding(x, engine='text-embedding-ada-002')
)
dataset['embedding'] = dataset['embedding'].apply(np.array)

keyword = input('Input:')
keywordVector = get_embedding(
    keyword, engine="text-embedding-ada-002"
)
print(keywordVector)
How can I optimize this?
Instead of calling the API for each keyword separately, batch multiple keywords into a single request. The embeddings endpoint accepts a list of strings as the input, so with the pre-1.0 openai Python SDK you can pass a whole chunk of keywords to openai.Embedding.create in one call rather than issuing 300,000 individual requests. A large part of your runtime is per-request overhead (network round trips), so batching alone should give a substantial speedup.
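Here is a minimal sketch of that approach, assuming the pre-1.0 openai package your script already uses; the embed_in_batches helper and the batch size of 1000 are illustrative, not part of the API:

import numpy as np
import openai
import pandas as pd

def embed_in_batches(texts, batch_size=1000, engine='text-embedding-ada-002'):
    # Illustrative helper: embed the keywords in chunks instead of one request per keyword.
    # Check the API docs for the current limits on inputs and tokens per request.
    embeddings = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = openai.Embedding.create(input=batch, engine=engine)
        # Each result carries an 'index' field; sort on it to preserve input order.
        ordered = sorted(response['data'], key=lambda item: item['index'])
        embeddings.extend(item['embedding'] for item in ordered)
    return embeddings

dataset = pd.read_csv('keywords.csv', encoding='ISO-8859-1')
dataset['embedding'] = embed_in_batches(dataset['keyword'].tolist())
dataset['embedding'] = dataset['embedding'].apply(np.array)

With a batch size of 1000, 300,000 keywords become roughly 300 requests instead of 300,000. You may still need to handle rate limits (for example by retrying on openai.error.RateLimitError) and make sure no keyword is an empty string, since the endpoint rejects empty inputs.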