How to run multiprocess Chroma.from_documents() in Langchain

Can we somehow pass an option to run multiple threads/processes when we call Chroma.from_documents() in Langchain?

I am trying to embed 980 documents (the embedding model is mpnet, running on CUDA), and it takes forever.

Specs:
Software: Ubuntu 20.04 (WSL2 on a Windows 11 host), LangChain 0.0.253, PyTorch 2.0.1+cu118, Chroma 0.4.2, CUDA 11.8
Processor: Intel i9-13900K at 5.4 GHz on all 8 P-cores and 4.3 GHz on all 16 E-cores
GPU: RTX 4090

Paris Char (accepted answer):

Somehow it got resolved, I think. I had to uninstall chroma/chroma-client and reinstall the full chroma package. It also helps enormously if your embeddings run with device=cuda.
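
For reference, a minimal sketch of the device=cuda setting, assuming the embeddings come from LangChain's HuggingFaceEmbeddings wrapper (the answer does not say which class was used). In that wrapper, model_kwargs is forwarded to the SentenceTransformer constructor, which accepts device, and encode_kwargs is forwarded to encode(), which accepts batch_size. The helper name hf_embedding_kwargs and the batch size of 64 are illustrative assumptions, not part of the answer:

```python
from typing import Any, Dict


def hf_embedding_kwargs(cuda_available: bool, batch_size: int = 64) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for HuggingFaceEmbeddings.

    - model_kwargs goes to the SentenceTransformer constructor (`device`).
    - encode_kwargs goes to SentenceTransformer.encode (`batch_size`).
    The batch size of 64 is an assumed starting point, not a tuned value.
    """
    # Fall back to CPU so the same config works on machines without a GPU.
    device = "cuda" if cuda_available else "cpu"
    return {
        # mpnet model mentioned in the question (LangChain's default model name).
        "model_name": "sentence-transformers/all-mpnet-base-v2",
        "model_kwargs": {"device": device},
        "encode_kwargs": {"batch_size": batch_size},
    }
```

Usage would look like embeddings = HuggingFaceEmbeddings(**hf_embedding_kwargs(torch.cuda.is_available())), then Chroma.from_documents(docs, embedding=embeddings). Keeping the arguments in a plain dict makes the CPU fallback easy to check without loading the model.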

user3517818:

Chroma now supports multiple threads, so this should be technically possible. Why not simply use the threading module and spawn multiple loaders?
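
A minimal sketch of that idea, using concurrent.futures instead of raw threads: split the documents into slices and hand each slice to a worker thread. chunked and add_in_parallel are hypothetical helpers; with LangChain, add_batch would presumably be vectorstore.add_documents on a store created with Chroma(embedding_function=embeddings). Whether concurrent writes are safe depends on the Chroma version, as the answer implies, and when every worker shares one GPU the speedup may be small compared with simply raising the encode batch size.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive slices of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def add_in_parallel(
    items: Sequence[T],
    add_batch: Callable[[Sequence[T]], object],
    batch_size: int = 100,
    max_workers: int = 4,
) -> int:
    """Hypothetical helper: feed `items` to `add_batch` from a thread pool.

    With LangChain, `add_batch` would typically be `vectorstore.add_documents`
    (an assumption; thread safety depends on the Chroma version).
    Returns the number of batches submitted.
    """
    batches = chunked(items, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consuming the iterator re-raises any exception thrown in a worker.
        list(pool.map(add_batch, batches))
    return len(batches)
```

For example, add_in_parallel(docs, vectorstore.add_documents, batch_size=100) would split the 980 documents from the question into 10 batches (nine of 100 and one of 80).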