I'm just getting started with LLMs, particularly OpenAI's and other OSS models. There are a lot of guides on using LlamaIndex to create a store of all your documents and then query them. I tried it out with a few sample documents, but discovered that each query gets super expensive quickly. I think I used a 50-page PDF document, and a summarization query cost me around 1.5 USD per query. I see there are a lot of tokens being sent across, so I'm assuming it's sending the entire document for every query. Given that someone might want to use thousands or millions of records, I can't see how something like LlamaIndex can really be that useful in a cost-effective manner.
On the other hand, I see OpenAI allows you to train a ChatGPT model. Wouldn't that, or using other custom-trained LLMs, be much cheaper and more effective for querying your own data? Why would I ever want to set up LlamaIndex?
TL;DR: Use LlamaIndex or LangChain to get an exact answer (i.e., a fact) to a specific question from existing data sources.
Why choose LlamaIndex or LangChain over fine-tuning a model?
The answer is simple, but it's easy to miss if you only look at the costs. Cost is just one aspect; take a look at the usability side of the question as well.
Fine-tuning a model will give the model additional general knowledge, but the fine-tuned model will not (necessarily) give you an exact answer (i.e., a fact) to a specific question.
People train an OpenAI model with some data, but when they ask it something related to the fine-tuning data, they are surprised that the model doesn't answer with the knowledge gained by fine-tuning. See an example explanation on the official OpenAI forum by @juan_olano:
Also, see the official OpenAI documentation:
LlamaIndex and LangChain enable you to connect OpenAI models with your existing data sources. For example, a company has a bunch of internal documents with various instructions, guidelines, rules, etc. LlamaIndex or LangChain can be used to query all those documents and give an employee an exact answer to their question.
OpenAI's text-generation models can't query their knowledge. Querying requires calculating embedding vectors and the cosine similarity between them, which a text-generation model doesn't do when it answers. Such a model gives an answer based on the statistical probability of which word should follow the previous ones.
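To make the querying step a bit more concrete, here is a minimal sketch of similarity-based retrieval. It is not how LlamaIndex or LangChain are implemented internally; it only illustrates the idea. The three-dimensional vectors are hand-made toy values standing in for real embeddings, and the names `CHUNKS`, `top_k`, and `build_prompt` are hypothetical helpers invented for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy document chunks. In a real setup each "embedding" would come from an
# embedding model; these 3-d vectors are made up purely for illustration.
CHUNKS = [
    {"text": "Vacation requests must be filed 14 days in advance.",
     "embedding": [0.9, 0.1, 0.0]},
    {"text": "Expense reports are due by the 5th of each month.",
     "embedding": [0.1, 0.9, 0.0]},
    {"text": "Laptops are replaced every three years.",
     "embedding": [0.0, 0.1, 0.9]},
]


def top_k(query_embedding, chunks, k=1):
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_embedding, c["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble a prompt containing only the retrieved chunks, not every document."""
    context = "\n".join(c["text"] for c in context_chunks)
    return (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Assumed embedding of the question below (again a toy vector).
query_embedding = [0.8, 0.2, 0.1]
best = top_k(query_embedding, CHUNKS, k=1)
prompt = build_prompt("How early do I need to request vacation?", best)
print(prompt)
```

Only the most similar chunk ends up in the prompt that would be sent to the model, which is also why this approach doesn't have to send the entire document with every query.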
I strongly suggest reading my previous answer regarding semantic search. It will help you understand this answer better.