LangChain ConversationalRetrievalQAChain: Full Response Instead of Tokens in Streaming


I'm working on a LangChain project that uses ConversationalRetrievalQAChain with OpenAI, and I want token-based incremental processing: my goal is to handle each generated token as it is produced, for custom formatting and sentiment analysis. Unfortunately, I only receive the entire response at the end instead of individual tokens.

// Imports assume the classic LangChain JS package layout.
import { OpenAI } from "langchain/llms/openai";
import { ConversationalRetrievalQAChain } from "langchain/chains";

const model = new OpenAI({
  ... // relevant options like temperature, modelName, etc.
  streaming: true,
  callbacks: [
    {
      handleLLMNewToken(token) {
        // Expected to receive individual tokens here, but only getting the full response at the end.
        console.log(token);
      },
    },
  ],
});

const chain = ConversationalRetrievalQAChain.fromLLM(
  model,
  vectorstore.asRetriever({ k: 6 }),
  ... // relevant chain options like qaTemplate and questionGeneratorTemplate
);

const responseStream = await chain.stream({
  question: "Sample question",
  chat_history: "Previous conversation...",
});

for await (const data of responseStream) {
  // Expected to iterate over individual tokens, but only receiving the full response object.
  console.log(data);
}

Expected Behavior:

I expect the handleLLMNewToken callback to log each generated token individually, and the for await loop to iterate over those tokens, so that I can process each token as it is written.
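
For illustration, this is the kind of token-by-token output I'm after. It's a minimal sketch of calling the streaming model directly, without the chain, assuming the same OpenAI import as above; the prompt and variable names are only illustrative:

// Minimal sketch of the expected behavior: the callback should fire once per
// token while the completion is being generated.
const demoModel = new OpenAI({
  streaming: true,
  callbacks: [
    {
      handleLLMNewToken(token) {
        // Should fire once per token, e.g. "Hel", "lo", " there", "!"
        process.stdout.write(token);
      },
    },
  ],
});

// invoke() (or call() on older versions) resolves with the full completion
// once streaming has finished.
await demoModel.invoke("Say hello");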

Observed Behavior:

However, the handleLLMNewToken callback only logs the complete response object, and the for await loop iterates over a single full response object, not individual tokens.
