I am trying to deploy a Generative AI solution built with LangChain (with an LLM at its core) and SageMaker. So the code is not just an inference script but an inference pipeline (the challenge being that it has an LLM at its center). How can I achieve this? I also want to add streaming.
Deploy LLM using Sagemaker and Langchain
893 Views · Asked by akshat garg · There are 2 best solutions below
Answer by akshat garg:
LLMs are huge, often running to hundreds of GB, so it is better to deploy the LLM separately. Since we are working in AWS, a SageMaker endpoint makes sense: your LangChain app should call this SageMaker endpoint (via LangChain's SageMaker integration) and consume its predictions. However, this cannot be a plain SageMaker endpoint. Because some LLMs are so large, model-optimization strategies must be applied, and a strong synergy between hardware and software is required. This is what SageMaker's Large Model Inference (LMI) containers provide: they bundle DJL Serving with model-optimization frameworks for hosting LLMs (complete list here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers). Don't deploy an LLM without optimization. But before taking this path, do check the SageMaker JumpStart model list and Amazon Bedrock; they can save you a lot of time.
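As a sketch of that path: an LMI (DJL Serving) container is typically configured through `OPTION_*` environment variables and pointed at the model weights. The image URI, model id, instance type, and endpoint name below are placeholder assumptions; pick real values from the available_images list linked above and from your own account.

```python
def lmi_serving_env(model_id: str, tensor_parallel_degree: int = 1) -> dict:
    """Build the OPTION_* environment variables that configure a DJL/LMI
    container. The choices here (vLLM rolling batch, fp16) are illustrative
    defaults, not a recommendation for every model."""
    return {
        "HF_MODEL_ID": model_id,  # weights pulled from the Hugging Face Hub
        "OPTION_TENSOR_PARALLEL_DEGREE": str(tensor_parallel_degree),  # shard across GPUs if > 1
        "OPTION_ROLLING_BATCH": "vllm",  # continuous-batching backend
        "OPTION_DTYPE": "fp16",
    }


def deploy_llm(image_uri: str, role: str, endpoint_name: str):
    """Deploy the container as a SageMaker endpoint. Imports are lazy so the
    sketch stays importable without the SageMaker SDK installed."""
    from sagemaker.model import Model

    model = Model(
        image_uri=image_uri,  # an LMI image from the available_images list
        role=role,
        env=lmi_serving_env("mistralai/Mistral-7B-Instruct-v0.2"),  # hypothetical model choice
    )
    return model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",  # size this to the model's memory footprint
        endpoint_name=endpoint_name,
    )
```

Raising `tensor_parallel_degree` shards the model across the GPUs of a multi-GPU instance, which is how LMI fits models that don't fit on one device.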
The usual architecture pattern is to separate the LLM from the client code (LangChain): the LLM is hosted on a SageMaker endpoint, and the client runs on EC2, in a container, or in a Lambda function.
The advantages are much faster deployment (you'll update the app more often than the LLM) and the ability to scale each component independently.
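On the client side, LangChain wires this up through its `SagemakerEndpoint` LLM wrapper plus an `LLMContentHandler` that serializes requests and parses responses. The handler below is shown standalone so the request/response shaping is easy to see and test; it assumes a TGI-style `{"inputs": ..., "parameters": ...}` JSON contract (an assumption about your container). In real use it would subclass `langchain_community.llms.sagemaker_endpoint.LLMContentHandler` and be passed as `SagemakerEndpoint(endpoint_name=..., region_name=..., content_handler=...)`.

```python
import json


class ContentHandler:
    """Request/response shaping for a SageMaker LLM endpoint.
    Assumes a TGI-style JSON contract; adjust both transforms
    to whatever your container actually accepts and returns."""

    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        # Serialize the prompt and generation parameters into the request body.
        return json.dumps({"inputs": prompt, "parameters": model_kwargs}).encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        # TGI-style responses are a list of {"generated_text": ...} objects.
        response = json.loads(output.decode("utf-8"))
        return response[0]["generated_text"]
```

Because the handler is the only endpoint-specific piece, swapping containers (TGI, vLLM, a custom server) only means rewriting these two transforms.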
So a much easier path would be to deploy one of the LLMs available today in SageMaker JumpStart (open-source or commercial) and deploy the application separately.
If you have good reasons to need full control of the LLM, you can build on a Llama-2-on-SageMaker example (container, etc.).
Then, if you want total control, you can build it all on top of your own custom Docker image.
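For the streaming requirement in the question: SageMaker endpoints support response streaming through the `invoke_endpoint_with_response_stream` runtime API, and TGI/LMI-style containers can emit tokens as newline-delimited, often SSE-style `data:`-prefixed, JSON events. A sketch, where the `{"token": {"text": ...}}` event shape is an assumption about a TGI-style container:

```python
import json


def iter_stream_tokens(payload_parts):
    """Reassemble an iterable of raw byte chunks (each event["PayloadPart"]["Bytes"]
    from the response stream) into token strings. A chunk may split a JSON event
    across its boundary, so buffer until a full newline-terminated line arrives."""
    buffer = b""
    for part in payload_parts:
        buffer += part
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if line.startswith(b"data:"):  # strip the SSE-style prefix, if present
                line = line[len(b"data:"):].strip()
            if line:
                event = json.loads(line)
                yield event["token"]["text"]


def stream_completion(endpoint_name: str, prompt: str):
    """Call the endpoint in streaming mode. boto3 is imported lazily so the
    parsing logic above stays testable without AWS credentials."""
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="application/json",
        # Some containers also require a "stream": true flag in the body;
        # that detail is container-dependent.
        Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 256}}),
    )
    parts = (e["PayloadPart"]["Bytes"] for e in response["Body"] if "PayloadPart" in e)
    yield from iter_stream_tokens(parts)
```

Usage would look like `for tok in stream_completion("my-llm-endpoint", "Hello"): print(tok, end="", flush=True)`, printing tokens as they arrive instead of waiting for the full completion.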