Large Language Models with Ray Serve

Before you start

Prepare your environment for this section:

~$prepare-environment aiml/chatbot

This will make the following changes to your lab environment:

  • Installs Karpenter in the Amazon EKS cluster
  • Creates an IAM Role for the Pods to use

You can view the Terraform that applies these changes here.
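Karpenter provisions nodes on demand based on pending pod requirements. As a rough illustration only, a Karpenter NodePool targeting Trainium-backed Trn1 instances might look like the sketch below; the pool name, EC2NodeClass name, taint, and limits are assumptions for illustration, not the lab's actual configuration:

```yaml
# Hypothetical NodePool sketch for Trn1 capacity (names and values are illustrative)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: trainium
spec:
  template:
    spec:
      requirements:
        # Restrict provisioning to the Trn1 instance family
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["trn1"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: trainium   # assumed EC2NodeClass; the lab's Terraform defines the real one
      taints:
        # Keep general workloads off the accelerator nodes
        - key: aws.amazon.com/neuron
          value: "true"
          effect: NoSchedule
  limits:
    cpu: "256"   # cap total provisioned capacity for this pool
```

Pods that tolerate the taint and request the `aws.amazon.com/neuron` resource would then trigger Karpenter to launch a Trn1 node for them.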

Mistral 7B is a 7.3B-parameter language model that combines practical efficiency with strong capabilities such as text generation and completion, information extraction, data analysis, API interaction, and complex reasoning.

This section focuses on the practical details of deploying LLMs efficiently on Amazon EKS.

To deploy and scale the model, this lab uses AWS Trainium accelerators on Trn1 instances. For model inference, it uses Ray Serve, a framework for building online inference APIs that streamlines the deployment of machine learning models.
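On EKS, Ray Serve applications are commonly described with KubeRay's RayService custom resource, which defines both the Serve application and the Ray cluster it runs on. The following is a heavily simplified sketch of that shape; the application name, import path, image, and worker group are hypothetical placeholders, not the manifest this lab actually applies:

```yaml
# Hypothetical RayService sketch (names, image, and import path are illustrative)
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: mistral-7b
spec:
  # Serve application definition, embedded as YAML text
  serveConfigV2: |
    applications:
      - name: mistral
        import_path: mistral_serve:app   # assumed Python module exposing a Ray Serve app
        route_prefix: /
  rayClusterConfig:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:latest   # a Neuron-enabled image would be used in practice
    workerGroupSpecs:
      - groupName: trn1-workers
        replicas: 1
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:latest
                resources:
                  limits:
                    aws.amazon.com/neuron: 1   # schedule this worker onto a Trn1 node
```

The `aws.amazon.com/neuron` resource request is what ties the Ray worker to Trainium capacity: it causes the worker pod to land on (or, with Karpenter, trigger provisioning of) a Trn1 node.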