Deploying machine learning (ML) models in production environments often requires meticulous planning to ensure smooth operation, high availability, and the ability to handle fluctuating demands. VESSL Service offers two modes to cater to different needs: Provisioned and Serverless.
VESSL Service is a robust platform for deploying models developed within VESSL, or your own custom models, as inference servers. Provisioned Mode is ideal for those who prefer direct control over their deployment environment. To get started, see the resources below and the configuration sketch that follows:
Get started with VESSL Service using Llama 3.1-8B and the latest vLLM.
Explore comprehensive YAML configuration examples for Provisioned Mode.
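For orientation, here is a minimal sketch of what a Provisioned Mode service definition for the Llama 3.1-8B + vLLM setup might look like. The schema shown here (field names such as `resources`, `ports`, and `autoscaling`, plus the cluster and preset values) is an illustrative assumption rather than the authoritative VESSL format; consult the YAML examples linked above for the exact schema.

```yaml
# Illustrative sketch only: field names and values below are assumptions,
# not the authoritative VESSL Service schema.
name: llama-31-8b-vllm
message: Provisioned inference server running Llama 3.1-8B on vLLM
image: vllm/vllm-openai:latest            # official vLLM serving image
resources:
  cluster: my-gpu-cluster                 # hypothetical cluster name
  preset: gpu-l4-small                    # hypothetical GPU preset
run:
  # vLLM exposes an OpenAI-compatible HTTP API on the given port
  - command: vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
ports:
  - name: api
    type: http
    port: 8000
autoscaling:
  # In Provisioned Mode, you control the scaling bounds yourself
  min: 1
  max: 3
  metric: cpu
  target: 60
```

The key point is that Provisioned Mode keeps decisions such as the GPU preset and the autoscaling bounds in your hands.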
Serverless Mode simplifies deployment by abstracting away the underlying server management and scaling, allowing you to focus solely on your model. It is particularly beneficial for teams without deep backend expertise, or for those seeking cost efficiency. To get started, see the resources below and the sketch that follows:
Deploy a model in Serverless Mode using Text Generation Inference (TGI).
Explore comprehensive YAML configuration examples for Serverless Mode.
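By way of contrast, a Serverless Mode definition can be shorter, since replica management is handled by the platform. Again, this is a hedged sketch: the `serverless` flag and the other field names are assumptions, and the linked examples are the authoritative reference.

```yaml
# Illustrative sketch only: the schema shown here is an assumption.
name: tgi-serverless
message: Serverless endpoint backed by Text Generation Inference (TGI)
image: ghcr.io/huggingface/text-generation-inference:latest
run:
  # TGI serves /generate and /generate_stream on the configured port
  - command: text-generation-launcher --model-id meta-llama/Llama-3.1-8B-Instruct --port 8000
ports:
  - name: api
    type: http
    port: 8000
serverless: true   # hypothetical flag; no replica bounds to manage in Serverless Mode
```

Note the absence of explicit resource and autoscaling sections: scaling out (and scaling to zero, where supported) is the platform's responsibility in Serverless Mode.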
Both modes are designed to make ML service deployments reliable, adaptable, and efficient under varying workloads. Whether you prefer the granular control of Provisioned Mode or the streamlined simplicity of Serverless Mode, VESSL Service makes it straightforward to roll out and scale your AI models.