Deploying machine learning (ML) models in production environments often requires meticulous planning to ensure smooth operation, high availability, and the ability to handle fluctuating demands. VESSL Service offers two modes to cater to different needs: Provisioned and Serverless.
VESSL Service is a robust platform for deploying models developed within VESSL, or your own custom models, as inference servers. Provisioned Mode is ideal for those who prefer direct control over their deployment environment. To get started, see the resources below and the configuration sketch that follows:
Get started with VESSL Service using Llama 3.1-8B and the latest vLLM.
Explore comprehensive YAML configuration examples for Provisioned Mode.
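For orientation, here is a minimal sketch of what a Provisioned Mode service definition for the Llama 3.1-8B + vLLM setup might look like. The schema shown here (field names such as `resources`, `ports`, and `autoscaling`, plus the cluster and preset values) is an illustrative assumption rather than the authoritative VESSL format; consult the YAML examples linked above for the exact schema.

```yaml
# Illustrative sketch only: field names and values below are assumptions,
# not the authoritative VESSL Service schema.
name: llama-31-8b-vllm
message: Provisioned inference server running Llama 3.1-8B on vLLM
image: vllm/vllm-openai:latest            # official vLLM serving image
resources:
  cluster: my-gpu-cluster                 # hypothetical cluster name
  preset: gpu-l4-small                    # hypothetical GPU preset
run:
  # vLLM exposes an OpenAI-compatible HTTP API on the given port
  - command: vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
ports:
  - name: api
    type: http
    port: 8000
autoscaling:
  # In Provisioned Mode, you control the scaling bounds yourself
  min: 1
  max: 3
  metric: cpu
  target: 60
```

The key point is that Provisioned Mode keeps decisions such as the GPU preset and the autoscaling bounds in your hands.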
Serverless Mode simplifies deployment by abstracting away the underlying server management and scaling, allowing you to focus solely on your model. It is particularly beneficial for teams without deep backend expertise, or for those seeking cost efficiency. To get started, see the resources below and the sketch that follows:
Deploy a model in Serverless Mode using Text Generation Inference (TGI).
Explore comprehensive YAML configuration examples for Serverless Mode.
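By way of contrast, a Serverless Mode definition can be shorter, since replica management is handled by the platform. Again, this is a hedged sketch: the `serverless` flag and the other field names are assumptions, and the linked examples are the authoritative reference.

```yaml
# Illustrative sketch only: the schema shown here is an assumption.
name: tgi-serverless
message: Serverless endpoint backed by Text Generation Inference (TGI)
image: ghcr.io/huggingface/text-generation-inference:latest
run:
  # TGI serves /generate and /generate_stream on the configured port
  - command: text-generation-launcher --model-id meta-llama/Llama-3.1-8B-Instruct --port 8000
ports:
  - name: api
    type: http
    port: 8000
serverless: true   # hypothetical flag; no replica bounds to manage in Serverless Mode
```

Note the absence of explicit resource and autoscaling sections: scaling out (and scaling to zero, where supported) is the platform's responsibility in Serverless Mode.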
Both modes are designed to make ML service deployments reliable, adaptable, and efficient under varying workloads. Whether you prefer the granular control of Provisioned Mode or the streamlined simplicity of Serverless Mode, VESSL Service makes it straightforward to roll out and scale your AI models.