Using your GCP account, you can create a cluster and connect to VESSL’s backend.

To integrate with VESSL, the following resources will be created:

  • GCS Bucket: A bucket for storing configuration, state, and data.
  • GKE Cluster: A GCP-managed Kubernetes cluster for running ML workloads.
  • GKE Node Pools: Autoscaling groups for selected resource types.

Step-by-Step Guide

1. Install Terraform and gcloud CLI

VESSL uses Terraform to create the GKE cluster and GKE node pools and to install components on Kubernetes. Install both Terraform and the gcloud CLI before continuing.
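Before moving on, it can help to confirm that the tools used in this guide are on your PATH. The following is a minimal sketch, not part of VESSL's tooling; `check_tool` is a hypothetical helper name introduced here for illustration.

```shell
#!/usr/bin/env sh
# check_tool is a hypothetical helper: it reports whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Tools used by the steps in this guide.
check_tool terraform
check_tool gcloud
check_tool git
check_tool pip

# Assumption: Terraform's Google provider commonly reads Application Default
# Credentials, which you can set up with:
#   gcloud auth application-default login
```

If any tool is reported as missing, install it before continuing with the next step.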

2. Configure the cluster

First, clone VESSL’s cloud integration Terraform code from GitHub.

git clone https://github.com/vessl-ai/vessl-cloud-integration
cd vessl-cloud-integration/examples/gcp-gke-full

Using VESSL CLI, you can configure Terraform variables and the Terraform backend.

pip install vessl
vessl cluster create-config gcp

Two config files and a node group definition file will be generated, both in your current directory and in the bucket:

  1. terraform.tfbackend: This file configures Terraform’s backend storage.
  2. terraform.tfvars: This file specifies the variables for your cluster configuration.
  3. nodes.tf: This file defines the node groups for your resource types.
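As a quick sanity check before running Terraform, you may want to confirm that all three generated files are present in the working directory. This is a sketch under the assumption that the files use exactly the names listed above; `missing_config_files` is a hypothetical helper, not a VESSL command.

```shell
#!/usr/bin/env sh
# Print the name of each expected file that is absent from the given
# directory (defaults to the current directory). Prints nothing if all exist.
missing_config_files() {
  dir="${1:-.}"
  for f in terraform.tfbackend terraform.tfvars nodes.tf; do
    [ -f "$dir/$f" ] || echo "$f"
  done
}

missing="$(missing_config_files .)"
if [ -n "$missing" ]; then
  echo "Missing files:"
  echo "$missing"
else
  echo "All config files are present."
fi
```

If a file is reported as missing, re-running `vessl cluster create-config gcp` in this directory is the likely fix.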

3. Apply Terraform

First, initialize your Terraform state:

terraform init -backend-config="terraform.tfbackend"

Then apply Terraform to create the actual resources:

terraform apply -var-file="terraform.tfvars"

The installation process takes about 20 to 30 minutes. Keep your internet connection active until it finishes.

Once the cluster is installed, you can find it on the cluster page.
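If you would rather review the proposed changes before anything is created, the apply step above can be split into a saved plan followed by an apply of that plan. The sketch below is one way to do this, not VESSL's prescribed workflow. The `run` wrapper, the `DRY_RUN` variable, and the `tfplan` file name are assumptions introduced for illustration; by default the commands are only printed.

```shell
#!/usr/bin/env sh
# run is a hypothetical wrapper: it prints the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Save the proposed changes to a plan file, then apply exactly that plan.
# The saved plan already contains the variable values, so the apply step
# takes no -var-file option.
plan_then_apply() {
  run terraform plan -var-file="terraform.tfvars" -out="tfplan" &&
    run terraform apply "tfplan"
}

plan_then_apply
```

Run the script with `DRY_RUN=0` to execute the commands for real, after `terraform init` has completed.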

Destroy and delete the cluster

To destroy all resources created by VESSL, including the cluster, run:

terraform destroy -var-file="terraform.tfvars"
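Because destroying resources cannot be undone, you may want to preview what would be removed first. Terraform's `plan -destroy` mode lists the resources a destroy would delete without changing anything. The sketch below prints the command by default; `run`, `DRY_RUN`, and `preview_destroy` are hypothetical names introduced here.

```shell
#!/usr/bin/env sh
# run is a hypothetical wrapper: it prints the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Show what a destroy would remove, using the given variables file
# (defaults to terraform.tfvars). Nothing is deleted by this command.
preview_destroy() {
  run terraform plan -destroy -var-file="${1:-terraform.tfvars}"
}

preview_destroy
```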

If the config files are missing from your local machine, you can download them again and start from a fresh clone:

git clone https://github.com/vessl-ai/vessl-cloud-integration
cd vessl-cloud-integration/examples/gcp-gke-full
vessl cluster get-config [cluster_name]
terraform init -backend-config="terraform.tfbackend"
terraform destroy -var-file="terraform.tfvars"

After destroying a cluster, you can delete it from the cluster page.