Import/Export object storage

You can import data from AWS S3 or Google Cloud Storage (GCS) into your VESSL Run workload, and export your run results back to either service.

How it works

Common

  • To use private resources, you need to integrate your cloud provider. See Add Integrations.
  • You can apply the integrated secret when defining the import/export operation.

Import

  • Copies all data under the configured bucket and path prefix into the path you specify in your workload.
  • Runs while the workload is initializing. The larger the data, the longer initialization takes.

Export

  • Copies all data from the path you specify in your workload to the path you specify in your bucket.
  • Runs after the workload finishes successfully. As with import, the time it takes depends on the size of your data.
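
To make the copy semantics concrete, here is a rough sketch of what an import and an export amount to, expressed as AWS CLI commands. It is an illustration only, not VESSL's actual implementation: the helper names import_cmd and export_cmd, the bucket my-bucket, and all paths are hypothetical, and the helpers only print the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helpers (not part of VESSL): print the AWS CLI command that is
# conceptually similar to an import or an export. Nothing is executed.

# import_cmd BUCKET PREFIX DEST: copy everything under s3://BUCKET/PREFIX into DEST
import_cmd() {
  printf 'aws s3 cp s3://%s/%s %s --recursive\n' "$1" "$2" "$3"
}

# export_cmd SRC BUCKET PREFIX: copy everything under SRC to s3://BUCKET/PREFIX
export_cmd() {
  printf 'aws s3 cp %s s3://%s/%s --recursive\n' "$1" "$2" "$3"
}

# Example values only; replace with your own bucket and paths.
import_cmd my-bucket datasets/train /input
export_cmd /output my-bucket results/run-1
```

Because both operations copy every object under the given location, their duration grows with the amount of data, which is why large imports lengthen initialization.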

Guide: Import/Export object storage

  1. Create a new run: navigate to Your Project > Run and click New run.
  2. In the Task > Volumes section, open the context menu and select Import/Export > S3 / GCP. You can also create a new directory for the import or export to use.
  • (Optional) To create a new directory
  • Import
  • Export
  3. Apply a credential if you want to import from or export to private object storage.
  • Example: Adding S3 integration for export
  4. Start the run. You can access the imported objects once initialization completes.

  5. After a successful run, check your export target object storage for the output data.

Mount Google Cloud Storage with FUSE on VESSL Run

This feature is currently in beta. It is supported only for attaching GCS on VESSL managed GCP clusters.

You can mount a GCS bucket with FUSE on your VESSL Run and read from and write to it as if it were a local filesystem.

The data is used only by your workload, and access lasts only for the lifetime of your VESSL Run.

How to mount GCS on a new VESSL Run

  1. Navigate to New run

  2. Fill out the necessary fields. For this example, you can start from the following YAML.

Simple gcs-fuse mount example
name: test-fuse
description: Test mounting your gcs fuse on our GCP managed cluster.
resources:
  cluster: vessl-gcp-oregon
  preset: cpu-small-spot
image: quay.io/vessl-ai/python:3.10-r18
mount:
  /my-gcs-fuse/:
    gcs_fuse:
      bucket: my-gcs-fuse
      path: /prefix
      # UPDATE BELOW: Replace the following `google_service_account` with your own.
      google_service_account: fuse@fuse-project-id.iam.gserviceaccount.com
    readonly: false
run: ls /my-gcs-fuse
ports: []
service_account_name: ""
termination_protection: false
  3. Under Resources > Cluster, select (gcp) vessl-gcp-oregon.

Support for other clusters (managed AWS clusters and on-prem clusters) will be available soon.

  4. Create a new directory and select Mount > GCS FUSE. The option is enabled only when the selected cluster is (gcp) vessl-gcp-oregon.
  5. Follow the guide in the popup to create your Google service account and bind it to the VESSL managed GCP cluster.
  6. Add a command to access your data in GCS.

In this example, just use ls /my-gcs-fuse to check the data inside.
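
If you want a slightly more informative check than a plain ls, the following sketch lists the mount path and probes whether it is writable. The helper name check_mount is hypothetical, and the path /my-gcs-fuse is simply the mount path used in the example YAML; adjust both to your setup.

```shell
#!/bin/sh
# Hypothetical helper: list a mount path and report whether it is writable.
check_mount() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  # Show what is currently visible under the mount path.
  ls -la "$dir"
  probe="$dir/.write-probe-$$"
  # Try to create and then remove a small probe file to test write access.
  if ( echo ok > "$probe" ) 2>/dev/null; then
    rm -f "$probe"
    echo "writable: $dir"
  else
    echo "read-only: $dir"
  fi
}

# Example use inside the run, assuming the mount path from the example YAML:
# check_mount /my-gcs-fuse
```

Note that the write probe creates and deletes an object in the bucket, so it counts toward GCS I/O operations.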

Disclaimer: GCS I/O operations billing.

VESSL does not cover charges for GCS I/O operations. They are billed according to the Google Cloud Platform pricing plan. Learn more at https://cloud.google.com/storage/#pricing

Limitations

  1. This feature is currently in beta. It is supported only for attaching GCS on VESSL managed GCP clusters.
  2. After a run finishes, the policy binding between your Google service account and the temporary Kubernetes service account on the VESSL managed GCP cluster becomes a dangling reference. Consider creating a dedicated Google service account for each VESSL project and cleaning it up when the project no longer needs it.
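
The per-project cleanup mentioned above could look roughly like the following. This is a sketch under assumptions, not an official procedure: the service account name and project ID are placeholders, the run and cleanup_gsa helpers are hypothetical, and because the exact member of the temporary binding is not documented here, the script inspects the policy and then deletes the whole service account instead of removing that single binding. With DRY_RUN=1 (the default in this sketch), commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: inspect and clean up a per-project Google service account.
# GSA_NAME and GSA_PROJECT_ID are placeholders; set them to your own values.
GSA_NAME="${GSA_NAME:-my-service-account}"
GSA_PROJECT_ID="${GSA_PROJECT_ID:-my-project-id}"
GSA_EMAIL="${GSA_NAME}@${GSA_PROJECT_ID}.iam.gserviceaccount.com"

# Print the command when DRY_RUN=1 (default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

cleanup_gsa() {
  # Show the bindings attached to the service account, including dangling ones.
  run gcloud iam service-accounts get-iam-policy "$GSA_EMAIL" --project "$GSA_PROJECT_ID"
  # Remove the project-level storage role granted to the service account.
  run gcloud projects remove-iam-policy-binding "$GSA_PROJECT_ID" \
    --member "serviceAccount:$GSA_EMAIL" --role roles/storage.objectAdmin
  # Delete the service account once the project no longer needs it.
  run gcloud iam service-accounts delete "$GSA_EMAIL" --project "$GSA_PROJECT_ID" --quiet
}

cleanup_gsa
```

Review the printed commands first, then rerun with DRY_RUN=0 once you are sure they target the right account.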

More to know

Guide: Create your Google service account and bind roles

Follow these steps to create your Google service account and bind it to the VESSL managed GCP cluster.

  1. Sign up and create a project on GCP, https://cloud.google.com/cloud-console/

  2. Install gcloud CLI, https://cloud.google.com/sdk/docs/install

  3. Log in with the CLI

gcloud auth login
  4. Get your project ID and choose a name for your Google service account. Together they form the full service account name: $GSA_NAME@$GSA_PROJECT_ID.iam.gserviceaccount.com
  • Set the project ID.
# If you have already set up project ID
export GSA_PROJECT_ID=$(gcloud config get-value project)
# Or, you can list your project IDs and select one.
## list projects
gcloud projects list
## select id from list and export it.
export GSA_PROJECT_ID=<input-target-project-id>
  • Set the name of the service account to create (e.g., my-service-account).
export GSA_NAME=<YOUR_SERVICE_ACCOUNT_NAME>
  5. Create a service account
gcloud iam service-accounts create $GSA_NAME --project=$GSA_PROJECT_ID
  6. Add the necessary roles to the service account
gcloud projects add-iam-policy-binding $GSA_PROJECT_ID \
  --member "serviceAccount:$GSA_NAME@$GSA_PROJECT_ID.iam.gserviceaccount.com" \
  --role roles/storage.objectAdmin
  7. Create a key file. Replace <KEY_FILE_NAME> with a file name of your choice; the key is generated and saved under that name in your current directory.
gcloud iam service-accounts keys create <KEY_FILE_NAME> \
  --iam-account=$GSA_NAME@$GSA_PROJECT_ID.iam.gserviceaccount.com
  8. Add the generated service account JSON key in the GCP integrations settings, located at organization > settings > integrations.

  9. Select the newly added GCP credential in the Mount GCS Fuse modal, then follow the provided instructions to complete the setup.
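
Before uploading the key, it can help to confirm that the file belongs to the service account you intended. The sketch below reads the client_email field that service account key files contain and compares it with the expected address. The helper names key_email and check_key and the file name key.json are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the client_email field of a service account key file.
key_email() {
  sed -n 's/.*"client_email"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Hypothetical helper: succeed only if the key file belongs to the expected account.
check_key() {
  key_file="$1"
  expected="$2"
  actual="$(key_email "$key_file")"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $actual"
  else
    echo "mismatch: expected $expected, found ${actual:-nothing}"
    return 1
  fi
}

# Example, assuming the variables from the steps above and a key saved as key.json:
# check_key key.json "$GSA_NAME@$GSA_PROJECT_ID.iam.gserviceaccount.com"
```

Treat the key file as a secret: avoid committing it to version control, and delete local copies once it has been added to the integration settings.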