Quickstart
Installation
You can install the VESSL CLI through pip.
pip install --upgrade vessl
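After installing, it can help to confirm that the CLI actually landed on your PATH. The following is a minimal sketch that uses only standard POSIX shell built-ins; the `check_vessl` function name is hypothetical and not part of VESSL.

```shell
# Hypothetical helper (not part of VESSL): reports whether the
# `vessl` executable can be found on PATH.
check_vessl() {
  if command -v vessl >/dev/null 2>&1; then
    echo "vessl CLI found at: $(command -v vessl)"
    return 0
  fi
  echo "vessl CLI not found; try: pip install --upgrade vessl"
  return 1
}

# Run the check without aborting the shell session if it fails.
check_vessl || true
```

If the CLI is not found right after a successful install, the usual cause is that pip's script directory is not on PATH for the current shell.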
Train nanoGPT with VESSL Run
To help you get started, we prepared a quickstart command that bundles several example YAML files for popular open-source models on GitHub. The following command displays a list of example models to choose from. At this step, you will be asked to log in and grant access permission.
vessl hello
Select nanogpt from the list of models. This initiates a VESSL Run with the following nanogpt.yaml file, which you can also check in your terminal as the Run starts.
name: nanogpt
image: nvcr.io/nvidia/pytorch:22.03-py3
resources:
  cluster: aws-apne2
  preset: v1.v100-1.mem-52
import:
  /root/examples: git://github.com/vessl-ai/examples
export:
  /output: vessl-artifact://
run:
  - workdir: /root/examples/nanogpt
    command: |
      pip install torchaudio -f https://download.pytorch.org/whl/cu111/torch_stable.html
      pip install transformers datasets tiktoken wandb tqdm
      python data/shakespeare_char/prepare.py
      python train.py config/train_shakespeare_char.py
      python sample.py --out_dir=out-shakespeare-char
The command performs the following as defined in the YAML file:
- Launch a training job & cluster on AWS with 1 NVIDIA V100 GPU.
- Configure the runtime with the CUDA-enabled NVIDIA PyTorch 22.03 container image.
- Mount the nanoGPT GitHub repo and set the working directory.
- Run the task's commands defined under command.
- Track training progress on VESSL.
Click the output link in your terminal to check the training progress for the Run along with the key metrics and hyperparameters.
You can also launch the same Run by copying and pasting the YAML above and running the following command.
vessl run -f nanogpt.yaml
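One way to do the copy-and-paste step from a terminal is a heredoc. The sketch below assumes a POSIX shell and writes the YAML shown earlier to nanogpt.yaml in the current directory; the indentation follows standard YAML nesting for the fields listed and is an assumption about the original layout.

```shell
# Write the example configuration to nanogpt.yaml.
# The quoted 'EOF' delimiter prevents the shell from expanding
# anything inside the heredoc.
cat > nanogpt.yaml <<'EOF'
name: nanogpt
image: nvcr.io/nvidia/pytorch:22.03-py3
resources:
  cluster: aws-apne2
  preset: v1.v100-1.mem-52
import:
  /root/examples: git://github.com/vessl-ai/examples
export:
  /output: vessl-artifact://
run:
  - workdir: /root/examples/nanogpt
    command: |
      pip install torchaudio -f https://download.pytorch.org/whl/cu111/torch_stable.html
      pip install transformers datasets tiktoken wandb tqdm
      python data/shakespeare_char/prepare.py
      python train.py config/train_shakespeare_char.py
      python sample.py --out_dir=out-shakespeare-char
EOF

# Quick sanity check that the file was written as expected.
grep -q '^name: nanogpt$' nanogpt.yaml && echo "nanogpt.yaml written"
```

With the file in place, the `vessl run -f nanogpt.yaml` command above launches the Run from the same directory.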
What’s next
Run's unified YAML interface really shines as you (1) fine-tune a model with your dataset, (2) scale it on your cloud or on-premises cluster, and (3) create a micro AI/ML app. Follow the guides below to experiment with popular models like Dreambooth Stable Diffusion and Segment Anything using VESSL Run.
Run a GPU-backed training job
Leverage the power of GPUs to efficiently train models in a batch Run.
Run a GPU-backed Jupyter and SSH server
Start a real-time interactive Run session on GPUs.
Backup and Restore Data with VESSL Artifact
Run, Backup, Repeat: VESSL Run with VESSL Artifact.
Dataset for a Run
Multiple ways to configure a dataset for a Run.