Launch batch jobs on GPUs
Leverage the power of GPUs to train models efficiently with batch runs
Batch Run
Batch runs are designed to execute a series of commands defined in your YAML configuration and then terminate. This makes them well suited to large-scale, long-running tasks such as model training, where GPU acceleration can significantly shorten training time.
A Simple Batch Run
Here is an example of a simple batch run YAML configuration. It specifies the Docker image to use, the resources required for the run, and the commands to execute during the run.
In this example, resources.preset=v1.v100-1.mem-52 requests a V100 GPU instance. The nvidia-smi command then runs to display the NVIDIA System Management Interface output, after which the run terminates.
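As a rough sketch, a configuration like the one described above might look as follows. Only resources.preset=v1.v100-1.mem-52 and the nvidia-smi command come from the text; the top-level key names (name, image, resources, run) are assumptions about the schema, the run name is a hypothetical placeholder, and the image is borrowed from the training example later on this page. A real configuration may also require other fields (for example, a cluster name), so check the VESSL YAML reference before using it.

```yaml
# Hypothetical sketch: key names other than resources.preset are assumptions.
name: gpu-batch-run                      # hypothetical run name
image: nvcr.io/nvidia/pytorch:21.05-py3  # image taken from the training example
resources:
  preset: v1.v100-1.mem-52               # requests a single V100 GPU instance
run:
  - command: nvidia-smi                  # print GPU status; the run then terminates
```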
Termination Protection
You can also define termination protection in a batch run. Termination protection keeps your run active for a specified duration even after your commands have finished executing, which can be useful for debugging or retrieving intermediate files.
In this example, termination_protect keeps the container from terminating after the nvidia-smi command finishes.
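A minimal sketch of the same run with termination protection added might look like this. The key name termination_protect is taken from the text above; the duration value and its format ("1h") are assumptions, as are the other key names and the hypothetical run name, so confirm the exact field name and accepted values in the VESSL YAML reference.

```yaml
# Hypothetical sketch: the duration format for termination_protect is an assumption.
name: gpu-batch-run-protected            # hypothetical run name
image: nvcr.io/nvidia/pytorch:21.05-py3
resources:
  preset: v1.v100-1.mem-52               # single V100 GPU instance
run:
  - command: nvidia-smi                  # runs once, then the commands are finished
# Keep the container alive for a while after the commands finish,
# e.g. to inspect logs or copy out intermediate files.
termination_protect: 1h                  # assumed value format
```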
Train a Thin-Plate Spline Motion Model with GPU Resources
Now let’s dive into a more complex batch run configuration. This configuration file describes a batch run that trains a Thin-Plate Spline Motion Model on a V100 GPU.
In this batch run, the Docker image nvcr.io/nvidia/pytorch:21.05-py3 is used, and a V100 GPU (resources.preset=v1.v100-1.mem-52) is allocated for the run. This ensures that the training job runs on the V100 GPU.
The model and scripts used in this run are fetched from a GitHub repository (/root/examples: git://github.com/vessl-ai/examples).
The commands executed in the run first install the requirements and then train the model using the run.py script.
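Putting these pieces together, the training configuration might be sketched as follows. The image, the preset, the /root/examples: git://github.com/vessl-ai/examples mapping, and run.py come from the text; the key names import, run, workdir, and command, the subdirectory path inside the repository, the requirements.txt filename, and the run name are assumptions or hypothetical placeholders. run.py may also need command-line arguments that are not specified here.

```yaml
# Hypothetical sketch of the Thin-Plate Spline Motion Model training run.
name: thin-plate-spline-motion-model     # hypothetical run name
image: nvcr.io/nvidia/pytorch:21.05-py3  # Docker image from the text
resources:
  preset: v1.v100-1.mem-52               # single V100 GPU
import:
  # Fetch the examples repository into /root/examples (mapping from the text)
  /root/examples: git://github.com/vessl-ai/examples
run:
  - workdir: /root/examples/thin-plate-spline-motion-model  # hypothetical path
    command: |
      pip install -r requirements.txt    # assumed requirements filename
      python run.py                      # arguments, if any, are not specified
```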
This example demonstrates how you can set up a batch run that trains a machine learning model on a GPU with a single YAML configuration.
What’s Next
For more advanced configurations and examples, please visit VESSL Hub.
VESSL Hub
A variety of YAML examples that you can use as references