Configure a training runtime

A training runtime is a pre-built environment that combines a deep learning framework, a container image, and a parallelism strategy. Runtimes are prerequisites for submitting training jobs.

This guide covers how to view, create, and edit training runtimes, and how to set a default runtime for new training jobs.

Time: About 5 minutes

You will need: A Kubernetes cluster in ready status and admin permissions on the workspace

Outcome: A training runtime configured and ready for job submission

View runtimes

Open Workbench

Click Workbench in the left sidebar.

Navigate to Runtimes

Navigate to Train > Training > Runtimes.

Browse the list

The list shows each runtime with its name, framework, description, base image, and status.

Create a custom runtime

Open the creation form

On the Runtimes page, click Create Runtime.

Configure the runtime

Set the following fields:

Field	Required	What it does
Runtime name	required	A short identifier, for example `pytorch-distributed`.
Framework	required	The ML framework: `torch`, `deepspeed`, `mlx`, `huggingface`, or `custom`. Determines which ML policy fields are shown.
Base image	required	The container image URI, for example `docker.io/pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime`.
Number of nodes	optional	Number of nodes for multi-node training. Shown when a framework is selected.
Processes per node	optional	Enter `auto` or a positive integer. Available for the `torch`, `deepspeed`, and `huggingface` frameworks.
NCCL config	optional	`deepspeed` only. `KEY=VALUE` pairs for NCCL configuration.
Compute pool	optional	The compute pool that the runtime is restricted to run in.
Description	optional	Short description of this runtime.
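
The field rules in the table can be checked before you fill in the form. The following is a minimal Python sketch of those rules, not the product's API: `RuntimeSpec`, `parse_processes_per_node`, and `parse_nccl_config` are hypothetical names, and the one-pair-per-line layout for NCCL config is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

# Framework values listed in the table above.
FRAMEWORKS = {"torch", "deepspeed", "mlx", "huggingface", "custom"}
# Frameworks for which the form offers "Processes per node".
PROC_PER_NODE_FRAMEWORKS = {"torch", "deepspeed", "huggingface"}


def parse_processes_per_node(value: str) -> Union[str, int]:
    """Accept 'auto' or a positive integer, as the form describes."""
    text = value.strip()
    if text == "auto":
        return "auto"
    if text.isascii() and text.isdigit() and int(text) > 0:
        return int(text)
    raise ValueError(f"expected 'auto' or a positive integer, got {value!r}")


def parse_nccl_config(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE pairs; one pair per line is an assumption."""
    config: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, sep, val = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"not a KEY=VALUE pair: {line!r}")
        config[key.strip()] = val.strip()
    return config


@dataclass
class RuntimeSpec:
    """Hypothetical mirror of the creation form, for illustration only."""
    name: str
    framework: str
    base_image: str
    num_nodes: Optional[int] = None
    processes_per_node: Optional[str] = None
    nccl_config: Optional[str] = None
    compute_pool: Optional[str] = None
    description: str = ""

    def validate(self) -> None:
        if not self.name or not self.base_image:
            raise ValueError("runtime name and base image are required")
        if self.framework not in FRAMEWORKS:
            raise ValueError(f"unknown framework: {self.framework!r}")
        if self.num_nodes is not None and self.num_nodes < 1:
            raise ValueError("number of nodes must be at least 1")
        if self.processes_per_node is not None:
            if self.framework not in PROC_PER_NODE_FRAMEWORKS:
                raise ValueError("processes per node is not offered for this framework")
            parse_processes_per_node(self.processes_per_node)
        if self.nccl_config is not None:
            if self.framework != "deepspeed":
                raise ValueError("NCCL config applies to deepspeed only")
            parse_nccl_config(self.nccl_config)


# Example values taken from the table above.
spec = RuntimeSpec(
    name="pytorch-distributed",
    framework="torch",
    base_image="docker.io/pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime",
    num_nodes=2,
    processes_per_node="auto",
)
spec.validate()  # raises ValueError if a rule is violated
```

The sketch rejects `Processes per node` for `mlx` and `custom`, and rejects `NCCL config` for anything other than `deepspeed`, matching the availability notes in the table.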

Create

Click Create runtime.

Success looks like this: the runtime environment is configured, and training jobs can use the specified dependencies and framework versions.

Edit a runtime

Open the detail page

Click the runtime name to open the detail page.

Edit and save

Click Edit and modify the image, policy, or other settings. Click Save.

Set a default runtime

On the Runtimes page, click the Set as default action on a runtime row. The default runtime is pre-selected when creating new training jobs.

Runtimes are workspace-scoped. Administrators can publish runtimes that are available to all users in the workspace.
