Create a Slurm-on-Kubernetes cluster

Deploy a Slurm HPC cluster on top of an existing Kubernetes cluster.

This guide walks through deploying a Slurm cluster on top of an existing Kubernetes cluster. Slurm-on-Kubernetes lets you run traditional HPC batch jobs using Kubernetes compute pools as the backing infrastructure.

Prerequisites

  • A Kubernetes cluster in ready status
  • At least one compute pool on the parent Kubernetes cluster

Create from the Slurm cluster list

  1. Click Clusters in the left sidebar, then click the Slurm tab.
  2. Click Prepare Cluster.
  3. In the Configure step, select Kubernetes Cluster as the deployment target. A list of ready Kubernetes clusters appears.
  4. Click the target cluster, then click Configure Slurm Cluster.

Create from the Kubernetes detail page

  1. Click the target Kubernetes cluster name to open its detail page.
  2. Click the Slurm Clusters tab.
  3. Click Create Slurm Cluster.

Configure compute pools and partitions

From either entry point, the wizard opens with the Compute & Partitions step.

Compute pools

Two pools are pre-configured: Slurm Controller (control plane) and Compute Workers.

For each pool, configure:

  1. Profile -- select an instance type:
    • AWS parent cluster -- click to open the instance type browser and select any EC2 type (for example, t3.medium or c5n.4xlarge)
    • Non-AWS parent cluster -- choose from Small (4 vCPU, 8 GiB), Medium (8 vCPU, 16 GiB), or Large (16 vCPU, 32 GiB)
  2. GPU -- toggle to enable GPU compute on the pool
  3. Max Nodes -- the upper autoscaling bound (default: 1 for controller, 10 for compute)

Click Add Compute Pool to add more worker pools. Every cluster needs at least one controller pool and one compute pool.
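
The pool rules above can be sketched as a small data model. This is an illustrative Python sketch for planning a layout before opening the wizard, not Vantage's API: the names ComputePool, NON_AWS_PROFILES, DEFAULT_MAX_NODES, and validate_pools are hypothetical. Only the profile sizes, the default Max Nodes values, and the requirement for one controller pool plus one compute pool come from this guide.

```python
from dataclasses import dataclass

# Non-AWS profile sizes as listed in this guide: name -> (vCPU, memory in GiB).
NON_AWS_PROFILES = {
    "Small": (4, 8),
    "Medium": (8, 16),
    "Large": (16, 32),
}

# Default Max Nodes per pool role, as stated in this guide.
DEFAULT_MAX_NODES = {"controller": 1, "compute": 10}


@dataclass
class ComputePool:
    """Hypothetical model of one row in the Compute pools form."""
    name: str
    role: str            # "controller" or "compute" (labels assumed for this sketch)
    profile: str         # e.g. "Medium", or an EC2 type on an AWS parent cluster
    gpu: bool = False
    max_nodes: int = 0   # 0 means "use the default for this role"

    def __post_init__(self):
        if self.role not in DEFAULT_MAX_NODES:
            raise ValueError(f"unknown role: {self.role!r}")
        if self.max_nodes == 0:
            self.max_nodes = DEFAULT_MAX_NODES[self.role]
        if self.max_nodes < 1:
            raise ValueError("max_nodes must be at least 1")


def validate_pools(pools):
    """Return a list of problems; an empty list means the set is acceptable."""
    problems = []
    roles = [p.role for p in pools]
    if "controller" not in roles:
        problems.append("at least one controller pool is required")
    if "compute" not in roles:
        problems.append("at least one compute pool is required")
    return problems


pools = [
    ComputePool("Slurm Controller", "controller", "Small"),
    ComputePool("Compute Workers", "compute", "Large", gpu=True),
]
print(validate_pools(pools))
```

Leaving max_nodes unset picks up the documented default for the role, which mirrors how the wizard pre-fills the two pools.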

Partitions

A default partition named partition-1 is pre-configured. For each partition:

  1. Set the Partition Name.
  2. Choose the Compute Group (the compute pool) that the partition routes jobs to.
  3. Toggle Default status. Only one partition can be default.
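
The partition rules can be checked the same way. The sketch below is a hypothetical Python model, not part of the product: Partition and check_partitions are illustrative names, and the check that each partition points at a known pool is an assumption about how routing behaves. The rule that only one partition can be default comes from this guide.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical model of one row in the Partitions form."""
    name: str
    compute_group: str   # name of the pool this partition routes to
    default: bool = False


def check_partitions(partitions, pool_names):
    """Return a list of problems found in a partition layout."""
    problems = []
    for p in partitions:
        # Assumption: a partition must route to a pool that exists.
        if p.compute_group not in pool_names:
            problems.append(f"{p.name}: unknown compute group {p.compute_group!r}")
    # Rule from this guide: only one partition can be default.
    defaults = [p.name for p in partitions if p.default]
    if len(defaults) > 1:
        problems.append("only one partition can be default: " + ", ".join(defaults))
    return problems


layout = [
    Partition("partition-1", "Compute Workers", default=True),
    Partition("gpu", "GPU Workers", default=True),
]
for problem in check_partitions(layout, {"Compute Workers"}):
    print(problem)
```

Here the second partition is flagged twice: once for routing to a pool that was never defined, and once for competing with partition-1 for default status.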

Advanced options

Click Advanced Options to configure TLS, NodePort exposure, job profiling, and the Kubernetes scheduler bridge.

Submit the cluster

Click Create Slurm Cluster. The wizard shows a three-stage progress stepper:

  1. Registering cluster -- creates the cluster record and provisions a Keycloak client
  2. Creating compute pools -- provisions each pool on the parent Kubernetes cluster
  3. Creating Slurm cluster -- finalizes the Slurm deployment

Wait for all three stages to complete. The cluster then appears in the Slurm list with ready status.

Submit a test job

  1. Navigate to Jobs > Scripts and create a simple test script.
  2. Click Actions > Submit Script.
  3. In the Cluster dropdown, select the new Slurm-on-Kubernetes cluster.
  4. Select a partition and click Submit.
  5. Verify the job completes on the Submissions page.
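
For step 1, a test script only needs to prove that a job was scheduled and ran on a worker. The following is a minimal sketch written in Python, under two assumptions that this guide does not confirm: that the script editor accepts a script with a Python shebang, and that python3 is installed on the compute nodes. Slurm itself reads the #SBATCH comment lines before the first executable statement regardless of interpreter. The job name and output file name are placeholders.

```python
#!/usr/bin/env python3
#SBATCH --job-name=hello-slurm
#SBATCH --ntasks=1
#SBATCH --time=00:01:00
#SBATCH --output=hello-%j.out
"""Minimal smoke test: report where the job ran."""
import os
import socket


def job_report(env=os.environ):
    """Build a one-line summary from variables Slurm sets for a batch job."""
    job_id = env.get("SLURM_JOB_ID", "unknown")
    partition = env.get("SLURM_JOB_PARTITION", "unknown")
    nodes = env.get("SLURM_JOB_NODELIST", "unknown")
    host = socket.gethostname()
    return f"job={job_id} partition={partition} nodes={nodes} host={host}"


if __name__ == "__main__":
    print(job_report())
```

If the job ran under Slurm, the output file should show a numeric job ID and the partition selected in step 4. Values of "unknown" suggest the script ran outside a Slurm allocation.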