Slurm clusters

Traditional HPC batch scheduling on Vantage.

Slurm is a workload manager for high-performance computing — batch job scheduling, parallel job execution, and queue management. Vantage provisions Slurm clusters on your choice of infrastructure: public cloud (AWS, Azure, GCP), partner HPC providers, or your own hardware.

Vantage handles all the Slurm controller setup, node registration, and autoscaling. You interact with the cluster through the Vantage UI, CLI, SDK, or API — no direct SSH access to the head node required for day-to-day use.

How it works

When you create a Slurm cluster, Vantage:

  1. Validates your input and checks your subscription limits.
  2. Creates a Keycloak OAuth2 client for the cluster (used for authentication by the Vantage connector and integrations).
  3. Inserts the cluster record and partition configuration.
  4. Provisions infrastructure on your chosen provider — CloudFormation stack on AWS, direct boto3 API calls on other clouds, or no cloud provisioning for on-premises.
  5. Registers nodes as they come online via the Vantage connector.
  6. Transitions the cluster to ready once all nodes are registered and Slurm configuration is uploaded.

Provisioning is asynchronous. The cluster enters preparing status immediately and transitions to ready or failed once infrastructure is set up.
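
Because provisioning is asynchronous, a client typically polls until the cluster leaves the preparing status. The following is a minimal sketch in Python, not the Vantage SDK: `wait_for_cluster` and the `get_status` callable are hypothetical names, and only the status values `preparing`, `ready`, and `failed` come from the description above.

```python
import time
from typing import Callable

# Status values taken from the text above; everything else here is illustrative.
TERMINAL_STATUSES = {"ready", "failed"}


def wait_for_cluster(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a hypothetical status lookup until the cluster is ready or failed.

    `get_status` stands in for whatever call returns the cluster's current
    status (UI, CLI, SDK, or API); it is an assumption, not a Vantage function.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

In a real client, `get_status` would wrap the actual status request. The `sleep` parameter is injectable only so the loop can be exercised in tests without waiting.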

Provider comparison

| Aspect | AWS | Azure / GCP | On-Premises |
|---|---|---|---|
| Provisioning | CloudFormation | Vantage-managed defaults | Ansible, Terraform, manual, Multipass, or Juju (you provide infrastructure) |
| Instance selection | EC2 instance type browser | Vantage-managed defaults | Your hardware or local VMs |
| Partitions | Configured during creation | Configured post-creation | Configured post-creation |
| SSH key required | Yes (EC2 key pair name) | No | No |
| Custom networking | VPC, subnet, security group | No | No |

On-premises clusters can be created through the web UI (manual), Ansible, Terraform, or the Vantage CLI (Multipass and Juju). See On-Premises clusters for details.
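
The provider differences listed in this section can be captured as a small pre-flight check before submitting a creation request. This is a sketch only: `ClusterRequest`, its field names, and the provider strings are hypothetical and do not reflect the actual Vantage API schema. The rules themselves (an EC2 key pair name is required on AWS, and custom networking applies only to AWS) are taken from the comparison table.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical provider identifiers; the real API may name these differently.
PROVIDERS = {"aws", "azure", "gcp", "on_prem"}


@dataclass
class ClusterRequest:
    """Illustrative request shape, not the Vantage API schema."""
    provider: str
    ssh_key_name: Optional[str] = None       # EC2 key pair name (AWS only)
    vpc_id: Optional[str] = None             # custom networking (AWS only)
    subnet_id: Optional[str] = None
    security_group_id: Optional[str] = None


def check_request(req: ClusterRequest) -> List[str]:
    """Return a list of problems based on the provider comparison table."""
    if req.provider not in PROVIDERS:
        return [f"unknown provider: {req.provider!r}"]
    problems: List[str] = []
    networking = [req.vpc_id, req.subnet_id, req.security_group_id]
    if req.provider == "aws":
        # The table lists an SSH key as required only for AWS.
        if not req.ssh_key_name:
            problems.append("AWS clusters need an EC2 key pair name")
    elif any(networking):
        # VPC, subnet, and security group settings are listed for AWS only.
        problems.append("custom networking is only available on AWS")
    return problems
```

For example, `check_request(ClusterRequest("aws"))` reports the missing key pair name, while `check_request(ClusterRequest("on_prem"))` returns an empty list.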
