Slurm clusters

Traditional HPC batch scheduling on Vantage.

Slurm is a workload manager for high-performance computing — batch job scheduling, parallel job execution, and queue management. Vantage provisions Slurm clusters on your choice of infrastructure: public cloud (AWS, Azure, GCP), cost-efficient GPU cloud (Cudo Compute), partner HPC providers, or your own hardware.

Vantage handles Slurm controller setup, node registration, and autoscaling. You interact with the cluster through the Vantage UI, CLI, SDK, or API; day-to-day use requires no direct SSH access to the head node.

How it works

When you create a Slurm cluster, Vantage:

  1. Validates your input and checks your subscription limits.
  2. Creates a Keycloak OAuth2 client for the cluster (used for authentication by Vantage Agent and integrations).
  3. Inserts the cluster record and partition configuration.
  4. Provisions infrastructure on your chosen provider: a CloudFormation stack on AWS, direct provider API calls on other clouds, or no cloud provisioning for on-premises clusters.
  5. Registers nodes as they come online via Vantage Agent.
  6. Transitions the cluster to ready once all nodes are registered and Slurm configuration is uploaded.

Provisioning is asynchronous. The cluster enters preparing status immediately and transitions to ready or failed once infrastructure is set up.
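
Because provisioning is asynchronous, a client typically polls until the cluster leaves the preparing status. The following Python sketch illustrates that pattern only. The status names come from the text above; `wait_for_cluster` and the `fetch_status` callable are hypothetical and stand in for whichever Vantage CLI, SDK, or API call you use to read the cluster status.

```python
import time
from typing import Callable

# Terminal statuses named in this page; "preparing" is the only non-terminal one.
TERMINAL_STATES = {"ready", "failed"}


def wait_for_cluster(
    fetch_status: Callable[[], str],  # hypothetical: returns the current status string
    timeout_s: float = 1800.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the cluster reports 'ready' or 'failed', or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop easy to test with a fake status sequence. Treat a returned status of failed as a signal to inspect the provisioning error rather than to retry blindly.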

Provider comparison

| Aspect | AWS | Azure / GCP | Cudo Compute | On-premises / LXD |
| --- | --- | --- | --- | --- |
| Provisioning | CloudFormation | Vantage-managed defaults | Vantage-managed defaults | Agent-based (you provide infrastructure) |
| Instance selection | EC2 instance type browser | Vantage-managed defaults | Vantage-managed defaults | Your existing hardware |
| Partitions | Configured during creation | Configured post-creation | Configured post-creation | Configured post-creation |
| SSH key required | Yes (EC2 key pair name) | No | No | No |
| Custom networking | VPC, subnet, security group | No | No | No |
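
The comparison can also be read as a set of per-provider rules for a creation request. The Python sketch below encodes the SSH key, custom networking, and partition rows and checks a request against them before it is submitted. All names here (`PROVIDER_RULES`, `validate_request`, the provider keys, and the parameters) are illustrative assumptions, not part of the Vantage API.

```python
from typing import Dict, List, Optional

# Rules transcribed from the comparison table; key names are illustrative only.
_NON_AWS = {"ssh_key_required": False, "custom_networking": False, "partitions_at_creation": False}
PROVIDER_RULES: Dict[str, Dict[str, bool]] = {
    "aws": {"ssh_key_required": True, "custom_networking": True, "partitions_at_creation": True},
    "azure": dict(_NON_AWS),
    "gcp": dict(_NON_AWS),
    "cudo": dict(_NON_AWS),
    "on_prem": dict(_NON_AWS),
}


def validate_request(
    provider: str,
    ssh_key_name: Optional[str] = None,
    networking: Optional[dict] = None,   # e.g. VPC, subnet, security group
    partitions: Optional[list] = None,
) -> List[str]:
    """Return a list of problems with a hypothetical cluster creation request."""
    rules = PROVIDER_RULES.get(provider)
    if rules is None:
        return [f"unknown provider: {provider}"]
    errors: List[str] = []
    if rules["ssh_key_required"] and not ssh_key_name:
        errors.append(f"{provider} requires an SSH key pair name")
    if networking and not rules["custom_networking"]:
        errors.append(f"{provider} does not support custom networking")
    if partitions and not rules["partitions_at_creation"]:
        errors.append(f"{provider} partitions are configured after creation")
    return errors


print(validate_request("aws"))                            # missing key pair name
print(validate_request("gcp", networking={"vpc": "x"}))   # networking not supported
```

An empty list means the request is consistent with the table; it says nothing about subscription limits, which Vantage checks separately during creation.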
