Create a Slurm cluster

Step-by-step guides for creating Slurm clusters on every supported provider.

Prerequisites

Before creating a Slurm cluster, you need:

  • A Vantage account and organization.
  • A configured Cloud Account for your chosen provider — see Compute Providers.
  • AWS only: An SSH key pair created in the target AWS region. Vantage uses the key pair name to provision cluster nodes — it never receives the private key.

AWS

AWS is the most common path for Slurm clusters. Vantage uses CloudFormation to provision a VPC, Auto Scaling groups, and IAM roles, then installs Slurm on the controller and worker nodes.

  1. Open Clusters — Click Clusters in the left sidebar, then click Prepare Cluster.

  2. Choose type — Select Slurm and click Continue.

  3. Configure the cluster:

    • Enter a Cluster Name (max 27 characters, must be unique). The name is used as the CloudFormation stack name.
    • Select your AWS Cloud Account. The provider is detected automatically.
    • Pick a Region — the dropdown loads after you select the cloud account.
    • The Head Node Machine Type auto-fills a default — click Select Head Node to browse by vCPU, GPU, and price.
    • Select an SSH Key Name — the list loads after you pick a region. If it's empty, create a key pair in the AWS EC2 console first.
  4. Networking (optional) — Click Advanced Options to pin the cluster to a specific VPC, Head Node Subnet, and Compute Node Subnet. Leave these fields empty to use the defaults: Vantage then creates a VPC, public and private subnets, an Internet Gateway, a NAT Gateway, and security groups automatically.

  5. Set partitions — A default partition named compute is pre-filled. For each partition:

    • Give it a Partition Name.
    • Click Select Compute Node to choose the instance type for worker nodes.
    • Set the Maximum node count — Vantage scales up to this limit when jobs are waiting.
    • Click Add Partition to create additional partitions for different workload types (e.g., a GPU partition alongside a CPU partition).
  6. Submit — Click Prepare Cluster. Vantage generates a CloudFormation template and creates the stack. Provisioning typically takes a few minutes.
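
The naming constraints in step 3 can be checked before you open the wizard. The following is a minimal sketch, not Vantage's own validation: the 27-character limit and the uniqueness requirement come from the steps above, while the character rule is an assumption based on CloudFormation's stack-name rules (start with a letter; letters, digits, and hyphens only), since the cluster name is used as the stack name. The function name and the `existing_names` parameter are hypothetical.

```python
import re

MAX_NAME_LENGTH = 27  # limit stated in the wizard

# Assumption: the name doubles as the CloudFormation stack name, so it should
# start with a letter and contain only letters, digits, and hyphens.
_STACK_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def validate_cluster_name(name: str, existing_names: set[str]) -> list[str]:
    """Return a list of problems with a proposed cluster name (empty if none)."""
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"name is {len(name)} characters; the maximum is {MAX_NAME_LENGTH}"
        )
    if name and not _STACK_NAME_RE.fullmatch(name):
        problems.append(
            "name must start with a letter and use only letters, digits, and hyphens"
        )
    # Assumption: uniqueness is compared case-sensitively.
    if name in existing_names:
        problems.append("name is already in use")
    return problems
```

For example, `validate_cluster_name("gpu-research-01", {"cpu-batch"})` returns an empty list, while a 28-character name returns one length problem.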

What Vantage provisions on AWS

| Resource | Details |
| --- | --- |
| VPC | 10.0.0.0/16 CIDR (only created if a VPC is not provided) |
| Subnets | Public + private subnets |
| Internet Gateway | For public subnet outbound traffic |
| NAT Gateway | For private subnet outbound traffic |
| Security groups | Slurm inter-node communication |
| IAM instance profiles | Grant nodes access to assume the cluster role |
| EC2 Auto Scaling group | Worker nodes with configured instance type and limits |
| Slurm controller | Always-on head node (EC2 instance) |
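
To make the default address range concrete, here is a small sketch using Python's standard `ipaddress` module that carves 10.0.0.0/16 into one public and one private subnet. The /24 subnet size and the choice of the first two blocks are assumptions for illustration only; this page does not state how Vantage sizes or places its subnets.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")  # default VPC CIDR from the table

# Assumption: /24 subnets, taking the first two blocks of the VPC range.
subnets = vpc.subnets(new_prefix=24)
public_subnet = next(subnets)
private_subnet = next(subnets)

print(vpc.num_addresses)                       # 65536
print(public_subnet)                           # 10.0.0.0/24
print(private_subnet)                          # 10.0.1.0/24
print(private_subnet.subnet_of(vpc))           # True
print(public_subnet.overlaps(private_subnet))  # False
```

The same `subnet_of` and `overlaps` checks are a quick way to sanity-check a Head Node Subnet and Compute Node Subnet before pinning a cluster to an existing VPC.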

Azure

  1. Open Clusters — Click Clusters, then Prepare Cluster.

  2. Choose type — Select Slurm and click Continue.

  3. Configure the cluster:

    • Enter a Cluster Name (max 27 characters, must be unique).
    • Select your Azure Cloud Account.
  4. Submit — Click Prepare Cluster. Azure Slurm clusters use Vantage-managed defaults for node configuration and networking. Partitions are configured post-creation from the Partitions tab on the cluster detail page.

GCP

  1. Open Clusters — Click Clusters, then Prepare Cluster.

  2. Choose type — Select Slurm and click Continue.

  3. Configure the cluster:

    • Enter a Cluster Name (max 27 characters, must be unique).
    • Select your GCP Cloud Account.
  4. Submit — Click Prepare Cluster. GCP Slurm clusters use Vantage-managed defaults for node configuration and networking. Partitions are configured post-creation from the cluster detail page.

Cudo Compute

  1. Open Clusters — Click Clusters, then Prepare Cluster.

  2. Choose type — Select Slurm and click Continue.

  3. Configure the cluster:

    • Enter a Cluster Name (max 27 characters, must be unique).
    • Select your Cudo Compute Cloud Account.
  4. Submit — Click Prepare Cluster. Cudo Slurm clusters use Vantage-managed defaults for node configuration and networking. Partitions are configured post-creation.

On-premises / LXD

On-premises clusters connect through a lightweight agent deployed on your infrastructure. Vantage does not provision cloud resources — you provide the compute.

  1. Open Clusters — Click Clusters, then Prepare Cluster.

  2. Choose type — Select Slurm and click Continue.

  3. Configure:

    • Enter a Cluster Name.
    • Select your On-Premises or LXD cloud account.
  4. Get the agent command — The wizard shows a Vantage Agent installation command. Copy it.

  5. Install the agent — Run the installation command on your cluster's head node (or multiple nodes). The agent establishes an outbound HTTPS connection to Vantage — no inbound firewall rules required.

  6. Watch it connect — The cluster flips to ready once the agent is reporting. Nodes appear in the detail page as they register.

The agent only needs outbound HTTPS access to Vantage servers (port 443). If your cluster is behind a firewall, ensure outbound connectivity is not blocked.
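
If you are unsure whether a firewall blocks the agent, a quick check from the head node can help. The sketch below only tests whether an outbound TCP connection on port 443 can be opened; it does not validate TLS or account for HTTP proxies. The function name is hypothetical, and the hostname in the usage comment is a placeholder: use the host that appears in your agent installation command.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Usage (placeholder hostname, not a real Vantage endpoint):
# can_reach("vantage.example.com")
```

A `False` result suggests that DNS resolution or outbound port 443 is blocked somewhere between the node and the target host.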

What happens after submission

After you submit the creation form, the cluster immediately enters preparing status. The exact provisioning steps depend on the provider:

  • AWS — Vantage generates a CloudFormation template and calls create_stack. The stack provisions VPC, subnets, IAM roles, and EC2 instances asynchronously. Once the head node boots, Vantage Agent registers the node and uploads the Slurm configuration. The cluster transitions to ready.
  • Non-AWS cloud — Vantage provisions infrastructure through your provider's API. The cluster transitions to ready once provisioning completes.
  • On-premises — Vantage creates the database record and waits for the agent to connect. The cluster transitions to ready when the agent first phones home.

Tip: Poll the cluster status from the Clusters list or via the API. Start with low maximum node counts; you can raise them later from the Partitions tab. Idle provisioned nodes bill at full rate.
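
Polling can be scripted along these lines. This is a sketch under stated assumptions: the Vantage API call is not documented on this page, so `fetch_status` is a hypothetical callable that you supply and that returns the current status string. Only the `preparing` and `ready` statuses come from the text above; any other value is returned to the caller unchanged.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_status: Callable[[], str],
    interval_s: float = 15.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until the status leaves 'preparing' or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status != "preparing":
            # 'ready', or another state the caller should inspect.
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"cluster still {status!r} after {timeout_s} seconds"
            )
        sleep(interval_s)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without waiting; the default interval and timeout are arbitrary starting points, not recommended values.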
