Create a Kubernetes cluster

Step-by-step guides for creating Kubernetes clusters on every supported provider, including Slurm on Kubernetes.

Prerequisites

  • A Vantage account and organization.
  • A configured Cloud Account for your chosen provider — see Compute Providers.
  • AWS only: An SSH key pair created in the target AWS region. Vantage uses the key pair name when creating the control plane EC2 instance.
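
The SSH key pair prerequisite can be checked from a script before you open the wizard. The following is a minimal sketch, assuming you pass in a boto3 EC2 client bound to the target region; the helper names `key_pair_exists` and `ensure_key_pair` are illustrative and are not part of Vantage.

```python
from typing import Any, Optional


def key_pair_exists(ec2: Any, key_name: str) -> bool:
    """Return True if a key pair with this name exists in the client's region."""
    response = ec2.describe_key_pairs(
        Filters=[{"Name": "key-name", "Values": [key_name]}]
    )
    return len(response["KeyPairs"]) > 0


def ensure_key_pair(ec2: Any, key_name: str) -> Optional[str]:
    """Create the key pair if it is missing.

    Returns the private key material when a new pair is created (AWS only
    returns it once, so store it safely), or None if the pair already existed.
    """
    if key_pair_exists(ec2, key_name):
        return None
    created = ec2.create_key_pair(KeyName=key_name)
    return created["KeyMaterial"]


# Example usage (region and key name are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   ensure_key_pair(ec2, "my-vantage-key")
```

Key pairs are regional, so the client must point at the same region you later pick in the Provider step.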

AWS

Vantage provisions AWS K8s clusters with direct boto3 API calls (not CloudFormation). It creates the VPC, IAM roles, and security groups, then launches a control plane EC2 instance with MicroK8s pre-configured.

  1. Open Clusters — Click Clusters in the left sidebar (the Kubernetes view is shown by default), then click Prepare Cluster. A modal opens with the Configure step.

  2. Configure the cluster — Enter a Cluster Name (max 27 characters, must be unique) and optional Description. Select your AWS Cloud Account, then click Continue. The Provider step opens.

  3. Configure AWS resources — Set the Region (the dropdown loads after you select the cloud account). Click Select Machine to choose a Control Plane Machine Type — browse EC2 instance types by vCPU, GPU, and price. Select an SSH Key Name (the list loads after you pick a region; create a key pair in the AWS EC2 console first if empty). Click Prepare Cluster to submit.

    JupyterHub and Grafana + Prometheus are enabled by default. See Integrations for details.
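
If you generate cluster names from a script or a naming convention, it can help to check them against the limits in the Configure step first. This is a small sketch of the two stated rules (27 characters at most, and unique). The function name is hypothetical, uniqueness is assumed to be an exact string match, and any further character restrictions the wizard may enforce are not modeled.

```python
from typing import Iterable, List

MAX_CLUSTER_NAME_LENGTH = 27  # limit stated in the Configure step


def validate_cluster_name(name: str, existing_names: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the name passes both rules."""
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_CLUSTER_NAME_LENGTH:
        problems.append(
            f"name has {len(name)} characters; the maximum is {MAX_CLUSTER_NAME_LENGTH}"
        )
    # Assumption: uniqueness is compared as an exact, case-sensitive match.
    if name in set(existing_names):
        problems.append("name is already in use")
    return problems
```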

What Vantage provisions on AWS

Resource | Details
VPC | 10.0.0.0/16 CIDR (auto-created if not provided)
Subnets | Public + private subnets
Internet Gateway + NAT Gateway | For outbound and inbound connectivity
Security groups | Default VPC security group for inter-node communication
IAM Role | vantage-{client_id}-node-role with EC2 trust policy
Instance Profile | vantage-{client_id}-instance-profile linked to the IAM role
IAM Policies | AmazonEBSCSIDriverPolicy, AmazonEFSCSIDriverPolicy, AmazonFSxFullAccess, plus a custom inline policy for EC2 Fleet management and launch template operations
Control plane EC2 instance | vantage-{client_id}-control-plane with Ubuntu 24.04, MicroK8s, LUKS encryption, Vault KMS, and Vantage connector
Launch Templates | Created by the autoscaler at runtime (one per compute pool)
EC2 Fleet instances | Tagged vantage-cluster={client_id}, managed by the autoscaler
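
The naming patterns in the table make it possible to locate a cluster's resources in your AWS account, for example during an audit. The sketch below only builds the expected names and an EC2 tag filter from a `client_id`; the function names are illustrative, and treating the control plane name as the instance's Name tag is an assumption.

```python
from typing import Dict, List


def expected_aws_resource_names(client_id: str) -> Dict[str, str]:
    """Build the resource names listed in the table for one cluster."""
    return {
        "iam_role": f"vantage-{client_id}-node-role",
        "instance_profile": f"vantage-{client_id}-instance-profile",
        # Assumption: this is the value of the instance's Name tag.
        "control_plane_instance": f"vantage-{client_id}-control-plane",
    }


def fleet_instance_filters(client_id: str) -> List[dict]:
    """EC2 filter matching instances tagged vantage-cluster={client_id}."""
    return [{"Name": "tag:vantage-cluster", "Values": [client_id]}]


# Example usage with a boto3 EC2 client (client_id is a placeholder):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   ec2.describe_instances(Filters=fleet_instance_filters("abc123"))
```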

Azure

  1. Open Clusters — Click Clusters in the left sidebar (the Kubernetes view is shown by default), then click Prepare Cluster.

  2. Configure:

    • Enter a Cluster Name and optional Description.
    • Select your Azure Cloud Account.

    Note: This provider uses backend defaults for provisioning. Review your cloud account configuration before submitting.

  3. Submit — Click Create Cluster. Azure Kubernetes clusters use Vantage-managed defaults for node sizing and networking. Review your cloud account's regional quota before submitting.

GCP

  1. Open Clusters — Click Clusters in the left sidebar (the Kubernetes view is shown by default), then click Prepare Cluster.

  2. Configure:

    • Enter a Cluster Name and optional Description.
    • Select your GCP Cloud Account.

    Note: This provider uses backend defaults for provisioning. Review your cloud account configuration before submitting.

  3. Submit — Click Create Cluster. GCP Kubernetes clusters use Vantage-managed defaults. Verify your project's quota before submitting.

On-premises

On-premises Kubernetes clusters connect to Vantage through a lightweight connector deployed on your infrastructure. Vantage does not provision cloud resources — you provide the compute.

For the full setup guide, see On-Premises clusters and use the Kubernetes tab.

Note: Multipass and Juju on-premises clusters only support Slurm, not Kubernetes. For Kubernetes on your own hardware, use the On-Premises clusters guide and choose the Kubernetes tab.

Slurm on Kubernetes

You can deploy a Slurm scheduler on top of an existing Kubernetes cluster (AWS only). This gives you HPC batch scheduling on cloud-native, auto-scaled infrastructure.

Prerequisites

  • An existing Kubernetes cluster with Ready status — see AWS above.
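
The eligibility rule (an AWS parent that is in ready status) can be expressed as a simple filter. This is only a sketch: the cluster records are hypothetical dictionaries, and the `provider` and `status` keys and the value `"aws"` are assumptions about how you might represent clusters in your own tooling.

```python
from typing import Dict, List


def eligible_slurm_parents(clusters: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only clusters that can host Slurm on Kubernetes: AWS and ready."""
    return [
        cluster
        for cluster in clusters
        if cluster["provider"] == "aws" and cluster["status"] == "ready"
    ]
```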

From the Slurm list

  1. Open Clusters — Click Clusters in the left sidebar, then click Slurm in the cluster type navigation, then click Prepare Cluster. A modal opens with the Configure step.

  2. Select deployment target — Under Deployment Target, choose Kubernetes Cluster. A list of ready K8s clusters appears. Click the target cluster to select it, then click Configure Slurm Cluster. The Compute & Partitions step opens.

  3. Configure compute pools and partitions:

    Compute Pools

    Two compute pools are pre-configured: Slurm Controller (control plane) and Compute Workers. Compute pool names are auto-generated (e.g., slurm-control-{name} and slurm-compute-{name}-1).

    Field | Default | Notes
    Profile | (none) | Select a profile. No default; a selection is required.
    GPU | No | Toggle to enable GPU compute
    Min Nodes | 1 | Minimum 1
    Max Nodes | 1 (Control Plane) / 10 (Compute) |

    The Profile field adapts based on the parent K8s cluster's provider:

    • AWS parent — Opens an instance type browser dialog. Select any EC2 instance type (e.g., t3.medium, c5n.4xlarge).

    Click + Add Compute Pool to add additional compute pools. At least one control plane pool and one compute pool are required.

    Partitions

    A default partition named partition-1 is pre-configured. Set the Partition Name, choose which Compute Group it routes to, and toggle Default status. Only one partition can be default at a time.

    Click Advanced Options to configure:

    • Expose Slurm services via NodePort
    • TLS enabled (recommended), enabled by default
    • Job profiling (InfluxDB)
    • K8s scheduler bridge, enabled by default

  4. Submit — Click Create Slurm Cluster. The wizard shows a progress stepper:

    1. Registering cluster: Creates the Slurm cluster record and provisions a Keycloak client.
    2. Creating compute pools: Provisions each compute pool sequentially on the parent K8s cluster (control plane, then compute pools) via vdeployer.
    3. Creating Slurm cluster: vdeployer triggers Helm chart installation.
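
The compute pool and partition rules from step 3 can be summarized as a small validation model. This is a sketch and not the wizard's implementation: the class and function names and the role labels are hypothetical, `{name}` is assumed to be the Slurm cluster name, and the check that Max Nodes is not below Min Nodes is an added assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComputePool:
    name: str
    role: str          # "control-plane" or "compute" (hypothetical labels)
    profile: str       # e.g. an EC2 instance type; required, no default
    min_nodes: int = 1
    max_nodes: int = 1
    gpu: bool = False


@dataclass
class Partition:
    name: str
    compute_pool: str  # name of the pool this partition routes to
    default: bool = False


def default_pools(cluster_name: str, control_profile: str, compute_profile: str) -> List[ComputePool]:
    """Mirror the two pre-configured pools and their default node counts."""
    return [
        ComputePool(f"slurm-control-{cluster_name}", "control-plane", control_profile, 1, 1),
        ComputePool(f"slurm-compute-{cluster_name}-1", "compute", compute_profile, 1, 10),
    ]


def validate_slurm_config(pools: List[ComputePool], partitions: List[Partition]) -> List[str]:
    """Return a list of rule violations; an empty list means the config passes."""
    problems = []
    roles = [pool.role for pool in pools]
    if "control-plane" not in roles:
        problems.append("at least one control plane pool is required")
    if "compute" not in roles:
        problems.append("at least one compute pool is required")
    for pool in pools:
        if not pool.profile:
            problems.append(f"{pool.name}: a profile is required")
        if pool.min_nodes < 1:
            problems.append(f"{pool.name}: min nodes must be at least 1")
        if pool.max_nodes < pool.min_nodes:  # assumption, not stated in the guide
            problems.append(f"{pool.name}: max nodes is below min nodes")
    pool_names = {pool.name for pool in pools}
    for partition in partitions:
        if partition.compute_pool not in pool_names:
            problems.append(f"{partition.name}: routes to an unknown compute pool")
    if sum(1 for partition in partitions if partition.default) > 1:
        problems.append("only one partition can be default")
    return problems
```

For example, `validate_slurm_config(default_pools("demo", "t3.medium", "c5n.4xlarge"), [Partition("partition-1", "slurm-compute-demo-1", default=True)])` returns an empty list.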

From the Kubernetes detail page

  1. Open the target Kubernetes cluster — Click the cluster name to open its detail page.
  2. Open the Slurm Clusters tab — Click Slurm Clusters in the cluster detail tabs.
  3. Create a Slurm cluster — Click Create Slurm Cluster. A modal opens with the Configure step.
  4. Follow steps 3 and 4 of "From the Slurm list" above to configure compute pools and partitions, then submit.

The Slurm cluster enters preparing status and transitions to ready once all Slurm pods are running.

Tip: Non-AWS public cloud K8s clusters (Azure, GCP) do not support Slurm-on-Kubernetes. Only AWS parents can host a Slurm-on-K8s deployment.

What happens after submission

After you submit a Kubernetes cluster, it enters preparing status. A background thread then handles all provisioning:

  1. STS AssumeRole (AWS only) — Assumes the IAM role from the cloud account for temporary credentials.
  2. Network setup — Creates VPC, subnets, and security groups (or validates existing ones).
  3. IAM resources (AWS only) — Creates instance roles and policies.
  4. Control plane launch — Creates the EC2 instance or VM with cloud-init that installs MicroK8s, LUKS encryption, and Vault KMS.
  5. markClusterReady — Cloud-init calls this mutation when setup completes. The cluster transitions to ready.
  6. vdeployer deploy — Deploys autoscaler, tunnel client, and enabled integrations.
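
Step 1 corresponds to the standard STS AssumeRole call. The following sketch shows roughly what that exchange looks like with boto3; it is not Vantage's actual code. The function name and the session name are hypothetical, and the role ARN is whatever IAM role your cloud account is configured with.

```python
from typing import Any, Dict


def assume_cloud_account_role(
    sts: Any, role_arn: str, session_name: str = "vantage-provisioning"
) -> Dict[str, str]:
    """Exchange the cloud account's IAM role for temporary credentials.

    Returns keyword arguments that can be passed straight to boto3.client().
    """
    response = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    credentials = response["Credentials"]
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }


# Example usage (ARN and region are placeholders):
#   import boto3
#   temp = assume_cloud_account_role(boto3.client("sts"), "arn:aws:iam::123456789012:role/example")
#   ec2 = boto3.client("ec2", region_name="us-east-1", **temp)
```

The temporary credentials expire, which is one reason the later steps run in a single background pass rather than reusing stored keys.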

Tip: Poll the cluster status every 30-60 seconds. AWS provisioning typically takes 10-15 minutes.
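
A polling loop along these lines follows the suggested interval. It is a sketch under assumptions: `get_status` is a hypothetical callable you supply (for example, a wrapper around whatever API or CLI you use to read the cluster status), the 30 minute timeout is an arbitrary margin above the typical provisioning time, and any status other than preparing or ready is treated as unexpected because this guide does not list other states.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    interval_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns "ready" or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == "ready":
            return status
        if status != "preparing":
            # Assumption: only "preparing" and "ready" are expected here.
            raise RuntimeError(f"unexpected cluster status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError("cluster did not become ready before the timeout")
        sleep(interval_seconds)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without waiting.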
