Create a Kubernetes cluster
Prerequisites
- A Vantage account and organization.
- A configured Cloud Account for your chosen provider — see Compute Providers.
- AWS only: An SSH key pair created in the target AWS region. Vantage uses the key pair name when creating the control plane EC2 instance.
AWS
AWS K8s clusters are provisioned with direct boto3 API calls, not CloudFormation. Vantage creates the VPC, IAM roles, and security groups, then launches a control plane EC2 instance with MicroK8s pre-configured.
-
Open Clusters — Click Clusters in the left sidebar (the Kubernetes view is shown by default), then click Prepare Cluster. A modal opens with the Configure step.
-
Configure the cluster — Enter a Cluster Name (max 27 characters, must be unique) and optional Description. Select your AWS Cloud Account, then click Continue. The Provider step opens.
-
Configure AWS resources — Set the Region (the dropdown loads after you select the cloud account). Click Select Machine to choose a Control Plane Machine Type; you can browse EC2 instance types by vCPU, GPU, and price. Select an SSH Key Name (the list loads after you pick a region; if it is empty, first create a key pair in that region in the AWS EC2 console). Click Prepare Cluster to submit.
JupyterHub and Grafana + Prometheus are enabled by default. See Integrations for details.
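If you prepare several clusters, pre-checking the name rules can save you a failed submission. The Python sketch below checks only what this page states: a Cluster Name is required, can be at most 27 characters, and must be unique. It assumes you already have the names of your existing clusters (fetching them is not shown) and that uniqueness means an exact, case-sensitive match. It does not cover any character-set rules the backend may enforce, and the backend remains the authority.

```python
MAX_CLUSTER_NAME_LENGTH = 27  # documented limit for Cluster Name


def cluster_name_problems(name: str, existing_names: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the name passes.

    Assumption: uniqueness is an exact, case-sensitive match against
    existing_names, which the caller must supply.
    """
    problems = []
    if not name.strip():
        problems.append("name is empty")
    if len(name) > MAX_CLUSTER_NAME_LENGTH:
        problems.append(
            f"name is {len(name)} characters; the maximum is {MAX_CLUSTER_NAME_LENGTH}"
        )
    if name in existing_names:
        problems.append(f"a cluster named {name!r} already exists")
    return problems


print(cluster_name_problems("research-k8s", {"prod-k8s"}))  # []
```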
What Vantage provisions on AWS
| Resource | Details |
|---|---|
| VPC | 10.0.0.0/16 CIDR (auto-created if not provided) |
| Subnets | Public + private subnets |
| Internet Gateway + NAT Gateway | Internet Gateway for public-subnet traffic; NAT Gateway for outbound access from private subnets |
| Security groups | Default VPC security group for inter-node communication |
| IAM Role | vantage-{client_id}-node-role with EC2 trust policy |
| Instance Profile | vantage-{client_id}-instance-profile linked to the IAM role |
| IAM Policies | AmazonEBSCSIDriverPolicy, AmazonEFSCSIDriverPolicy, AmazonFSxFullAccess, plus a custom inline policy for EC2 Fleet management and launch template operations |
| Control plane EC2 instance | vantage-{client_id}-control-plane with Ubuntu 24.04, MicroK8s, LUKS encryption, Vault KMS, and Vantage connector |
| Launch Templates | Created by the autoscaler at runtime (one per compute pool) |
| EC2 Fleet instances | Tagged vantage-cluster={client_id}, managed by the autoscaler |
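The per-cluster resources in this table are named from the cluster's client_id, which helps when you look them up in the AWS console or from scripts. The sketch below derives those names. It also builds an EC2 filter for the vantage-cluster tag that the autoscaler applies to fleet instances. The trust policy shown is the standard shape for a role that EC2 instances assume. Treat it as an assumption about what Vantage attaches, not a copy of it. Whether the control plane name is stored as the instance's Name tag is also an assumption.

```python
def vantage_aws_names(client_id: str) -> dict[str, str]:
    """Expected names of the per-cluster AWS resources listed in the table."""
    return {
        "iam_role": f"vantage-{client_id}-node-role",
        "instance_profile": f"vantage-{client_id}-instance-profile",
        "control_plane_instance": f"vantage-{client_id}-control-plane",
    }


def fleet_tag_filter(client_id: str) -> list[dict]:
    """EC2 filter matching autoscaler-managed fleet instances by their tag."""
    return [{"Name": "tag:vantage-cluster", "Values": [client_id]}]


# Standard trust policy that lets EC2 instances assume a role.
# Assumption: Vantage's node role uses this shape; the exact document
# is not shown on this page.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
```

With boto3, you can pass fleet_tag_filter(client_id) as the Filters argument of an EC2 client's describe_instances call to list the instances the autoscaler currently manages.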
Azure
-
Open Clusters — Click Clusters in the left sidebar (the Kubernetes view is shown by default), then click Prepare Cluster.
-
Configure:
- Enter a Cluster Name and optional Description.
- Select your Azure Cloud Account.
Note: This provider uses backend defaults for provisioning. Review your cloud account configuration before submitting.
-
Submit — Click Create Cluster. Azure Kubernetes clusters use Vantage-managed defaults for node sizing and networking. Review your cloud account's regional quota before submitting.
GCP
-
Open Clusters — Click Clusters in the left sidebar (the Kubernetes view is shown by default), then click Prepare Cluster.
-
Configure:
- Enter a Cluster Name and optional Description.
- Select your GCP Cloud Account.
Note: This provider uses backend defaults for provisioning. Review your cloud account configuration before submitting.
-
Submit — Click Create Cluster. GCP Kubernetes clusters use Vantage-managed defaults. Verify your project's quota before submitting.
On-premises
On-premises Kubernetes clusters connect to Vantage through a lightweight connector deployed on your infrastructure. Vantage does not provision cloud resources — you provide the compute.
For the full setup guide, see On-Premises clusters and choose the Kubernetes tab.
Multipass and Juju on-premises clusters support only Slurm, not Kubernetes.
Slurm on Kubernetes
You can deploy a Slurm scheduler on top of an existing Kubernetes cluster (AWS only). This gives you HPC batch scheduling on cloud-native, auto-scaled infrastructure.
Prerequisites
- An existing AWS Kubernetes cluster in ready status (see AWS above).
From the Slurm list
-
Open Clusters — Click Clusters in the left sidebar, then click Slurm in the cluster type navigation, then click Prepare Cluster. A modal opens with the Configure step.
-
Select deployment target — Under Deployment Target, choose Kubernetes Cluster. A list of ready K8s clusters appears. Click the target cluster to select it, then click Configure Slurm Cluster. The Compute & Partitions step opens.
-
Configure compute pools and partitions:
Compute Pools
Two compute pools are pre-configured: Slurm Controller (control plane) and Compute Workers. Compute pool names are auto-generated (e.g., slurm-control-{name} and slurm-compute-{name}-1).
| Field | Default | Notes |
|---|---|---|
| Profile | None | Select a profile; a selection is required. |
| GPU | No | Toggle to enable GPU compute. |
| Min Nodes | 1 | Minimum 1 |
| Max Nodes | 1 (Control Plane) / 10 (Compute) | |
The Profile field adapts based on the parent K8s cluster's provider:
- AWS parent — Opens an instance type browser dialog. Select any EC2 instance type (e.g., t3.medium, c5n.4xlarge).
Click + Add Compute Pool to add more compute pools. At least one control plane pool and one compute pool are required.
Partitions
A default partition named partition-1 is pre-configured. Set the Partition Name, choose which Compute Group it routes to, and toggle Default status. Only one partition can be default at a time.
Click Advanced Options to configure:
- Expose Slurm services via NodePort
- TLS enabled (recommended; enabled by default)
- Job profiling (InfluxDB)
- K8s scheduler bridge (enabled by default)
-
Submit — Click Create Slurm Cluster. The wizard shows a progress stepper:
- Registering cluster — Creates the Slurm cluster record and provisions a Keycloak client.
- Creating compute pools — Provisions each compute pool sequentially on the parent K8s cluster (control plane, then compute pools) via vdeployer.
- Creating Slurm cluster — vdeployer triggers the Helm chart installation.
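If you generate cluster definitions programmatically, you can express the pool and partition rules from these steps as a small validation pass. The Python model below is hypothetical and is not Vantage's data format. ComputePool, Partition, the role field, validate(), and creation_order() are illustrative names.

The model encodes the documented defaults and rules:
- a profile is required;
- Min Nodes is at least 1;
- at least one control plane pool and one compute pool must exist;
- at most one partition can be default.

It also adds two assumed rules: Max Nodes may not be below Min Nodes, and a partition must route to an existing pool. creation_order() mirrors the stepper's sequence: the control plane pool first, then the compute pools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComputePool:
    name: str
    role: str  # hypothetical field: "control" (Slurm Controller) or "compute" (Compute Workers)
    profile: Optional[str] = None  # for AWS parents, an EC2 instance type such as "t3.medium"
    gpu: bool = False
    min_nodes: int = 1
    max_nodes: int = 10


@dataclass
class Partition:
    name: str
    compute_pool: str  # name of the pool this partition routes to
    default: bool = False


def default_pools(cluster_name: str) -> list[ComputePool]:
    """The two pre-configured pools, named with the documented pattern."""
    return [
        ComputePool(f"slurm-control-{cluster_name}", "control", max_nodes=1),
        ComputePool(f"slurm-compute-{cluster_name}-1", "compute", max_nodes=10),
    ]


def validate(pools: list[ComputePool], partitions: list[Partition]) -> list[str]:
    """Return a list of problems; an empty list means the configuration passes."""
    errors = []
    roles = {pool.role for pool in pools}
    if "control" not in roles or "compute" not in roles:
        errors.append("need at least one control plane pool and one compute pool")
    for pool in pools:
        if not pool.profile:
            errors.append(f"{pool.name}: a profile is required")
        if pool.min_nodes < 1:
            errors.append(f"{pool.name}: min nodes must be at least 1")
        if pool.max_nodes < pool.min_nodes:  # assumed rule, not stated on this page
            errors.append(f"{pool.name}: max nodes is below min nodes")
    pool_names = {pool.name for pool in pools}
    for part in partitions:
        if part.compute_pool not in pool_names:  # assumed rule
            errors.append(f"{part.name}: routes to unknown pool {part.compute_pool!r}")
    if sum(part.default for part in partitions) > 1:
        errors.append("only one partition can be default")
    return errors


def creation_order(pools: list[ComputePool]) -> list[ComputePool]:
    """Control plane pools first, then compute pools (sorted() is stable)."""
    return sorted(pools, key=lambda pool: pool.role != "control")
```

For example, validate(default_pools("demo"), []) reports two missing profiles until you set one on each pool. This matches the UI, where Profile has no default.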
From the Kubernetes detail page
- Open the target Kubernetes cluster — Click the cluster name to open its detail page.
- Open the Slurm Clusters tab — Click Slurm Clusters in the cluster detail tabs.
- Create a Slurm cluster — Click Create Slurm Cluster. A modal opens with the Configure step.
- Follow steps 3-4 above to configure compute pools and partitions, then submit.
The Slurm cluster enters preparing status and transitions to ready once all Slurm pods are running.
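You can picture this readiness rule as a function of the Slurm pods' Kubernetes phases. A Pod's status.phase is one of Pending, Running, Succeeded, Failed, or Unknown. This sketch assumes Vantage checks exactly that. The real check may also consider container readiness, which a Running phase alone does not guarantee. You could collect the phases with kubectl get pods -n <slurm-namespace> -o jsonpath='{.items[*].status.phase}'. The namespace is a placeholder because this page does not name it.

```python
def slurm_cluster_status(pod_phases: list[str]) -> str:
    """Return 'ready' once every Slurm pod is Running, else 'preparing'.

    Assumption: this mirrors how Vantage decides readiness. An empty pod
    list is treated as still preparing.
    """
    if pod_phases and all(phase == "Running" for phase in pod_phases):
        return "ready"
    return "preparing"


# Example: phases parsed from kubectl's space-separated jsonpath output.
print(slurm_cluster_status("Running Running Pending".split()))  # preparing
```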
Slurm on Kubernetes requires an AWS parent cluster. Azure and GCP Kubernetes clusters cannot host a Slurm-on-K8s deployment.
What happens after submission
After you submit a Kubernetes cluster, it enters preparing status and a background thread handles all provisioning:
- STS AssumeRole (AWS only) — Assumes the IAM role from the cloud account for temporary credentials.
- Network setup — Creates VPC, subnets, and security groups (or validates existing ones).
- IAM resources (AWS only) — Creates instance roles and policies.
- Control plane launch — Creates the EC2 instance or VM with cloud-init that installs MicroK8s, LUKS encryption, and Vault KMS.
- markClusterReady — Cloud-init calls this mutation when setup completes, and the cluster transitions to ready.
- vdeployer deploy — Deploys the autoscaler, tunnel client, and enabled integrations.
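A few lines of Python can summarize the order of these steps and which ones apply only to AWS. The step identifiers and the "aws" provider string below are hypothetical labels for the list above. The sketch also assumes Azure and GCP run the remaining steps in the same order. status_after() reflects that the cluster becomes ready at markClusterReady, which the list places before the vdeployer deploy step.

```python
# Hypothetical labels for the provisioning steps, in the documented order.
PIPELINE = [
    "sts_assume_role",       # temporary credentials from the cloud account's IAM role
    "network_setup",         # VPC, subnets, security groups (created or validated)
    "iam_resources",         # instance roles and policies
    "control_plane_launch",  # EC2 instance or VM with cloud-init
    "mark_cluster_ready",    # cloud-init calls markClusterReady
    "vdeployer_deploy",      # autoscaler, tunnel client, enabled integrations
]
AWS_ONLY = {"sts_assume_role", "iam_resources"}


def steps_for(provider: str) -> list[str]:
    """Steps that apply to a provider; AWS-only steps are skipped elsewhere."""
    return [step for step in PIPELINE if provider == "aws" or step not in AWS_ONLY]


def status_after(step: str) -> str:
    """Cluster status once the given step has completed."""
    ready_at = PIPELINE.index("mark_cluster_ready")
    return "ready" if PIPELINE.index(step) >= ready_at else "preparing"
```

If this ordering holds, a cluster can report ready while the autoscaler and integrations such as JupyterHub are still being deployed.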
Poll the cluster status every 30-60 seconds. AWS provisioning typically takes 10-15 minutes.
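A polling loop that follows this guidance might look like the sketch below. fetch_status is a hypothetical callable that you supply, for example a wrapper around however you read the cluster's status. It is not a Vantage API. The 45-second interval falls inside the recommended 30-60 second window. The 20-minute timeout is an assumed margin over the typical 10-15 minutes. sleep and clock are injectable so you can test the loop without waiting.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_status: Callable[[], str],
    interval_s: float = 45.0,      # within the recommended 30-60 s window
    timeout_s: float = 20 * 60,    # assumption: margin over the typical 10-15 min
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until it returns 'ready' or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status == "ready":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"cluster still {status!r} after {timeout_s:.0f} s")
        sleep(interval_s)
```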