Kubernetes clusters

Managed platform clusters for Workbench sessions, ML training, and containerized workloads.

Kubernetes clusters are Vantage's platform clusters — they run MicroK8s with a Vantage-managed control plane and power every higher-level Vantage product: Workbench sessions, training jobs, model endpoints, pipelines, and sweeps. They also serve as parent clusters for Slurm-on-Kubernetes deployments.

Vantage handles the full lifecycle: cloud provisioning (VPC, IAM, control plane instance), K8s installation, node autoscaling, and integration deployment. You interact with the cluster through Vantage — no kubectl or direct cluster access required for day-to-day use.

How it works

When you create a Kubernetes cluster, Vantage:

  1. Validates input — Checks cluster name, cloud account credentials, instance types, and subscription limits.
  2. Creates database records — Inserts the cluster record with status = preparing.
  3. Creates a Keycloak client — Registers an OAuth2 client for the cluster.
  4. Provisions infrastructure (background thread) — This step varies by provider:
    • AWS: Assumes the IAM role, creates a VPC, subnets, and security groups (or reuses existing ones), creates IAM roles and instance profiles, and launches a control plane EC2 instance whose cloud-init script installs MicroK8s and sets up LUKS encryption and Vault KMS.
    • Azure / GCP: Uses Vantage-managed defaults for the control plane.
    • On-premises: No cloud provisioning — waits for the connector.
  5. Transitions to ready — The control plane's cloud-init script calls markClusterReady once MicroK8s, encryption, and Vault are set up.
  6. Deploys integrations — vdeployer-web deploys the cluster autoscaler, tunnel client, and any enabled integrations (JupyterHub, Grafana, Ray, MLflow).
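The creation flow above behaves like a small state machine: the cluster sits in `preparing` while infrastructure is provisioned, moves to `ready` only when the control plane reports back, and receives integrations only after that. The sketch below models that ordering under stated assumptions. The `Cluster` class, `PROVISIONING_PLAN`, and the `mark_cluster_ready` / `deploy_integrations` methods are hypothetical stand-ins, not Vantage's actual API; only the `preparing` and `ready` statuses, the step order, and the per-provider behavior come from this page.

```python
from dataclasses import dataclass, field

# Hypothetical model of the cluster creation workflow.
# Only the statuses, step order, and per-provider behavior come from the docs;
# all names here are illustrative.

PROVISIONING_PLAN = {
    "aws": [
        "assume IAM role",
        "create or reuse VPC, subnets, security groups",
        "create IAM roles and instance profiles",
        "launch control plane EC2 instance with cloud-init",
    ],
    "azure": ["use Vantage-managed control plane defaults"],
    "gcp": ["use Vantage-managed control plane defaults"],
    "on-premises": ["wait for connector"],
}


@dataclass
class Cluster:
    name: str
    provider: str
    status: str = "preparing"  # step 2: record is created with status = preparing
    log: list = field(default_factory=list)

    def provision(self) -> None:
        # Step 4: provider-specific infrastructure work.
        if self.provider not in PROVISIONING_PLAN:
            raise ValueError(f"unknown provider: {self.provider}")
        self.log.extend(PROVISIONING_PLAN[self.provider])

    def mark_cluster_ready(self) -> None:
        # Step 5: stand-in for the callback made by the control plane's
        # cloud-init once MicroK8s, encryption, and Vault are set up.
        if self.status != "preparing":
            raise RuntimeError(f"cannot mark ready from status {self.status!r}")
        self.status = "ready"

    def deploy_integrations(self, enabled: list) -> None:
        # Step 6: integrations are deployed only to a ready cluster.
        if self.status != "ready":
            raise RuntimeError("integrations require a ready cluster")
        self.log.extend(["cluster autoscaler", "tunnel client", *enabled])
```

For example, `Cluster("demo", "aws")` starts in `preparing`, `provision()` records the four AWS steps, and `deploy_integrations(["JupyterHub"])` raises until `mark_cluster_ready()` has run. In the real system, step 5 is triggered asynchronously by the control plane rather than by the client, so a client would observe the status change instead of invoking it.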

Provider comparison

| Aspect | AWS | Azure / GCP | On-Premises |
|---|---|---|---|
| Control plane | EC2 instance (boto3) | Vantage-managed | Connector-based (Multipass and Juju are Slurm-only) |
| Instance selection | EC2 type browser | Vantage-managed defaults | Your hardware or local VMs |
| VPC / networking | VPC + subnets (auto or existing) | Vantage-managed | Your network |
| GPU support | GPU instance types | GPU instance types | Your GPUs |
| Custom networking | VPC, subnet, security group | No | No |
| Slurm on K8s supported | Yes | No | No |

Multipass and Juju on-premises clusters support only Slurm, not Kubernetes. For on-premises Kubernetes, see On-Premises clusters and choose the Kubernetes tab.
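The comparison table and the note above imply two request-time constraints for a Kubernetes cluster: custom VPC, subnet, and security group settings apply only on AWS, and Multipass or Juju on-premises backends cannot host Kubernetes. A minimal pre-flight check might look like the sketch below. It assumes a hypothetical `K8sClusterRequest` shape and a hypothetical `"connector"` backend label; Vantage's real request format is not documented on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical request shape; field names are illustrative, not Vantage's API.
@dataclass
class K8sClusterRequest:
    provider: str                           # "aws", "azure", "gcp", or "on-premises"
    vpc_id: Optional[str] = None
    subnet_id: Optional[str] = None
    security_group_id: Optional[str] = None
    on_prem_backend: Optional[str] = None   # e.g. "connector", "multipass", "juju"

# Per the docs, these on-premises backends support Slurm only.
SLURM_ONLY_BACKENDS = {"multipass", "juju"}


def preflight_errors(req: K8sClusterRequest) -> list[str]:
    """Return a list of problems; an empty list means the request passes."""
    errors = []
    # Collect any custom networking fields the request sets.
    custom = [name for name, value in (
        ("vpc_id", req.vpc_id),
        ("subnet_id", req.subnet_id),
        ("security_group_id", req.security_group_id),
    ) if value is not None]
    if custom and req.provider != "aws":
        errors.append(f"custom networking ({', '.join(custom)}) is only supported on AWS")
    if req.provider == "on-premises" and req.on_prem_backend in SLURM_ONLY_BACKENDS:
        errors.append(f"{req.on_prem_backend} clusters support Slurm only, not Kubernetes")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a form or CLI report every issue with a request at once.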

Slurm on Kubernetes

You can deploy a Slurm scheduler on top of an existing Kubernetes cluster — combining HPC batch scheduling with cloud-native autoscaling. This is available for:

  • AWS K8s parent clusters — Compute pools use EC2 instance types selected from the instance browser. Non-AWS public cloud K8s clusters (Azure, GCP) do not currently support Slurm-on-Kubernetes.
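Stated as a rule, a Slurm-on-Kubernetes deployment needs an existing Kubernetes parent cluster on AWS, and each compute pool names an EC2 instance type offered by the instance browser. The check below sketches that rule. The `ParentCluster` and `ComputePool` types and the `allowed_instance_types` set (a stand-in for the instance browser's catalog) are hypothetical, not part of Vantage's API.

```python
from dataclasses import dataclass

# Hypothetical types; names are illustrative only.
@dataclass(frozen=True)
class ParentCluster:
    kind: str      # "kubernetes" or "slurm"
    provider: str  # "aws", "azure", "gcp", or "on-premises"


@dataclass(frozen=True)
class ComputePool:
    name: str
    instance_type: str  # EC2 instance type selected from the instance browser


def can_deploy_slurm_on_k8s(
    parent: ParentCluster,
    pools: list[ComputePool],
    allowed_instance_types: set[str],
) -> tuple[bool, str]:
    """Return (ok, reason) for a proposed Slurm-on-Kubernetes deployment."""
    if parent.kind != "kubernetes":
        return False, "parent must be an existing Kubernetes cluster"
    if parent.provider != "aws":
        # Azure and GCP Kubernetes clusters do not currently support Slurm on K8s.
        return False, "Slurm on Kubernetes currently requires an AWS parent cluster"
    for pool in pools:
        if pool.instance_type not in allowed_instance_types:
            return False, f"pool {pool.name!r}: {pool.instance_type} is not an available EC2 type"
    return True, "ok"
```

For instance, with `allowed_instance_types={"m5.xlarge", "g5.xlarge"}`, an AWS parent with a `g5.xlarge` GPU pool passes, while the same request against a GCP parent is rejected before any pool is examined.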

For details, see creating a Slurm-on-Kubernetes cluster.
