Concepts

Cluster types

Vantage supports three cluster types. The type you choose determines the scheduler, the tabs shown on the cluster detail page, and the workloads you can run.

Type	Scheduler	Best for
Slurm	Slurm (batch jobs)	Traditional HPC — simulations, MPI workloads, batch pipelines
Kubernetes	Kubernetes	Workbench sessions, ML training, containerized apps
Slurm on Kubernetes	Slurm inside K8s	HPC workloads on cloud-native, auto-scaled infrastructure

Slurm and Slurm-on-Kubernetes clusters appear under the Slurm list in the sidebar. Kubernetes clusters appear under Kubernetes.
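As a minimal sketch of the grouping rule above, the mapping from cluster type to sidebar list could be modeled like this. The `ClusterType` enum and `sidebar_list` function are illustrative names, not part of any Vantage API.

```python
from enum import Enum


class ClusterType(Enum):
    # The three cluster types from the table above.
    SLURM = "Slurm"
    KUBERNETES = "Kubernetes"
    SLURM_ON_KUBERNETES = "Slurm on Kubernetes"


def sidebar_list(cluster_type: ClusterType) -> str:
    """Return the sidebar list a cluster of this type appears under."""
    # Only plain Kubernetes clusters are listed under Kubernetes;
    # both Slurm variants are listed under Slurm.
    if cluster_type is ClusterType.KUBERNETES:
        return "Kubernetes"
    return "Slurm"
```

Under this model, a Slurm-on-Kubernetes cluster is grouped with Slurm clusters even though it runs on Kubernetes.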

Status lifecycle

Every cluster moves through the same set of phases, regardless of type or provider:

Status	Meaning
Preparing	Infrastructure provisioning is in progress. Vantage is creating cloud resources, installing software, and waiting for the cluster to connect back to Vantage.
Ready	Cluster is connected and accepting workloads.
Failed	Provisioning encountered an error. See the status details on the cluster detail page.
Deleting	Vantage is tearing down infrastructure.
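If you handle these statuses in your own tooling, an enum keeps the four labels in one place. The sketch below is an assumption about how client code might model the table. `ClusterStatus`, `parse_status`, and `accepts_workloads` are hypothetical names, not a Vantage SDK.

```python
from enum import Enum


class ClusterStatus(Enum):
    # Status labels as they appear in the table above.
    PREPARING = "Preparing"
    READY = "Ready"
    FAILED = "Failed"
    DELETING = "Deleting"


def parse_status(label: str) -> ClusterStatus:
    """Convert a status label into the enum; raises ValueError if unknown."""
    return ClusterStatus(label)


def accepts_workloads(status: ClusterStatus) -> bool:
    """Only a Ready cluster is connected and accepting workloads."""
    return status is ClusterStatus.READY
```

Failing loudly on an unknown label is a deliberate choice here: a new status should be handled explicitly rather than silently treated as not ready.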

Provisioning time varies by provider and cluster type:

AWS Slurm — A few minutes (CloudFormation stack)
AWS K8s — 10-15 minutes (boto3 provisioning + cloud-init)
Cudo K8s — 10-25 minutes (VM provisioning + cloud-init)
Azure / GCP — Varies by region and quota
On-premises — Immediate after agent connects (infrastructure is yours)
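Because provisioning can take anywhere from a few minutes to 25 minutes, automation that creates a cluster usually needs to wait for it with a timeout. The sketch below shows one way to do that, assuming you supply a `fetch_status` callable that returns the current status label. That callable, the dictionary keys, and `wait_until_ready` are all hypothetical; only the minute values come from the list above.

```python
import time
from typing import Callable

# Upper bounds in minutes, taken from the provisioning times listed above.
# The keys are illustrative labels, not Vantage identifiers.
MAX_PROVISIONING_MINUTES = {
    "aws-k8s": 15,
    "cudo-k8s": 25,
}


def wait_until_ready(
    fetch_status: Callable[[], str],
    timeout_minutes: float,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the cluster is Ready or Failed, or the timeout passes.

    Returns the final status label ("Ready" or "Failed").
    Raises TimeoutError if neither is reached before the deadline.
    """
    deadline = clock() + timeout_minutes * 60
    while True:
        status = fetch_status()
        if status in ("Ready", "Failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"cluster still {status!r} after {timeout_minutes} minutes")
        sleep(poll_seconds)
```

A caller might use `wait_until_ready(my_fetch, MAX_PROVISIONING_MINUTES["cudo-k8s"])`, where `my_fetch` is your own function. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.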

Partitions and node groups

Partitions and node groups are how you organize compute resources:

Partitions (Slurm): Job queues with rules for maximum run time, allowed users, and priority. Each partition targets a pool of nodes.
Node groups (Kubernetes): Pools of identically sized machines that the cluster autoscaler manages. They serve the same purpose as partitions but are native to Kubernetes.

Both concepts let you isolate workloads by resource requirements — for example, a GPU partition for ML training and a CPU partition for batch preprocessing.
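To make the isolation idea concrete, here is a small sketch that routes a job to a partition based on whether it needs GPUs and how long it runs. It is a simplified model, not how Slurm or Vantage actually schedules work. The `Partition` class, its fields, the partition names, and the limits are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Partition:
    name: str
    gpus_per_node: int    # 0 means a CPU-only pool
    max_run_minutes: int  # the partition's max run time rule


def pick_partition(
    partitions: Sequence[Partition], needs_gpu: bool, run_minutes: int
) -> Optional[str]:
    """Return the first partition that matches the job, or None if none does."""
    for partition in partitions:
        has_gpu = partition.gpus_per_node > 0
        # Keep GPU jobs on GPU pools and CPU jobs off them,
        # and respect the partition's run time limit.
        if has_gpu == needs_gpu and run_minutes <= partition.max_run_minutes:
            return partition.name
    return None


# Hypothetical layout: one GPU partition for training, one CPU partition for preprocessing.
PARTITIONS = [
    Partition(name="gpu", gpus_per_node=4, max_run_minutes=2880),
    Partition(name="cpu", gpus_per_node=0, max_run_minutes=720),
]
```

With this layout, `pick_partition(PARTITIONS, needs_gpu=True, run_minutes=600)` selects `gpu`, while a CPU job of the same length selects `cpu`, so preprocessing never occupies GPU nodes.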

Cost

Every provisioned node accumulates spend regardless of utilization. Cloud clusters with autoscaling can scale down to zero idle nodes — but only if the node group or partition minimum is set to zero.

The Monitoring tab on every cluster detail page shows live utilization and accumulated cost. Idle GPU nodes are the most common preventable expense.
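The effect of a nonzero minimum is easy to estimate with simple arithmetic. The sketch below assumes a simplified autoscaler that always keeps `max(busy_nodes, min_nodes)` nodes provisioned, and it uses a made-up hourly rate. Real scale-down delays and real prices will differ.

```python
def billable_nodes(busy_nodes: int, min_nodes: int) -> int:
    """Nodes that stay provisioned: never fewer than the configured minimum."""
    return max(busy_nodes, min_nodes)


def idle_spend(busy_nodes: int, min_nodes: int, hourly_rate: float, hours: float) -> float:
    """Spend on provisioned nodes that are doing no work."""
    idle_nodes = billable_nodes(busy_nodes, min_nodes) - busy_nodes
    return idle_nodes * hourly_rate * hours


# Hypothetical GPU node at 3.00 per hour, with no jobs running for 30 days (720 hours).
with_minimum_two = idle_spend(busy_nodes=0, min_nodes=2, hourly_rate=3.00, hours=720)
with_minimum_zero = idle_spend(busy_nodes=0, min_nodes=0, hourly_rate=3.00, hours=720)
```

In this example a minimum of two costs 4320.0 over the month for nodes that run nothing, while a minimum of zero costs 0.0.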

Compute providers

A provider is the infrastructure that Vantage provisions clusters on: a public cloud, a GPU cloud, a partner facility, or your own hardware.

Provider	What it's for
Public clouds (AWS, Azure, GCP)	Elastic capacity, global regions, spot pricing
Cudo Compute	Cost-efficient GPU cloud
On-premises / LXD	Your own hardware, maximum control
Vantage partners (atNorth, BuzzHPC, RCI)	Pre-integrated managed colocation and HPC

Regions and availability

Cloud clusters run in the region you select during creation. Slurm clusters can span multiple availability zones within a region. On-premises clusters report their location as configured by your admin.

Some providers (Cudo Compute) allow per-node-group data center selection, enabling geo-distributed worker pools.
