Compute pools

Compute pools are the Kubernetes equivalent of Slurm partitions: each one is a pool of identically sized machines that the cluster autoscaler scales up and down automatically. You define the minimum and maximum size when you create the pool; Vantage handles provisioning through the cloud provider.

Each compute pool is a billing unit. You pay for every provisioned node, even if it isn't running a workload.

Roles

Every compute pool has a role:

Role	Purpose
`control`	Cluster management. Runs the Kubernetes control plane components, etcd, and Vantage system services. The compute pool is typically named `control-plane`.
`worker`	Compute. Runs user workloads (Workbench sessions, training jobs, model serving). The compute pool name is user-defined.

A cluster must have at least one control plane pool and at least one worker pool.

Instance selection

How you select compute resources depends on the cloud provider:

AWS: Instance type browser

AWS compute pools use native EC2 instance types selected through the Vantage instance type browser:

Search by family (e.g., t3, c5n, p3, g5).
Filter by vCPU count, memory, and GPU count.
Select from a curated list of supported instance types.
Multiple instance types can be specified per pool. The autoscaler uses the EC2 Fleet API to diversify across them and improve availability.

AWS compute pool field	Notes
`instance_types`	List of EC2 instance types (e.g., `["t3.xlarge"]`). Multiple types enable Fleet-based diversification.
`allocation_strategy`	EC2 Fleet allocation strategy: `lowest-price` (default), `diversified`, or `capacity-optimized`.
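
As an illustration of how the two fields above fit together, here is a minimal Python sketch that validates a pool definition before it is submitted. The `AwsComputePool` class, its `name` field, and the `gpu-train` pool are hypothetical helpers for this example and are not part of the Vantage API; only the `instance_types` and `allocation_strategy` fields and their allowed values come from the table.

```python
from dataclasses import dataclass
from typing import List

# Allocation strategies listed in the table above.
ALLOCATION_STRATEGIES = {"lowest-price", "diversified", "capacity-optimized"}


@dataclass
class AwsComputePool:
    """Hypothetical container for the AWS pool fields; not a Vantage API type."""

    name: str  # assumed field, used here only to identify the pool
    instance_types: List[str]
    allocation_strategy: str = "lowest-price"  # documented default

    def validate(self) -> None:
        # A pool needs at least one EC2 instance type to provision from.
        if not self.instance_types:
            raise ValueError("instance_types must list at least one EC2 instance type")
        if self.allocation_strategy not in ALLOCATION_STRATEGIES:
            raise ValueError(f"unknown allocation_strategy: {self.allocation_strategy!r}")

    def to_dict(self) -> dict:
        # Validate first so an invalid definition is never serialized.
        self.validate()
        return {
            "name": self.name,
            "instance_types": list(self.instance_types),
            "allocation_strategy": self.allocation_strategy,
        }


# Two instance types give EC2 Fleet more than one option to draw from.
pool = AwsComputePool(
    name="gpu-train",
    instance_types=["g5.xlarge", "g5.2xlarge"],
    allocation_strategy="capacity-optimized",
)
print(pool.to_dict())
```

Validating on the client side like this catches a misspelled strategy before any provisioning request is made.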

Non-AWS: Pre-defined profiles

Azure and GCP compute pools use pre-defined profiles instead of instance types:

Profile	vCPU	Memory	Notes
Small	4	8 GiB	Lightweight workers
Medium	8	16 GiB	General purpose
Large	16	32 GiB	Memory-intensive
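
One way to reason about the profile table is to pick the smallest profile that covers a workload's CPU and memory needs. The sketch below does that in Python. The `Profile` type and the `smallest_fitting_profile` function are hypothetical names for this example, and the sketch ignores any capacity the node reserves for system components, so treat the result as a starting point rather than a sizing guarantee.

```python
from typing import NamedTuple, Optional


class Profile(NamedTuple):
    name: str
    vcpu: int
    memory_gib: int


# Values copied from the profile table above, ordered smallest to largest.
PROFILES = [
    Profile("Small", 4, 8),
    Profile("Medium", 8, 16),
    Profile("Large", 16, 32),
]


def smallest_fitting_profile(vcpu_needed: int, memory_gib_needed: int) -> Optional[Profile]:
    """Return the first (smallest) profile that meets both requirements, or None."""
    for profile in PROFILES:
        if profile.vcpu >= vcpu_needed and profile.memory_gib >= memory_gib_needed:
            return profile
    return None  # nothing in the table is large enough


choice = smallest_fitting_profile(vcpu_needed=6, memory_gib_needed=12)
print(choice.name if choice else "no profile fits")
```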

Labels and taints

Labels and taints control workload scheduling onto compute pools:

Labels: Key-value pairs attached to every node in the pool. Workloads use nodeSelector or affinity rules to target specific labels.
Taints: Prevent pods from scheduling onto a node unless they tolerate the taint. Used for GPU-only nodes, spot instances, or dedicated infrastructure.

Auto-injected labels

The autoscaler injects a vc.pool: <compute-pool-name> label on every node in the pool, enabling Slurm-on-Kubernetes pod scheduling affinity.
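
To show how a workload might target a pool, the following Python sketch builds a Pod manifest that selects nodes by the `vc.pool` label and, optionally, tolerates a taint. The `nodeSelector` and `tolerations` fields are standard Kubernetes Pod spec fields; the function name, pool name, image, and taint key are hypothetical placeholders chosen for this example.

```python
import json
from typing import Optional


def pod_manifest_for_pool(
    pod_name: str,
    image: str,
    pool_name: str,
    tolerated_taint_key: Optional[str] = None,
) -> dict:
    """Build a Pod manifest pinned to one compute pool via the vc.pool label."""
    spec = {
        "containers": [{"name": pod_name, "image": image}],
        # Only nodes carrying this label are eligible for scheduling.
        "nodeSelector": {"vc.pool": pool_name},
    }
    if tolerated_taint_key is not None:
        # "Exists" tolerates the taint key regardless of its value.
        spec["tolerations"] = [
            {"key": tolerated_taint_key, "operator": "Exists", "effect": "NoSchedule"}
        ]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": spec,
    }


# Hypothetical pool, image, and taint key, for illustration only.
manifest = pod_manifest_for_pool(
    pod_name="trainer",
    image="registry.example.com/trainer:latest",
    pool_name="gpu-train",
    tolerated_taint_key="example.com/gpu-only",
)
print(json.dumps(manifest, indent=2))
```

The label selects which nodes the pod may run on, while the toleration only permits scheduling onto tainted nodes; a pod that should land exclusively on a tainted pool generally needs both.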

Common label patterns

Label	Value	Purpose
`node-role.kubernetes.io/control-plane`	`""`	Control plane nodes
`node-role.kubernetes.io/worker`	`""`	Worker nodes
`nvidia.com/gpu`	`"true"`	GPU-accelerated nodes
`workload-type`	`"ml-training"`	Training job affinity

Autoscaling

Every compute pool has autoscaling bounds:

Min size: The floor. Set to 0 to allow scale-to-zero when idle.
Max size: The ceiling. The autoscaler never provisions above this limit.

The autoscaler monitors pending pod capacity and scales up when workloads request more resources. It scales down when nodes are underutilized for a sustained period.
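
The interaction between the bounds and the scaling decision can be sketched as a clamp. The Python function below is a deliberately simplified model and not the autoscaler's actual algorithm: it assumes a fixed number of pods per node and a precomputed count of idle nodes, whereas a real autoscaler works from resource requests and waits for a sustained idle period. All names are hypothetical.

```python
import math


def desired_pool_size(
    current_nodes: int,
    pending_pods: int,
    pods_per_node: int,
    idle_nodes: int,
    min_size: int,
    max_size: int,
) -> int:
    """Return a target node count clamped to [min_size, max_size]."""
    if pending_pods > 0:
        # Scale up: add enough nodes to place every pending pod.
        target = current_nodes + math.ceil(pending_pods / pods_per_node)
    else:
        # Scale down: release nodes that have been idle.
        target = current_nodes - idle_nodes
    # The bounds always win over the computed target.
    return max(min_size, min(max_size, target))


# Five pending pods at two pods per node need three more nodes.
print(desired_pool_size(2, 5, 2, 0, min_size=0, max_size=10))
# With min_size=0, a fully idle pool can scale to zero.
print(desired_pool_size(3, 0, 2, 3, min_size=0, max_size=10))
```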

For AWS clusters, the autoscaler manages EC2 Fleet instances tagged with vantage-cluster={client_id}.

Best practices

Separate workload types into different compute pools: GPU training jobs and CPU preprocessing should use different pools so they don't compete for resources.
Set min sizes conservatively: Idle nodes cost money. Start with min=0 and adjust once you understand your workload patterns.
Use labels and taints for scheduling control: Mark GPU compute pools with nvidia.com/gpu: "true" so only GPU workloads land on them.
Multiple instance types improve availability: On AWS, specifying multiple instance types per pool gives EC2 Fleet more options during capacity constraints.
