Partitions

Partitions are job queues on a Slurm cluster. Each partition targets a pool of nodes with specific characteristics — instance type, GPU count, max run time — and applies rules about who can submit jobs and at what priority.

When you submit a job through Vantage, you name a partition and Slurm places it on a qualifying node within that partition.
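Outside Vantage, you make the same choice with `sbatch --partition`. The sketch below only builds that command line and does not run it, so you can see where the partition name goes. `train.sh` is a hypothetical job script, and Vantage's own submission interface may pass the partition differently.

```python
import shlex
from typing import List, Optional


def build_sbatch_command(script: str, partition: Optional[str] = None) -> List[str]:
    """Return an sbatch argv that targets `partition`, or Slurm's default partition if None."""
    cmd = ["sbatch"]
    if partition is not None:
        # --partition (short form -p) is sbatch's option for choosing the queue.
        cmd.append(f"--partition={partition}")
    cmd.append(script)
    return cmd


print(shlex.join(build_sbatch_command("train.sh", "gpu")))  # sbatch --partition=gpu train.sh
print(shlex.join(build_sbatch_command("train.sh")))         # sbatch train.sh
```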

How partitions work

Think of a partition as a named queue that routes jobs to a specific pool of compute resources:

Slurm concept	Maps to
`PartitionName`	Your partition name (e.g., `gpu`, `cpu`, `high-mem`)
`Default`	Whether this is the default partition for jobs that don't specify one
`MaxTime`	Maximum wall time for jobs in this partition (Slurm reads a bare number as minutes; `days-hours:minutes:seconds` and `UNLIMITED` are also accepted)
`Nodes`	All instances in the compute node pool assigned to this partition
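These fields correspond to a `PartitionName=` line in `slurm.conf`. The sketch below renders one such line from the fields in the table. The hostlists `gpu-[1-4]` and `compute-[1-10]` are hypothetical node names, and this section does not describe how Vantage actually writes its Slurm configuration.

```python
from typing import Optional


def partition_line(name: str, nodes: str, default: bool = False,
                   max_time_minutes: Optional[int] = None) -> str:
    """Render one slurm.conf partition definition from the fields in the table."""
    parts = [
        f"PartitionName={name}",
        f"Nodes={nodes}",
        f"Default={'YES' if default else 'NO'}",
        # Slurm reads a bare MaxTime number as minutes; UNLIMITED removes the cap.
        f"MaxTime={max_time_minutes if max_time_minutes is not None else 'UNLIMITED'}",
        "State=UP",
    ]
    return " ".join(parts)


print(partition_line("gpu", "gpu-[1-4]", max_time_minutes=720))
# PartitionName=gpu Nodes=gpu-[1-4] Default=NO MaxTime=720 State=UP
print(partition_line("compute", "compute-[1-10]", default=True))
# PartitionName=compute Nodes=compute-[1-10] Default=YES MaxTime=UNLIMITED State=UP
```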

Cloud Slurm partitions

For cloud-provisioned Slurm clusters (AWS), you configure partitions during cluster creation. Each partition:

Gets a unique Partition Name.
Targets a Compute Node instance type selected through the instance browser.
Has a Maximum node count that Vantage's autoscaler uses as a ceiling.

Creating partitions during setup

In the wizard's partition step:

A default partition named compute is pre-filled.
Click Select Compute Node to choose the instance type for worker nodes in that partition.
Set the Maximum node count — the autoscaler scales up to this limit when jobs queue up.
Click Add Partition to create additional partitions (e.g., a gpu partition with p3.2xlarge nodes alongside a cpu partition with t3.large nodes).
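You can model the wizard fields as one small record per partition. This sketch uses a hypothetical `PartitionSpec` type and a hypothetical `validate` helper. It checks the rule stated above (partition names are unique) plus two assumed sanity checks: an instance type must be selected, and the maximum node count must be at least 1. Vantage's real validation may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionSpec:
    name: str           # Partition Name; must be unique within the cluster
    instance_type: str  # chosen via Select Compute Node
    max_nodes: int      # Maximum node count, the autoscaler's ceiling


def validate(specs: List[PartitionSpec]) -> List[str]:
    """Return a list of problems; an empty list means the plan passes these checks."""
    problems = []
    seen = set()
    for spec in specs:
        if spec.name in seen:
            problems.append(f"duplicate partition name: {spec.name}")
        seen.add(spec.name)
        if not spec.instance_type:  # assumed check
            problems.append(f"{spec.name}: no compute node instance type selected")
        if spec.max_nodes < 1:  # assumed check
            problems.append(f"{spec.name}: max node count must be at least 1")
    return problems


plan = [
    PartitionSpec("cpu", "t3.large", 4),
    PartitionSpec("gpu", "p3.2xlarge", 2),
    PartitionSpec("gpu", "p3.2xlarge", 0),  # mistake: reused name and zero ceiling
]
for problem in validate(plan):
    print(problem)
# duplicate partition name: gpu
# gpu: max node count must be at least 1
```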

Editing partitions post-creation

Open the cluster detail page and go to the Partitions tab.
Click Edit to change the instance type or max node count of an existing partition.
Click Add Partition to create a new one.

Edits are applied live. The autoscaler adjusts node counts based on the new limits.

Tip: Start with a low maximum node count for each partition and raise it as workload demand grows. Idle provisioned nodes bill at the full rate.
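A low ceiling matters because idle spend scales with it. A rough worst-case estimate is nodes × idle hours × hourly rate. The `idle_cost` helper and the $3.00 hourly rate below are hypothetical placeholders, not real AWS prices.

```python
def idle_cost(nodes: int, idle_hours: float, hourly_rate: float) -> float:
    """Cost of provisioned but idle nodes, which bill the same as busy ones."""
    return nodes * idle_hours * hourly_rate


# Compare two max node counts if the partition sits fully scaled out but idle for 10 hours.
for cap in (2, 8):
    print(f"max {cap} nodes idle 10 h: ${idle_cost(cap, 10, 3.0):,.2f}")
# max 2 nodes idle 10 h: $60.00
# max 8 nodes idle 10 h: $240.00
```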

Non-AWS Slurm partitions

For Slurm clusters on Azure, GCP, Cudo Compute, or on-premises, you manage partitions after creation from the cluster detail page. These providers use Vantage-managed defaults for node sizing and networking, so their partition interface is simpler than the AWS one.

Autoscaling

Cloud Slurm partitions use Vantage's autoscaler to manage node counts:

When jobs enter a partition's queue, the autoscaler provisions nodes up to the partition's max.
When the queue drains and nodes sit idle, the autoscaler terminates them down to the partition's minimum node count, which may be 0.
The autoscaler respects the instance type configured for each partition — a gpu partition only provisions GPU instances.
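The scaling rules above amount to clamping demand between a partition's minimum and maximum. This is a simplified model, not Vantage's actual autoscaler. `target_node_count` and `scaling_action` are hypothetical names, demand is reduced to a single count of requested nodes, and the model assumes every node above the target is idle.

```python
def target_node_count(nodes_requested: int, min_nodes: int, max_nodes: int) -> int:
    """Clamp the nodes requested by queued and running jobs to [min_nodes, max_nodes]."""
    if not 0 <= min_nodes <= max_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes")
    return max(min_nodes, min(nodes_requested, max_nodes))


def scaling_action(current: int, target: int) -> str:
    """Describe the change needed to move from `current` nodes to `target` nodes."""
    if target > current:
        return f"provision {target - current}"
    if target < current:
        # Simplification: assumes the surplus nodes are idle and safe to terminate.
        return f"terminate {current - target} idle"
    return "no change"


# Hypothetical gpu partition with min 0 and max 4.
print(scaling_action(current=0, target=target_node_count(6, 0, 4)))  # provision 4
print(scaling_action(current=4, target=target_node_count(0, 0, 4)))  # terminate 4 idle
```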

Best practices

Separate workloads by resource profile — Create CPU and GPU partitions so batch preprocessing doesn't block GPU nodes for training.
Set max node counts conservatively — Idle nodes cost money. Start low and raise when you've validated your workload patterns.
Use the default partition — Mark one partition as default so jobs that don't specify a partition still get scheduled.
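When a job names no partition, Slurm places it in the partition marked `Default=YES` and rejects the job if no partition is marked. The sketch below models that lookup with a hypothetical `resolve_partition` helper. It assumes at most one partition is marked default, as recommended above.

```python
from typing import Dict, Optional


def resolve_partition(requested: Optional[str], partitions: Dict[str, bool]) -> str:
    """Pick the partition a job lands in.

    `partitions` maps each partition name to whether it is marked Default.
    """
    if requested is not None:
        if requested not in partitions:
            raise ValueError(f"invalid partition name: {requested}")
        return requested
    defaults = [name for name, is_default in partitions.items() if is_default]
    if not defaults:
        raise ValueError("no partition specified and no default partition configured")
    return defaults[0]  # assumes a single default partition


partitions = {"cpu": True, "gpu": False}
print(resolve_partition(None, partitions))   # cpu
print(resolve_partition("gpu", partitions))  # gpu
```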
