Presets

Endpoint presets are reusable configurations that bundle compute resources, replica counts, and runtime selections. Instead of filling out the full endpoint form every time, pick a preset and the deployment form opens with the preset's values already filled in.

The presets list

The list view shows every endpoint preset in your workspace. Each row displays:

Name: the preset's identifier.
Description: what this preset is for.
CPU / Memory / GPU: the compute shape.
Min / Max replicas: autoscaling bounds.
Runtimes: which inference runtimes this preset supports.

Click a preset to see its full configuration, then click Use this preset to open the endpoint deployment form with the preset's values pre-filled.
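Conceptually, pre-filling means the preset's values become the starting values of the deployment form. The Python sketch below illustrates that idea under two assumptions: that the form can be modelled as a plain dictionary, and that pre-filled values remain editable before you deploy. The field names and the `prefill_form` helper are hypothetical and are not part of any product API.

```python
from typing import Any, Dict, Optional


def prefill_form(preset: Dict[str, Any],
                 overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return form values: the preset's values first, then any user edits on top."""
    form = dict(preset)            # copy, so the preset itself is never mutated
    form.update(overrides or {})   # user edits (assumed to be allowed) win over preset values
    return form


# Hypothetical preset values, using illustrative field names.
preset = {
    "cpu": "2",
    "memory": "8Gi",
    "gpu_count": 0,
    "min_replicas": 0,
    "max_replicas": 3,
}

form = prefill_form(preset, {"max_replicas": 5})
print(form["max_replicas"])    # 5
print(preset["max_replicas"])  # 3
```

Copying the preset before applying edits keeps the preset reusable: changes made for one deployment do not leak into the next.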

Creating a preset

Click Create preset to define a new endpoint configuration. You'll specify:

Field	Required	What it does
Name	required	Unique identifier for the preset.
Description	optional	What this preset is for.
CPU	required	CPU allocation (e.g. `2`, `4000m`).
Memory	required	Memory allocation (e.g. `8Gi`).
GPU count	optional	Number of GPUs. Set to 0 for a CPU-only preset.
Compute pool	optional	Target compute pool.
Min replicas	required	Minimum pod count. Set to 0 for scale-to-zero.
Max replicas	required	Maximum pod count.
Runtimes	optional	Which inference runtimes this preset supports.
Configurations	optional	Pod defaults to attach (env vars, secrets, init scripts).
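To make the fields above concrete, here is a minimal Python sketch that models a preset as a data class and runs a few sanity checks. It is an illustration only: the class, the attribute names, and the `validate` helper are hypothetical. The quantity patterns assume Kubernetes-style values, inferred from the examples `2`, `4000m`, and `8Gi`, and the rule that max replicas must not be below min replicas is an assumption rather than documented behaviour.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed formats, inferred from the documented examples (`2`, `4000m`, `8Gi`).
CPU_PATTERN = re.compile(r"^(\d+m|\d+(\.\d+)?)$")
MEMORY_PATTERN = re.compile(r"^\d+(Ki|Mi|Gi|Ti)?$")


@dataclass
class EndpointPreset:
    # Required fields
    name: str
    cpu: str
    memory: str
    min_replicas: int
    max_replicas: int
    # Optional fields
    description: str = ""
    gpu_count: int = 0                      # 0 means CPU-only
    compute_pool: Optional[str] = None
    runtimes: List[str] = field(default_factory=list)
    configurations: List[str] = field(default_factory=list)


def validate(preset: EndpointPreset) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []
    if not preset.name.strip():
        errors.append("name is required")
    if not CPU_PATTERN.match(preset.cpu):
        errors.append(f"cpu {preset.cpu!r} is not a recognised quantity")
    if not MEMORY_PATTERN.match(preset.memory):
        errors.append(f"memory {preset.memory!r} is not a recognised quantity")
    if preset.gpu_count < 0:
        errors.append("gpu_count must be 0 or greater")
    if preset.min_replicas < 0:
        errors.append("min_replicas must be 0 or greater")
    if preset.max_replicas < preset.min_replicas:
        errors.append("max_replicas must not be below min_replicas")  # assumed rule
    return errors


# A scale-to-zero, CPU-only preset with hypothetical values.
small = EndpointPreset(name="small-cpu", cpu="4000m", memory="8Gi",
                       min_replicas=0, max_replicas=3)
print(validate(small))  # []
```

Returning a list of problems rather than raising on the first one mirrors how a form would typically report every invalid field at once.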

Next steps

Deploying an endpoint: how to use a preset when deploying
Runtimes: the serving environments presets reference
