Presets
Endpoint presets are reusable configurations that bundle compute, replica counts, and runtime selections. Instead of filling out the full endpoint form every time, pick a preset and the deployment form opens with the preset's values already filled in.
The presets list
The list view shows every endpoint preset in your workspace. Each row displays:
- Name — the preset's identifier.
- Description — what this preset is for.
- CPU / Memory / GPU — the compute shape.
- Min / Max replicas — autoscaling bounds.
- Runtimes — which inference runtimes this preset supports.
Click a preset to see its full configuration, then click Use this preset to open the endpoint deployment form with the preset's values pre-filled.
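One way to picture the pre-fill step is as a merge: the preset supplies default values, and anything you change in the form takes precedence. The sketch below is only an illustration of that idea. The function name `prefill_form`, the dictionary keys, and the assumption that pre-filled values can be edited before deploying are all hypothetical and not part of any documented API.

```python
from typing import Any, Dict, Optional


def prefill_form(
    preset_values: Dict[str, Any],
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return form values: preset defaults, with user edits applied on top.

    Hypothetical helper; the real form logic is not described in this page.
    """
    form = dict(preset_values)      # copy so the preset itself is not mutated
    form.update(overrides or {})    # user edits win over preset defaults
    return form


# Hypothetical preset values, named after the columns in the list view.
preset = {"cpu": "2", "memory": "8Gi", "gpu_count": 0,
          "min_replicas": 0, "max_replicas": 3}

form = prefill_form(preset, {"max_replicas": 5})
```

In this sketch the preset stays unchanged, so the same preset can seed any number of deployments.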
Creating a preset
Click Create preset to define a new endpoint configuration. You'll specify:
| Field | Required | What it does |
|---|---|---|
| Name | required | Unique identifier for the preset. |
| Description | optional | What this preset is for. |
| CPU | required | CPU allocation, in whole cores or millicores (e.g. 2 or 4000m). |
| Memory | required | Memory allocation (e.g. 8Gi). |
| GPU count | optional | Number of GPUs. Zero for CPU-only. |
| Compute pool | optional | Target compute pool. |
| Min replicas | required | Minimum pod count. Set to 0 for scale-to-zero. |
| Max replicas | required | Maximum pod count. |
| Runtimes | optional | Which inference runtimes this preset supports. |
| Configurations | optional | Pod defaults to attach (env vars, secrets, init scripts). |
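The fields in the table can be sketched as a small data model with a few sanity checks. This is a minimal illustration, not the product's schema: the class name `EndpointPreset`, the attribute names, and the `validate` helper are hypothetical, and the rule that max replicas must be at least min replicas is an assumption rather than documented behavior. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EndpointPreset:
    """Hypothetical model mirroring the Create preset form fields."""
    # Required fields
    name: str
    cpu: str                 # e.g. "2" or "4000m"
    memory: str              # e.g. "8Gi"
    min_replicas: int        # 0 means scale-to-zero
    max_replicas: int
    # Optional fields
    description: str = ""
    gpu_count: int = 0       # zero for CPU-only
    compute_pool: Optional[str] = None
    runtimes: List[str] = field(default_factory=list)
    configurations: List[str] = field(default_factory=list)


def validate(preset: EndpointPreset) -> List[str]:
    """Return a list of problems; an empty list means the preset looks sane."""
    errors = []
    if not preset.name.strip():
        errors.append("name is required")
    if not preset.cpu.strip():
        errors.append("cpu is required")
    if not preset.memory.strip():
        errors.append("memory is required")
    if preset.gpu_count < 0:
        errors.append("gpu_count cannot be negative")
    if preset.min_replicas < 0:
        errors.append("min_replicas cannot be negative")
    # Assumption: the autoscaling upper bound cannot sit below the lower bound.
    if preset.max_replicas < preset.min_replicas:
        errors.append("max_replicas must be >= min_replicas")
    return errors


# A CPU-only preset that scales to zero when idle.
small = EndpointPreset(name="small-cpu", cpu="2", memory="8Gi",
                       min_replicas=0, max_replicas=3)
problems = validate(small)
```

Keeping the required fields as positional arguments and giving the optional ones defaults mirrors the Required column of the table.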
Next steps
- Deploying an endpoint — how to use a preset when deploying
- Runtimes — the serving environments presets reference