Presets

Pre-configured endpoint defaults for quick deployment.

Endpoint presets are reusable configurations that bundle compute, replica counts, and runtime selections. Instead of filling out the full endpoint form every time, pick a preset and the deployment form opens with the preset's values already filled in.

The presets list

The list view shows every endpoint preset in your workspace. Each row displays:

  • Name — the preset's identifier.
  • Description — what this preset is for.
  • CPU / Memory / GPU — the compute shape.
  • Min / Max replicas — autoscaling bounds.
  • Runtimes — which inference runtimes this preset supports.

Click a preset to see its full configuration, then click Use this preset to open the endpoint deployment form with the preset's values pre-filled.
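
One way to picture the pre-fill step is as a merge: the preset supplies the starting values, and anything edited in the form overrides them. The sketch below is illustrative only. The function `prefill_form`, the dictionary layout, and the field names are hypothetical and do not describe a documented Vantage Compute API.

```python
from typing import Any, Optional


def prefill_form(
    preset: dict[str, Any],
    overrides: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Return form values: preset values first, user edits layered on top."""
    form = dict(preset)            # copy, so the stored preset is never mutated
    form.update(overrides or {})   # user edits win over preset values
    return form


# Hypothetical preset; the keys mirror the columns shown in the list view.
preset = {
    "cpu": "2",
    "memory": "8Gi",
    "gpu_count": 0,
    "min_replicas": 0,
    "max_replicas": 3,
}

# The user opens the form from the preset and raises the replica ceiling.
form = prefill_form(preset, {"max_replicas": 5})
```

Copying the preset before applying edits keeps it reusable: changing a value in one deployment form does not alter the preset itself.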

Creating a preset

Click Create preset to define a new endpoint configuration. You'll specify:

| Field | Required | What it does |
|---|---|---|
| Name | required | Unique identifier for the preset. |
| Description | optional | What this preset is for. |
| CPU | required | CPU allocation (e.g. 2, 4000m). |
| Memory | required | Memory allocation (e.g. 8Gi). |
| GPU count | optional | Number of GPUs. Zero for CPU-only. |
| Compute pool | optional | Target compute pool. |
| Min replicas | required | Minimum pod count. Set to 0 for scale-to-zero. |
| Max replicas | required | Maximum pod count. |
| Runtimes | optional | Which inference runtimes this preset supports. |
| Configurations | optional | Pod defaults to attach (env vars, secrets, init scripts). |
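
The fields above can be modeled as a small data structure with a few sanity checks. This is a minimal sketch, not the product's schema. The class `EndpointPreset`, the helper `parse_cpu`, and the validation rules are assumptions made for illustration. It also assumes the `m` suffix in a CPU value means millicores, as in Kubernetes resource quantities, so `4000m` equals 4 cores.

```python
from dataclasses import dataclass, field
from typing import Optional


def parse_cpu(value: str) -> float:
    """Convert a CPU quantity such as "2" or "4000m" to a number of cores.

    Assumes Kubernetes-style notation, where the "m" suffix means millicores.
    """
    if value.endswith("m"):
        return int(value[:-1]) / 1000
    return float(value)


@dataclass
class EndpointPreset:
    # Required fields
    name: str
    cpu: str
    memory: str
    min_replicas: int
    max_replicas: int
    # Optional fields
    description: str = ""
    gpu_count: int = 0                     # zero means CPU-only
    compute_pool: Optional[str] = None
    runtimes: list[str] = field(default_factory=list)
    configurations: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the preset looks sane."""
        errors = []
        if not self.name:
            errors.append("name is required")
        if not self.memory:
            errors.append("memory is required")
        if parse_cpu(self.cpu) <= 0:
            errors.append("cpu must be positive")
        if self.gpu_count < 0:
            errors.append("gpu_count cannot be negative")
        if self.min_replicas < 0:
            errors.append("min_replicas cannot be negative")
        if self.max_replicas < self.min_replicas:
            errors.append("max_replicas must be >= min_replicas")
        return errors


# A hypothetical CPU-only preset that scales to zero when idle.
small = EndpointPreset(
    name="small-cpu",
    cpu="4000m",
    memory="8Gi",
    min_replicas=0,
    max_replicas=3,
)

# A preset whose replica bounds are inverted; validate() reports the problem.
inverted = EndpointPreset(
    name="inverted", cpu="2", memory="8Gi", min_replicas=5, max_replicas=2
)
```

Returning a list of messages instead of raising on the first failure lets a form show every problem at once.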
