Runtimes

Inference runtime configurations that define how endpoints serve models.

An inference runtime is a pre-built serving environment that combines a serving framework, a base container image, and a serving strategy. Each endpoint references a runtime, which determines how the model is loaded, how requests are handled, and how resources are allocated.

Cluster vs. workspace runtimes

Scope              | Who manages it          | Where it's available
Cluster runtime    | Platform admin          | Every workspace on the cluster. Read-only in the UI.
Workspace runtime  | Workspace admin or user | Only in the workspace where it was created. Full CRUD.
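
The scoping rules in the table can be sketched as two small checks. This is an illustrative model only, not the platform's API: the names `Scope`, `RuntimeRef`, `is_visible`, and `is_editable_in_ui`, and the example runtime and workspace names, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    CLUSTER = "Cluster"
    WORKSPACE = "Workspace"


@dataclass(frozen=True)
class RuntimeRef:
    """Hypothetical stand-in for a runtime entry; not a real platform type."""
    name: str
    scope: Scope
    # Workspace in which the runtime was created; None for cluster runtimes.
    owner_workspace: Optional[str] = None


def is_visible(runtime: RuntimeRef, workspace: str) -> bool:
    """Cluster runtimes appear in every workspace; workspace runtimes only in their own."""
    if runtime.scope is Scope.CLUSTER:
        return True
    return runtime.owner_workspace == workspace


def is_editable_in_ui(runtime: RuntimeRef, workspace: str) -> bool:
    """Cluster runtimes are read-only in the UI; workspace runtimes allow full CRUD at home."""
    return runtime.scope is Scope.WORKSPACE and runtime.owner_workspace == workspace


# Example entries (names are made up for illustration).
shared = RuntimeRef("shared-triton", Scope.CLUSTER)
custom = RuntimeRef("team-vllm", Scope.WORKSPACE, owner_workspace="team-a")
```

Under this model, `shared` is visible from any workspace but never editable in the UI, while `custom` is visible and editable only from `team-a`.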

The runtimes list

The list view shows every runtime available to your workspace. Each row displays:

  • Name — the runtime's identifier.
  • Framework — serving framework (e.g. Triton, vLLM, KServe).
  • Scope — Cluster or Workspace.
  • Description — what this runtime is for.

Click a runtime to see its full specification, ML policy, and which endpoints are using it.

Creating a workspace runtime

Click Create runtime to define a custom serving environment. You'll specify:

  • Name — a unique identifier for the runtime.
  • Framework — the serving framework.
  • Base image — the container image to use.
  • ML policy — parallelism, resource requirements, and serving configuration.
  • Compute pool — optional target pool.
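
As a rough sketch, the fields above can be pictured as a small data structure with a pre-submission check. The field names, the `MLPolicy` layout, the `validate` helper, and the example image reference are assumptions made for illustration; the actual form or API may name and validate these differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MLPolicy:
    """Hypothetical shape for the ML policy; the real schema may differ."""
    parallelism: int = 1
    resources: dict = field(default_factory=dict)       # e.g. CPU, memory, GPU requests
    serving_config: dict = field(default_factory=dict)  # framework-specific settings


@dataclass
class WorkspaceRuntime:
    name: str
    framework: str
    base_image: str
    ml_policy: MLPolicy = field(default_factory=MLPolicy)
    compute_pool: Optional[str] = None  # optional target pool


def validate(runtime: WorkspaceRuntime, existing_names: set) -> list:
    """Return a list of problems; an empty list means the definition looks complete."""
    problems = []
    if not runtime.name:
        problems.append("name is required")
    elif runtime.name in existing_names:
        problems.append(f"name '{runtime.name}' is already in use")
    if not runtime.framework:
        problems.append("framework is required")
    if not runtime.base_image:
        problems.append("base image is required")
    # Assumption: parallelism is a positive integer.
    if runtime.ml_policy.parallelism < 1:
        problems.append("parallelism must be at least 1")
    return problems


# Example definition; the image reference is a placeholder, not a real image.
draft = WorkspaceRuntime(
    name="team-vllm",
    framework="vLLM",
    base_image="registry.example.com/serving/vllm:latest",
    ml_policy=MLPolicy(parallelism=2, resources={"gpu": 2}),
)
```

Calling `validate(draft, {"shared-triton"})` returns an empty list here, while passing a set that already contains `"team-vllm"` reports the name collision.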
