The Workbench ML stack
Vantage Workbench is a managed ML platform that covers every phase of the model lifecycle, from interactive experimentation through distributed training to production inference. It runs on Kubernetes and is built on Kubeflow components, but it presents them as native Vantage primitives, so you never work directly with CRDs, Istio routes, or Kubernetes namespaces.
This page explains how the pieces fit together and why they are organized the way they are.
The three phases
Every ML project moves through three broad phases, and Workbench maps a set of components to each one.
Develop
Development is interactive, exploratory work — writing code, loading data, testing ideas. Workbench provides sessions: JupyterLab, VS Code, or RStudio environments that run as Kubernetes pods with GPU access, persistent storage, and configurable compute profiles.
Sessions start from a preset — a reusable template that bundles the IDE type, base container image, compute sizes, and default storage volumes. Presets standardize what "a development environment" looks like across a team, so every engineer gets the same image, the same libraries, and the same GPU options without configuring them from scratch.
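To make the preset idea concrete, here is a minimal sketch of a preset as an immutable template that sessions are stamped out from. The class names (`Preset`, `SessionSpec`), the function `session_from_preset`, and every field name are hypothetical illustrations, not the Vantage API. The point is that the image, IDE, and volumes come from the preset, and the user only picks among the compute sizes the preset allows.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical shape of a preset; field names are illustrative, not the Vantage API.
@dataclass(frozen=True)
class Preset:
    name: str
    ide: str                            # e.g. "jupyterlab", "vscode", or "rstudio"
    image: str                          # base container image shared by every session
    compute_sizes: tuple[str, ...]      # compute profile names this preset allows
    default_volumes: tuple[str, ...] = ()

@dataclass
class SessionSpec:
    owner: str
    ide: str
    image: str
    compute: str
    volumes: list[str] = field(default_factory=list)

def session_from_preset(preset: Preset, owner: str, compute: Optional[str] = None) -> SessionSpec:
    """Build a session from a preset; only the compute size is chosen per user."""
    chosen = compute if compute is not None else preset.compute_sizes[0]
    if chosen not in preset.compute_sizes:
        raise ValueError(f"{chosen!r} is not offered by preset {preset.name!r}")
    return SessionSpec(owner, preset.ide, preset.image, chosen, list(preset.default_volumes))

# Usage with made-up names: two engineers get the same image and volumes.
preset = Preset("pytorch-gpu", "jupyterlab", "registry.example.com/ml/pytorch:latest",
                ("gpu-small", "gpu-large"), ("team-data",))
alice = session_from_preset(preset, "alice")
bob = session_from_preset(preset, "bob", compute="gpu-large")
print(alice.image == bob.image)  # True
```

Freezing the preset reflects the "reusable template" role: sessions copy from it rather than mutate it.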
For lighter-weight access, Cloud Shell provides a browser terminal and Remote Desktop provides VNC-based GUI access. Both are cheaper and faster to spin up than a full session.
The develop phase is where most time is spent and where iteration speed matters most. Sessions give you a full environment with the same compute you will eventually train on, so there is no "it worked on my laptop" gap between development and training.
Train
Training is where experiments become reproducible workloads.
Training jobs are the core primitive. A training job is a finite, potentially distributed workload built on a runtime — a pre-packaged framework environment (PyTorch, DeepSpeed, MLX, and others). You choose a runtime, a compute profile, optional initializers for dataset and model, and an output destination. Under the hood, training jobs use Kubeflow Trainer, but the Vantage UI abstracts that away.
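The choices listed above (runtime, compute profile, optional initializers, output destination) can be pictured as a single job spec. This is a sketch under assumptions: `TrainingJobSpec`, the `RUNTIMES` set, the `num_nodes` field used to express "potentially distributed", and the `s3://example-bucket/...` URIs are all hypothetical and do not describe how Workbench or Kubeflow Trainer actually name these fields.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative runtime names taken from the prose; the real runtime catalog may differ.
RUNTIMES = {"pytorch", "deepspeed", "mlx"}

@dataclass(frozen=True)
class TrainingJobSpec:
    runtime: str                                # pre-packaged framework environment
    compute_profile: str                        # name of a shared compute profile
    output_uri: str                             # where model artifacts are written
    num_nodes: int = 1                          # hypothetical knob: >1 means distributed
    dataset_initializer: Optional[str] = None   # optional: where to fetch the dataset
    model_initializer: Optional[str] = None     # optional: base model to start from

    def __post_init__(self) -> None:
        if self.runtime not in RUNTIMES:
            raise ValueError(f"unknown runtime {self.runtime!r}")
        if self.num_nodes < 1:
            raise ValueError("a training job needs at least one node")

    @property
    def is_distributed(self) -> bool:
        return self.num_nodes > 1

job = TrainingJobSpec("pytorch", "gpu-a100-x8", "s3://example-bucket/runs/1",
                      num_nodes=4, dataset_initializer="s3://example-bucket/datasets/train")
print(job.is_distributed)  # True
```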
Sweeps extend training jobs into hyperparameter search. A sweep runs many trials — each trial is one training job with a specific parameter combination generated by a search algorithm (Bayesian, grid, or random). The sweep tracks the metric you are optimizing across all trials and surfaces the best one. Sweeps answer the question "which hyperparameters produce the best model?" without requiring you to manually launch and compare dozens of runs.
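The sweep loop can be sketched with the simplest of the three search algorithms, random search. Grid and Bayesian search would replace only the sampling step. In this sketch the `train` callable stands in for launching one training job and returning its metric; `random_sweep` and its parameters are hypothetical names, not a Workbench API.

```python
import random
from typing import Any, Callable

def random_sweep(
    space: dict[str, list[Any]],
    train: Callable[[dict[str, Any]], float],
    n_trials: int,
    maximize: bool = True,
    seed: int = 0,
) -> tuple[dict[str, Any], float, list[tuple[dict[str, Any], float]]]:
    """Random search: sample n_trials parameter combinations and keep the best metric."""
    if n_trials < 1:
        raise ValueError("a sweep needs at least one trial")
    rng = random.Random(seed)
    trials: list[tuple[dict[str, Any], float]] = []
    for _ in range(n_trials):
        # One sampled combination = one trial = (in Workbench terms) one training job.
        params = {name: rng.choice(values) for name, values in space.items()}
        trials.append((params, train(params)))
    pick = max if maximize else min
    best_params, best_metric = pick(trials, key=lambda trial: trial[1])
    return best_params, best_metric, trials

# Toy objective standing in for a real training run's validation score.
def toy_objective(p: dict[str, Any]) -> float:
    return -abs(p["lr"] - 0.01) - p["batch_size"] / 1000

space = {"lr": [0.1, 0.01, 0.001], "batch_size": [16, 32, 64]}
best, score, trials = random_sweep(space, toy_objective, n_trials=20, seed=1)
print(best, score)
```

Random sampling can repeat a combination; a grid search would enumerate every combination exactly once instead.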
Pipelines are DAG-based workflows that chain multiple steps (data ingestion, preprocessing, training, evaluation, deployment) into a single reproducible execution. Each step is a containerized task with typed inputs and outputs. Pipelines support experiments (logical groupings of runs for comparison), recurring runs (cron-style triggers), and artifact tracking. They are the right abstraction when your training workflow has more than one stage.
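The core of DAG execution is running each step only after its upstream steps finish and feeding their outputs in as inputs. The sketch below shows that ordering with Python's standard `graphlib.TopologicalSorter`. Plain functions stand in for containerized steps, type checking of inputs and outputs is omitted, and the step table format is a hypothetical choice, not the Workbench pipeline format.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical step table: step name -> (callable, upstream steps whose outputs it consumes).
Steps = dict[str, tuple[Callable[..., Any], list[str]]]

def run_pipeline(steps: Steps) -> dict[str, Any]:
    """Run each step once, after all of its upstream steps, passing their outputs in order."""
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    outputs: dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():  # raises CycleError on cycles
        fn, deps = steps[name]
        outputs[name] = fn(*(outputs[d] for d in deps))
    return outputs

steps: Steps = {
    "ingest": (lambda: [3, 1, None, 2], []),
    "preprocess": (lambda rows: [r for r in rows if r is not None], ["ingest"]),
    "train": (lambda rows: sum(rows) / len(rows), ["preprocess"]),  # toy "model": the mean
    "evaluate": (lambda rows, model: max(abs(r - model) for r in rows), ["preprocess", "train"]),
}
results = run_pipeline(steps)
print(results["train"], results["evaluate"])  # 2.0 1.0
```

Because `evaluate` lists both `preprocess` and `train`, the sorter guarantees both have run first, which is the property that makes a multi-stage workflow reproducible.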
These components relate as follows: a pipeline may orchestrate several training jobs, a sweep spawns many training jobs as trials, and a standalone training job runs a single focused workload. All three produce model artifacts that feed into the next phase.
Serve
Serving bridges the gap between a trained model and a production API.
The model registry is a versioned catalog of model artifacts. Sources include Hugging Face, training job outputs, and direct uploads. Each model can have multiple versions. The registry is the handoff point: it decouples "who produced this model" from "who is deploying it."
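The handoff can be illustrated with a tiny in-memory registry: producers append versions under a model name, and deployers look up a name (and optionally a version) without knowing who registered it. `ModelRegistry`, `ModelVersion`, the source labels, and the URIs are hypothetical; a real registry would also persist metadata and store artifacts externally.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ModelVersion:
    version: int
    source: str        # illustrative labels: "huggingface", "training-job", "upload"
    artifact_uri: str

class ModelRegistry:
    """Hypothetical in-memory registry: model name -> list of versions, oldest first."""

    def __init__(self) -> None:
        self._models: dict[str, list[ModelVersion]] = {}

    def register(self, name: str, source: str, artifact_uri: str) -> ModelVersion:
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(len(versions) + 1, source, artifact_uri)
        versions.append(mv)
        return mv

    def get(self, name: str, version: Optional[int] = None) -> ModelVersion:
        versions = self._models[name]
        if version is None:
            return versions[-1]  # latest version by default
        for mv in versions:
            if mv.version == version:
                return mv
        raise KeyError(f"{name!r} has no version {version}")

registry = ModelRegistry()
registry.register("sentiment", "training-job", "s3://example-bucket/runs/1/model")  # producer
registry.register("sentiment", "upload", "s3://example-bucket/uploads/sentiment-v2")
print(registry.get("sentiment").version)  # 2  (deployer only needs the name)
```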
Endpoints serve a registered model behind an authenticated, autoscaling HTTP URL. Workbench distinguishes two endpoint kinds because their tuning surfaces are different:
- Predictive endpoints are optimized for classical ML models with fast, single-input inference (think: classification, regression, object detection).
- LLM endpoints are built for generative models where batching strategy, context length, and parallelism are the primary knobs.
Both kinds support canary rollouts for safe model updates, autoscaling based on traffic, and scale-to-zero when idle to avoid burning GPU hours on unused capacity.
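Two of these behaviors reduce to small decisions: a canary rollout routes a fixed share of requests to the new version, and an autoscaler sizes replicas to traffic, with a minimum of zero allowing scale-to-zero. The sketch below assumes a simple requests-per-second target per replica; that policy, and the names `route` and `desired_replicas`, are illustrative assumptions rather than how Workbench endpoints are actually tuned.

```python
import math
import random

def route(canary_percent: float, rng: random.Random) -> str:
    """Send roughly canary_percent of requests to the new model version."""
    return "canary" if rng.random() * 100 < canary_percent else "stable"

def desired_replicas(rps: float, target_rps_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Size replicas to traffic; with min_replicas == 0, an idle endpoint scales to zero."""
    if rps <= 0:
        return min_replicas
    needed = math.ceil(rps / target_rps_per_replica)
    return max(min_replicas, min(max_replicas, needed))

rng = random.Random(42)
canary_hits = sum(route(10, rng) == "canary" for _ in range(10_000))
print(canary_hits)                      # close to 1,000 (about 10% of traffic)
print(desired_replicas(25, 10, 0, 5))   # 3
print(desired_replicas(0, 10, 0, 5))    # 0  (scale-to-zero)
```

Clamping to `max_replicas` caps GPU spend under a traffic spike, while `min_replicas` decides whether idle periods cost anything at all.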
Cross-cutting concerns
Two components span all three phases rather than belonging to a single one.
Compute profiles are reusable resource templates — GPU vendor and count, instance type, autoscaling bounds, cost rate. Sessions, training jobs, and endpoints all reference the same pool of profiles. This means the GPU configuration you develop on is the same one you train and serve on, eliminating resource-mismatch surprises.
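The "same pool of profiles" idea amounts to workloads holding a reference to a profile by name rather than their own copy of the resource settings. In this sketch, `ComputeProfile`, `Workload`, the `PROFILES` pool, and every number in it (instance types, bounds, USD rates) are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComputeProfile:
    name: str
    gpu_vendor: str
    gpu_count: int
    instance_type: str
    min_replicas: int
    max_replicas: int
    cost_per_replica_hour: float  # assumed unit: USD per replica-hour

    def hourly_cost(self, replicas: int) -> float:
        """Cost rate for a replica count, clamped to the profile's autoscaling bounds."""
        clamped = max(self.min_replicas, min(self.max_replicas, replicas))
        return clamped * self.cost_per_replica_hour

# Hypothetical shared pool; every value here is invented for illustration.
PROFILES = {
    p.name: p
    for p in (
        ComputeProfile("a100-x1", "nvidia", 1, "example.a100.1x", 0, 4, 4.0),
        ComputeProfile("a100-x8", "nvidia", 8, "example.a100.8x", 0, 2, 32.0),
    )
}

@dataclass(frozen=True)
class Workload:
    kind: str          # "session", "training_job", or "endpoint"
    profile_name: str  # a reference into the shared pool, not a copy

    @property
    def profile(self) -> ComputeProfile:
        return PROFILES[self.profile_name]

session = Workload("session", "a100-x1")
job = Workload("training_job", "a100-x1")
endpoint = Workload("endpoint", "a100-x1")
print(session.profile is job.profile is endpoint.profile)  # True
```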
Observability links every resource (sessions, training jobs, endpoints) to Grafana dashboards scoped to that resource. Cluster-wide metrics such as utilization, spend, and idle GPU hours, along with live alerts, are available on the Observability tab. Three numbers are worth watching: $/hr (live burn rate), accumulated (total spend since creation), and idle % (the share of reserved compute that is not doing work).
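The three cost numbers can be derived from periodic usage samples. The sketch assumes billing is per reserved GPU-hour at a fixed rate and that "idle" means reserved GPU-hours minus busy GPU-hours. Those formulas, the `UsageSample` fields, and the prices are assumptions for illustration; Workbench's actual accounting may include other resources or pricing rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UsageSample:
    hours: float              # length of the sampling interval
    reserved_gpus: int        # GPUs allocated to the resource (what you pay for)
    busy_gpus: float          # GPUs doing work, averaged over the interval
    rate_per_gpu_hour: float  # assumed price in USD

def burn_rate(latest: UsageSample) -> float:
    """$/hr: what the resource costs right now."""
    return latest.reserved_gpus * latest.rate_per_gpu_hour

def accumulated(samples: list[UsageSample]) -> float:
    """Total spend since creation: reserved GPU-hours times price, summed."""
    return sum(s.hours * s.reserved_gpus * s.rate_per_gpu_hour for s in samples)

def idle_percent(samples: list[UsageSample]) -> float:
    """Share of reserved GPU-hours that did no work."""
    reserved = sum(s.hours * s.reserved_gpus for s in samples)
    busy = sum(s.hours * s.busy_gpus for s in samples)
    return 0.0 if reserved == 0 else 100.0 * (reserved - busy) / reserved

# 2 h fully busy on 4 GPUs, then 1 h with only 1 of 4 GPUs working, at $2/GPU-hour.
samples = [UsageSample(2, 4, 4, 2.0), UsageSample(1, 4, 1, 2.0)]
print(burn_rate(samples[-1]), accumulated(samples), idle_percent(samples))  # 8.0 24.0 25.0
```

Because spend tracks reserved rather than busy GPUs, a high idle % with a steady $/hr is the typical sign of a session or endpoint that should be stopped or allowed to scale to zero.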
What Workbench replaces
Teams building ML on Kubernetes typically assemble a stack from multiple open-source projects:
| Concern | DIY approach | Workbench equivalent |
|---|---|---|
| Notebooks | Kubeflow Notebooks, JupyterHub | Sessions + Presets |
| Training | Kubeflow Trainer, custom Job YAML | Training Jobs |
| Hyperparameter search | Katib, Optuna | Sweeps |
| Pipelines | Kubeflow Pipelines, Argo Workflows | Pipelines |
| Model registry | MLflow, custom S3 + metadata DB | Models |
| Serving | KServe, Seldon, custom Deployment | Endpoints |
| Monitoring | Prometheus + Grafana + custom dashboards | Observability |
The integration cost of stitching these together — shared auth, consistent resource quotas, unified cost tracking, namespace management — is where most of the engineering effort goes. Workbench absorbs that integration and presents a single surface with consistent concepts (workspaces, compute profiles, presets, lifecycles) across every phase.
Cross-references
- Workbench overview — the full index of Workbench sections and links
- Workbench concepts — the seven mental models behind Workbench (workspace, preset, lifecycle, cost, and others)
- Compute and clusters — the infrastructure layer underneath Workbench
- Jobs and pipelines — how jobs and pipelines work outside of Workbench (Slurm, scripts, templates)