Integrations
Integrations are optional platform tools that vdeployer-web installs on your Kubernetes cluster after it reaches `ready`. You select them during cluster creation and can enable or disable them later from the cluster detail page.
Available integrations
| Integration | Purpose | Default |
|---|---|---|
| Notebook | JupyterHub for interactive sessions | Enabled |
| Grafana + Prometheus | Cluster monitoring and observability | Enabled |
| Ray | Distributed ML training framework | Disabled |
| MLflow | ML experiment tracking | Disabled |
| Slurm on Kubernetes | Deploy Slurm on this cluster | Disabled |
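The defaults in the table can be modeled as a simple lookup with user overrides applied on top. This is only an illustrative sketch: the dictionary keys and the function names `resolve_integrations` and `enabled_names` are hypothetical and are not taken from the vdeployer-web API.

```python
# Hypothetical model of the defaults table; the keys are illustrative
# identifiers, not names used by vdeployer-web.
DEFAULT_INTEGRATIONS = {
    "notebook": True,
    "grafana_prometheus": True,
    "ray": False,
    "mlflow": False,
    "slurm_on_kubernetes": False,
}


def resolve_integrations(overrides=None):
    """Return the enabled/disabled state after applying user overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_INTEGRATIONS)
    if unknown:
        raise ValueError(f"unknown integrations: {sorted(unknown)}")
    # Later keys win, so overrides replace the defaults they name.
    return {**DEFAULT_INTEGRATIONS, **overrides}


def enabled_names(state):
    """List the integrations that are switched on, in table order."""
    return [name for name, on in state.items() if on]
```

Calling `resolve_integrations({"ray": True})` keeps the two default integrations and adds Ray, which mirrors selecting an extra integration during cluster creation.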
Notebook (JupyterHub)
JupyterHub provides interactive notebook sessions for your team. When enabled, the Workbench tab becomes available in the Vantage sidebar, where users can launch JupyterLab, VS Code, or RStudio sessions.
- Deployed automatically when the cluster reaches `ready`.
- Authenticated through Vantage's Keycloak integration, so no separate JupyterHub login is required.
- Sessions are pinned to the cluster's compute profiles and storage.
Grafana + Prometheus
Grafana with Prometheus provides cluster-wide monitoring:
- Pre-configured dashboards for cluster utilization, node-level metrics, and cost tracking.
- Accessible from the Monitoring tab on the cluster detail page.
- Includes alerts for node failures, disk pressure, and cluster autoscaler events.
Grafana + Prometheus is enabled by default, so it is deployed with every new cluster unless you deselect it during cluster creation.
Ray
Ray is a distributed computing framework for ML training and serving. When enabled:
- A Ray cluster is deployed on your Kubernetes cluster.
- Ray is available for Workbench sessions: import `ray` in your notebook and connect to the Ray cluster.
- Supports distributed training, hyperparameter tuning, and model serving.
Enable Ray if you plan to run distributed ML training jobs across multiple nodes.
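As a rough sketch of what connecting from a notebook might look like, the example below splits a list into chunks and sums the squares with one Ray task per chunk. It assumes the `ray` package is installed in the Workbench session image. The `address` value is also an assumption: this page does not state how a session reaches the Ray cluster, so pass whatever address your cluster exposes. The helper names are hypothetical.

```python
def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous, nearly equal parts."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def distributed_sum_of_squares(values, n_chunks=4, address="auto"):
    """Sum the squares of `values` using one Ray task per chunk."""
    import ray  # assumed to be installed in the Workbench session image

    # ASSUMPTION: "auto" only works when the session runs on a node that is
    # already part of the Ray cluster; otherwise pass the cluster's address.
    ray.init(address=address, ignore_reinit_error=True)

    @ray.remote
    def partial_sum(chunk):
        return sum(x * x for x in chunk)

    futures = [partial_sum.remote(c) for c in split_into_chunks(values, n_chunks)]
    return sum(ray.get(futures))
```

The chunking helper is plain Python so it can be checked without a cluster; only `distributed_sum_of_squares` needs a reachable Ray cluster.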
MLflow
MLflow is an ML experiment tracking platform. When enabled:
- An MLflow tracking server is deployed on the cluster.
- Training runs are automatically logged to MLflow for comparison and reproducibility.
- Access MLflow through the Vantage UI or its native API.
Enable MLflow if you need experiment tracking across training runs. Combine it with Ray to track distributed training runs.
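For runs you want to log explicitly through MLflow's native API, a minimal sketch could look like the following. It assumes the `mlflow` package is installed in the training environment. The `tracking_uri` argument is a placeholder, since this page does not give the tracking server's address, and the helper names `flatten_params` and `log_run` are hypothetical.

```python
def flatten_params(params, prefix=""):
    """Flatten nested hyperparameters into dotted keys such as 'optimizer.lr'."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_params(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def log_run(tracking_uri, params, metrics, run_name=None):
    """Log one training run to an MLflow tracking server."""
    import mlflow  # assumed to be installed in the training environment

    # ASSUMPTION: tracking_uri must point at the MLflow server deployed on
    # your cluster; its address is not documented on this page.
    mlflow.set_tracking_uri(tracking_uri)
    with mlflow.start_run(run_name=run_name):
        mlflow.log_params(flatten_params(params))
        for name, value in metrics.items():
            mlflow.log_metric(name, value)
```

Flattening nested settings first keeps each hyperparameter as its own MLflow parameter, which makes runs easier to compare side by side.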
Slurm on Kubernetes
Enabling this integration allows you to deploy a Slurm scheduler on top of the Kubernetes cluster. The cluster appears as a selectable parent when creating a Slurm-on-Kubernetes cluster.
- Not a workload itself — it's a capability flag that enables other cluster types.
- Slurm-on-K8s deployments are created separately through the cluster creation wizard.
- Only supported on AWS and Cudo Compute parent clusters.
For the full walkthrough, see Slurm on Kubernetes.
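The eligibility rule described above (integration enabled, and the parent running on AWS or Cudo Compute) can be expressed as a small predicate. This is a sketch only: the provider identifiers, the cluster record fields, and the function names are hypothetical and do not come from the Vantage API.

```python
# Providers named on this page as supported parents; the string
# identifiers themselves are illustrative.
SUPPORTED_PARENT_PROVIDERS = {"aws", "cudo_compute"}


def can_host_slurm(provider, slurm_integration_enabled):
    """Return True if a cluster may be selected as a Slurm-on-Kubernetes parent."""
    return slurm_integration_enabled and provider in SUPPORTED_PARENT_PROVIDERS


def selectable_parents(clusters):
    """Filter hypothetical cluster records down to eligible parent names."""
    return [
        c["name"]
        for c in clusters
        if can_host_slurm(c["provider"], c["slurm_on_kubernetes"])
    ]
```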
Managing integrations post-creation
1. Open the cluster detail page and go to the Integrations tab.
2. Toggle integrations on or off.
3. vdeployer-web applies the changes. Toggling an integration off removes the deployed components; toggling it on installs them.
Some integrations, such as Ray and Notebook (JupyterHub), may take a few minutes to become available after toggling.
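The toggle behavior can be pictured as a comparison between the current and desired states: anything newly switched on is installed, and anything newly switched off is removed. The sketch below illustrates that idea only. It is not how vdeployer-web is implemented, and `plan_changes` is a hypothetical name.

```python
def plan_changes(current, desired):
    """Return (install, remove) lists needed to move from current to desired."""
    install = sorted(
        name for name, on in desired.items() if on and not current.get(name, False)
    )
    remove = sorted(
        name for name, on in desired.items() if not on and current.get(name, False)
    )
    return install, remove
```

Integrations whose state does not change appear in neither list, which matches the expectation that toggling one integration leaves the others untouched.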