Integrations

Platform tools that deploy onto your Kubernetes cluster — JupyterHub, Grafana, Ray, MLflow, and Slurm on Kubernetes.

Integrations are optional platform tools that vdeployer-web installs on your Kubernetes cluster once the cluster reaches the ready state. You select them during cluster creation and can enable or disable them later from the Integrations tab on the cluster detail page.

Available integrations

Integration           Purpose                                Default
--------------------  -------------------------------------  --------
Notebook              JupyterHub for interactive sessions    Enabled
Grafana + Prometheus  Cluster monitoring and observability   Enabled
Ray                   Distributed ML training framework      Disabled
MLflow                ML experiment tracking                 Disabled
Slurm on Kubernetes   Deploy Slurm on this cluster           Disabled
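
The defaults in the table can be pictured as a set of on/off flags that your choices during cluster creation override. The following is only a sketch of that idea: the key names and the `resolve_integrations` helper are hypothetical and are not part of any Vantage API.

```python
# Defaults copied from the table above; the key names are hypothetical.
DEFAULT_INTEGRATIONS = {
    "notebook": True,              # JupyterHub
    "grafana_prometheus": True,
    "ray": False,
    "mlflow": False,
    "slurm_on_kubernetes": False,
}


def resolve_integrations(overrides=None):
    """Return the effective on/off state after applying user overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_INTEGRATIONS)
    if unknown:
        raise ValueError(f"unknown integrations: {sorted(unknown)}")
    # Later keys win, so overrides replace the defaults they name.
    return {**DEFAULT_INTEGRATIONS, **overrides}


# Example: keep the defaults and additionally turn on Ray and MLflow.
selection = resolve_integrations({"ray": True, "mlflow": True})
enabled = sorted(name for name, on in selection.items() if on)
print(enabled)
```

Anything you do not override keeps its default, so Notebook and Grafana + Prometheus stay on in this example.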

Notebook (JupyterHub)

JupyterHub provides interactive notebook sessions for your team. When enabled, the Workbench tab becomes available in the Vantage sidebar, where users can launch JupyterLab, VS Code, or RStudio sessions.

  • Deployed automatically when the cluster reaches the ready state.
  • Authenticated through Vantage's Keycloak integration — no separate JupyterHub login required.
  • Sessions are pinned to the cluster's compute profiles and storage.

Grafana + Prometheus

Grafana with Prometheus provides cluster-wide monitoring:

  • Pre-configured dashboards for cluster utilization, node-level metrics, and cost tracking.
  • Accessible from the Monitoring tab on the cluster detail page.
  • Includes alerts for node failures, disk pressure, and cluster autoscaler events.

Grafana is enabled by default, so it is deployed with every new cluster unless you disable it during cluster creation.

Ray

Ray is a distributed computing framework for ML training and serving. When enabled:

  • A Ray cluster is deployed on your Kubernetes cluster.
  • Ray is available for Workbench sessions — import ray in your notebook and connect to the Ray cluster.
  • Supports distributed training, hyperparameter tuning, and model serving.

Enable Ray if you plan to run distributed ML training jobs across multiple nodes.
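
As a rough illustration of using Ray from a Workbench notebook, the sketch below attaches to an already running Ray cluster and fans out a few tasks. The `square_on_ray` helper is hypothetical, and the `address="auto"` default is an assumption: depending on how the Ray cluster is exposed to your session, you may need an explicit address instead.

```python
def square_on_ray(values, address="auto"):
    """Square each value as a Ray task on an existing Ray cluster."""
    # Imported here so this module still loads where Ray is not installed.
    import ray

    # Attach to a running cluster instead of starting a local one.
    # "auto" is an assumption; your cluster may need an explicit address.
    ray.init(address=address, ignore_reinit_error=True)

    @ray.remote
    def square(x):
        return x * x

    # Submit one task per value, then block until all results are back.
    refs = [square.remote(v) for v in values]
    return ray.get(refs)
```

In a notebook cell you would call something like `square_on_ray(range(8))`. The same pattern (decorate a function with `@ray.remote`, submit with `.remote(...)`, collect with `ray.get`) carries over to larger jobs.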

MLflow

MLflow is an ML experiment tracking platform. When enabled:

  • An MLflow tracking server is deployed on the cluster.
  • Training runs are automatically logged to MLflow for comparison and reproducibility.
  • Access MLflow through the Vantage UI or its native API.

Enable MLflow if you need experiment tracking across training runs. Combine it with Ray to track experiments from distributed training jobs.
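
If you want to record a run explicitly through MLflow's native Python API, a minimal sketch might look like the following. The `log_training_run` helper is hypothetical, and the tracking URI is deliberately left as a parameter because the address of the deployed tracking server is not specified here.

```python
def log_training_run(tracking_uri, experiment, params, metrics):
    """Log one run's parameters and metrics to an MLflow tracking server."""
    # Imported here so this module still loads where MLflow is not installed.
    import mlflow

    # Point the client at the tracking server; the URI is supplied by you.
    mlflow.set_tracking_uri(tracking_uri)
    # Select the experiment by name (MLflow creates it if it does not exist).
    mlflow.set_experiment(experiment)

    with mlflow.start_run() as run:
        mlflow.log_params(params)    # e.g. {"learning_rate": 0.01}
        mlflow.log_metrics(metrics)  # e.g. {"val_loss": 0.42}
        return run.info.run_id
```

The returned run ID can be used to look the run up later when comparing experiments.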

Slurm on Kubernetes

Enabling this integration lets you deploy a Slurm scheduler on top of the Kubernetes cluster. Once it is enabled, the cluster appears as a selectable parent when you create a Slurm-on-Kubernetes cluster.

  • Not a workload itself — it's a capability flag that enables other cluster types.
  • Slurm-on-Kubernetes deployments are created separately through the cluster creation wizard.
  • Only supported on AWS and Cudo Compute parent clusters.

For the full walkthrough, see Slurm on Kubernetes.
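
The parent-cluster rule described in this section (integration enabled, and the parent runs on AWS or Cudo Compute) can be sketched as a simple predicate. The `Cluster` fields and provider labels below are hypothetical and only illustrate the rule; they do not reflect how Vantage stores cluster data.

```python
from dataclasses import dataclass

# Providers named in this section; the lowercase labels are an assumption.
SUPPORTED_PARENT_PROVIDERS = {"aws", "cudo compute"}


@dataclass
class Cluster:
    name: str
    provider: str
    slurm_on_kubernetes: bool  # state of the integration toggle


def can_host_slurm(cluster):
    """True if the cluster may be offered as a Slurm-on-Kubernetes parent."""
    return (
        cluster.slurm_on_kubernetes
        and cluster.provider.lower() in SUPPORTED_PARENT_PROVIDERS
    )


clusters = [
    Cluster("train-a", "AWS", True),
    Cluster("train-b", "AWS", False),          # integration not enabled
    Cluster("edge-c", "Other Cloud", True),    # unsupported provider
    Cluster("gpu-d", "Cudo Compute", True),
]
selectable = [c.name for c in clusters if can_host_slurm(c)]
print(selectable)
```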

Managing integrations post-creation

  1. Open the cluster detail page and go to the Integrations tab.
  2. Toggle integrations on or off.
  3. vdeployer-web applies the changes. Toggling an integration off removes the deployed components; toggling it on installs them.

Some integrations (Ray, JupyterHub) may take a few minutes to become available after toggling.
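
The toggle behavior in step 3 amounts to comparing the current state with the desired state: anything newly switched on is installed, and anything switched off is removed. The sketch below models that comparison. The `plan_integration_changes` helper and the key names are hypothetical, and this is not a description of how vdeployer-web is implemented.

```python
def plan_integration_changes(current, desired):
    """Compare toggle states and return (to_install, to_remove).

    Integrations missing from `desired` are left unchanged.
    """
    to_install, to_remove = [], []
    for name in sorted(desired):
        was_on = current.get(name, False)
        want_on = desired[name]
        if want_on and not was_on:
            to_install.append(name)   # toggled on: components get installed
        elif was_on and not want_on:
            to_remove.append(name)    # toggled off: components get removed
    return to_install, to_remove


current = {"notebook": True, "grafana_prometheus": True, "ray": False}
desired = {"notebook": False, "ray": True}
to_install, to_remove = plan_integration_changes(current, desired)
print("install:", to_install, "remove:", to_remove)
```

Here Grafana + Prometheus is untouched because its toggle did not change.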
