Create a pipeline

Build a multi-step pipeline from the Vantage SDK and view it in the UI

Warning: Pipelines are in early access. UI authoring is not yet available, so use the SDK to create pipelines.

Pipelines are multi-step DAGs (directed acyclic graphs) that turn one-off scripts into reproducible, schedulable workflows. Each step runs in its own container with explicit inputs, outputs, and dependencies.
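
To make the DAG idea concrete, here is a minimal sketch in plain Python that uses only the standard library's `graphlib` module, not the Vantage SDK. It assumes the three step names used later in this guide (`preprocess`, `train`, `evaluate`) and shows how declared dependencies determine a valid execution order. The `execution_order` helper is hypothetical and exists only for illustration.

```python
from graphlib import TopologicalSorter

# Map each step to the set of steps it depends on.
# Step names mirror the example pipeline built later in this guide.
dependencies = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
}

def execution_order(deps):
    """Return one valid run order; raises graphlib.CycleError if deps contain a cycle."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
print(order)  # ['preprocess', 'train', 'evaluate']
```

Because the graph must be acyclic, a step can never depend on its own output, directly or indirectly. `TopologicalSorter` rejects such a graph with `CycleError`.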

Prerequisites

Build a pipeline with the SDK

Pipeline creation uses the Vantage SDK. The UI provides a read-only view for monitoring runs and inspecting results.

Step 1: Define your pipeline

Write a Python function for each step. Each function becomes a containerized task in the DAG.

from vantage_sdk.pipelines import pipeline, step

@step(image="python:3.11", cpu="2", memory="4Gi")
def preprocess(data_path: str) -> str:
    # load and clean data
    output_path = "/tmp/cleaned.csv"
    return output_path

@step(image="pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime", gpu=1)
def train(data_path: str) -> str:
    # train model
    model_path = "/tmp/model.pt"
    return model_path

@step(image="python:3.11")
def evaluate(model_path: str) -> dict:
    # evaluate model
    return {"accuracy": 0.95}

Step 2: Compose the DAG

Chain steps together using the @pipeline decorator. Outputs flow from one step to the next.

@pipeline(name="training-pipeline", description="End-to-end training")
def training_pipeline(data_path: str):
    cleaned = preprocess(data_path=data_path)
    model = train(data_path=cleaned)
    metrics = evaluate(model_path=model)
    return metrics
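
One way to picture how passing outputs between steps can define the DAG's edges is the toy sketch below. It is not the Vantage SDK and makes no claim about how the SDK works internally: `toy_step`, `Output`, and `edges` are hypothetical names invented for illustration. Each decorated function returns a placeholder instead of running, and the decorator records an edge whenever a placeholder is passed into a downstream step.

```python
import functools
from dataclasses import dataclass

@dataclass(frozen=True)
class Output:
    """Placeholder for a value that a step would produce at run time."""
    producer: str

edges = []  # collected (upstream, downstream) pairs

def toy_step(func):
    """Hypothetical decorator: record dependencies instead of executing the body."""
    @functools.wraps(func)
    def wrapper(**kwargs):
        for value in kwargs.values():
            if isinstance(value, Output):
                edges.append((value.producer, func.__name__))
        return Output(producer=func.__name__)
    return wrapper

@toy_step
def preprocess(data_path): ...

@toy_step
def train(data_path): ...

@toy_step
def evaluate(model_path): ...

def training_pipeline(data_path):
    cleaned = preprocess(data_path=data_path)
    model = train(data_path=cleaned)
    return evaluate(model_path=model)

training_pipeline("/data/raw/dataset.csv")
print(edges)  # [('preprocess', 'train'), ('train', 'evaluate')]
```

The plain string passed as `data_path` creates no edge, because it is a pipeline parameter rather than another step's output.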

Step 3: Upload and run

from vantage_sdk import VantageClient

client = VantageClient()
run = client.pipelines.create_and_run(
    training_pipeline,
    params={"data_path": "/data/raw/dataset.csv"},
    experiment="my-experiment",
)
print(f"Run started: {run.id}")

View your pipeline in the UI

  1. Click Workbench in the left sidebar, then click Pipelines under the Train section.
  2. Click your pipeline name to see its runs and version history.
  3. Click a run to view the DAG visualization, per-step logs, parameters, and output metrics.

Each step in the DAG is color-coded by status: green (succeeded), blue (running), red (failed), or grey (skipped).
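
The color legend can also be written as a small lookup, for example to print a terminal summary of a run. This is a sketch in plain Python. The status strings and the `snapshot` dictionary are assumptions for illustration, since this guide does not show how the SDK reports per-step status.

```python
from collections import Counter

# Status-to-color legend as described in this guide.
STATUS_COLORS = {
    "succeeded": "green",
    "running": "blue",
    "failed": "red",
    "skipped": "grey",
}

def summarize(step_statuses):
    """Count steps per legend color; an unknown status raises KeyError."""
    return Counter(STATUS_COLORS[status] for status in step_statuses.values())

# Hypothetical snapshot of a run, keyed by step name.
snapshot = {"preprocess": "succeeded", "train": "failed", "evaluate": "skipped"}
summary = summarize(snapshot)
print(dict(summary))  # {'green': 1, 'red': 1, 'grey': 1}
```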
