Create a pipeline
Pipelines are in early access. UI authoring is not yet available. Use the SDK to create pipelines.
Pipelines are multi-step DAGs (directed acyclic graphs) that turn one-off scripts into reproducible, schedulable workflows. Each step runs in its own container with explicit inputs, outputs, and dependencies.
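To make the DAG idea concrete, the sketch below uses Python's standard-library graphlib to order a few steps by their dependencies. It is not how Vantage schedules steps. The dependency map is hypothetical: it mirrors the preprocess, train, and evaluate steps used on this page and adds an illustrative profile step that needs only the cleaned data. Steps in the same wave do not depend on each other, so a scheduler could in principle run them concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each key is a step; each value is the set of steps it depends on.
# "profile" is an illustrative extra step, not part of this page's pipeline.
dag = {
    "preprocess": set(),
    "train": {"preprocess"},
    "profile": {"preprocess"},
    "evaluate": {"train"},
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; every step in a wave has all its dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # mark them finished so their dependents become ready
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(dag)):
        print(f"wave {i}: {wave}")
```

The prepare() call rejects cyclic graphs with graphlib.CycleError, which is the "acyclic" part of the definition.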
Prerequisites
- A Kubernetes cluster in the Ready state with at least one compute pool
- The Vantage SDK installed and configured
Build a pipeline with the SDK
Pipeline creation uses the Vantage SDK. The UI provides a read-only view for monitoring runs and inspecting results.
Step 1: Define your pipeline
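Write a Python function for each step and decorate it with @step, which sets the step's container image and compute resources (CPU, memory, GPU). Each decorated function becomes a containerized task in the DAG.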
from vantage_sdk.pipelines import pipeline, step

@step(image="python:3.11", cpu="2", memory="4Gi")
def preprocess(data_path: str) -> str:
    # Load the raw data from data_path, clean it, and write the result.
    output_path = "/tmp/cleaned.csv"
    return output_path

@step(image="pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime", gpu=1)
def train(data_path: str) -> str:
    # Train a model on the cleaned data and save its weights.
    model_path = "/tmp/model.pt"
    return model_path

@step(image="python:3.11")
def evaluate(model_path: str) -> dict:
    # Evaluate the trained model; the metric below is a placeholder value.
    return {"accuracy": 0.95}
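The cpu and memory arguments to @step look like Kubernetes resource quantities, which fits the Kubernetes prerequisite, but this page does not say how the SDK interprets them. The sketch below assumes standard Kubernetes quantity notation, where "2" means two CPU cores, "500m" means half a core, and "4Gi" means 4 × 2^30 bytes. The helpers parse_quantity, cpu_cores, and memory_bytes are hypothetical and not part of the Vantage SDK.

```python
from decimal import Decimal

# Assumption: @step resource strings follow Kubernetes quantity notation.
# Binary (power-of-two) suffixes are checked before decimal (power-of-ten) ones.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
_DECIMAL = {
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}


def parse_quantity(q: str) -> Decimal:
    """Convert a simplified Kubernetes-style quantity such as "4Gi" or "500m" to a number."""
    q = q.strip()
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    if q and q[-1] in _DECIMAL:
        return Decimal(q[:-1]) * _DECIMAL[q[-1]]
    return Decimal(q)  # no suffix: a plain number such as "2"


def cpu_cores(q: str) -> float:
    """CPU quantity in cores, e.g. "500m" -> 0.5."""
    return float(parse_quantity(q))


def memory_bytes(q: str) -> int:
    """Memory quantity in bytes, e.g. "4Gi" -> 4294967296."""
    return int(parse_quantity(q))


if __name__ == "__main__":
    # Values taken from the preprocess step above.
    print(cpu_cores("2"), memory_bytes("4Gi"))
```

Note the difference between "4Gi" (binary, 4 × 2^30 bytes) and "4G" (decimal, 4 × 10^9 bytes) if you size memory for a step.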
Step 2: Compose the DAG
Compose the steps inside a function decorated with @pipeline. Passing one step's output as another step's input chains the two steps together, so outputs flow from one step to the next.
@pipeline(name="training-pipeline", description="End-to-end training")
def training_pipeline(data_path: str):
    cleaned = preprocess(data_path=data_path)
    model = train(data_path=cleaned)
    metrics = evaluate(model_path=model)
    return metrics
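This page does not describe how @pipeline turns the function body into a DAG. One common approach in pipeline SDKs, assumed here purely for illustration, is that calling a decorated step inside the pipeline returns a placeholder instead of running the code, and passing that placeholder to another step records a dependency edge. The toy below implements that idea with a hypothetical traced_step decorator, Node class, and edges helper; it is not the Vantage SDK's implementation.

```python
from dataclasses import dataclass, field
from functools import wraps
from typing import Any, Callable


@dataclass
class Node:
    """Placeholder returned by a traced step call; records the step name and its inputs."""
    name: str
    inputs: dict[str, Any] = field(default_factory=dict)

    def upstream(self) -> list["Node"]:
        # Any input that is itself a Node is an upstream dependency.
        return [v for v in self.inputs.values() if isinstance(v, Node)]


def traced_step(fn: Callable) -> Callable:
    """Hypothetical decorator: calling the step builds a Node instead of running fn."""
    @wraps(fn)
    def wrapper(**kwargs: Any) -> Node:
        return Node(fn.__name__, kwargs)
    return wrapper


def edges(output: Node) -> list[tuple[str, str]]:
    """Walk back from the final output and collect (upstream, downstream) edges."""
    found: set[tuple[str, str]] = set()
    stack, seen = [output], set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # a node reachable by two paths is visited only once
        seen.add(id(node))
        for parent in node.upstream():
            found.add((parent.name, node.name))
            stack.append(parent)
    return sorted(found)


# Stand-ins for the steps on this page; the bodies never run under tracing.
@traced_step
def preprocess(data_path): ...

@traced_step
def train(data_path): ...

@traced_step
def evaluate(model_path): ...


def training_pipeline(data_path: str) -> Node:
    cleaned = preprocess(data_path=data_path)
    model = train(data_path=cleaned)
    return evaluate(model_path=model)


if __name__ == "__main__":
    print(edges(training_pipeline("/data/raw/dataset.csv")))
```

Under this model, the plain string data_path creates no edge; only the step outputs passed along (cleaned, model) link preprocess to train and train to evaluate.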
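Step 3: Upload and run
Upload the pipeline and start a run in a single call with create_and_run, passing the pipeline's input parameters in params.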
from vantage_sdk import VantageClient

client = VantageClient()
run = client.pipelines.create_and_run(
    training_pipeline,
    params={"data_path": "/data/raw/dataset.csv"},
    experiment="my-experiment",
)
print(f"Run started: {run.id}")
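This page shows how to start a run but not how to check its progress from code. If a script needs to block until a run finishes, a generic polling loop with exponential backoff is one option. It is a sketch: the get_status callable, the terminal status strings (borrowed from the step statuses shown in the UI), and the backoff defaults are all assumptions. Pass any zero-argument function that returns the run's current status as a string; how to fetch that status through the SDK is not covered here.

```python
import time
from typing import Callable

# Assumed terminal states; the SDK's real run-status values are not documented on this page.
TERMINAL = frozenset({"succeeded", "failed"})


def wait_for_run(
    get_status: Callable[[], str],
    timeout_s: float = 3600.0,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() with exponential backoff until a terminal state or the timeout."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_s}s")
        # Never sleep past the deadline, then double the delay up to the cap.
        sleep(min(delay, max(0.0, deadline - clock())))
        delay = min(delay * 2, max_delay_s)
```

The sleep and clock parameters are injectable so the loop can be tested without real waiting; in normal use, leave them at their defaults.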
View your pipeline in the UI
- Click Workbench in the left sidebar, then click Pipelines under the Train section.
- Click your pipeline name to see its runs and version history.
- Click a run to view the DAG visualization, per-step logs, parameters, and output metrics.
Each step in the DAG is color-coded by status: green (succeeded), blue (running), red (failed), or grey (skipped).
Related
- Run a pipeline: manage existing runs
- Schedule recurring runs: cron-style automation
- Pipeline anatomy: conceptual details