Skip to main content

Reference

Every KPI definition and time-range option.

Reference

KPI definitions

KPIWhat it measuresWhy it matters
Active sessionsCount of running notebook sessions, with delta vs yesterday.Tracks interactive compute consumption and spot unexpected overnight sessions.
Active endpointsCount of live inference endpoints, with warm replicas called out.Warm replicas cost money even with no traffic; cold ones have latency on first hit.
Running training jobsCount of in-progress training jobs + total GPU-hours consumed today.Gives a real-time view of your cluster's training workload.
Spend todayBurn rate vs your daily cost ceiling.Early warning before you breach a budget threshold.
Spend MTDMonth-to-date spend with an end-of-month forecast.Capacity planning and finance reporting.
Idle GPU hoursPaid-for GPU time where utilization stayed under 5%.The most actionable waste metric — drive this toward zero.

Time ranges

RangeUse case
last 1hCatching active incidents in real time.
last 24hDaily standup, idle-GPU triage.
last 7dSpend forecast vs ceiling.
last 30dCapacity planning, MTD spend tracking.
⌘I