Predictive vs LLM
Two endpoint kinds, two tuning surfaces.
Workbench distinguishes two endpoint kinds because each exposes a different set of tuning knobs:
| Kind | Best for | Knobs |
|---|---|---|
| Predictive | scikit-learn, XGBoost, and classical PyTorch or TensorFlow models. Single-input, fast inference. | Framework, runtime, protocol version (v1 / v2 / openai), shared memory. |
| LLM | Generative models: chat, embeddings, completion. Long contexts; batching matters. | Tensor / pipeline / data parallelism, request batching, cache-aware routing. |
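
To make the split concrete, here is a minimal sketch that groups the knobs from the table by endpoint kind and flags any requested knob that belongs to the other kind. The names `EndpointKind`, `TUNING_KNOBS`, `unsupported_knobs`, and the snake_case knob keys are all hypothetical, chosen for illustration; they are not Workbench's actual configuration schema or API.

```python
from __future__ import annotations

from enum import Enum


class EndpointKind(Enum):
    PREDICTIVE = "predictive"
    LLM = "llm"


# Hypothetical grouping of the knobs listed in the table above.
# Key names are illustrative only, not Workbench's real config fields.
TUNING_KNOBS = {
    EndpointKind.PREDICTIVE: (
        "framework",
        "runtime",
        "protocol_version",  # v1 / v2 / openai
        "shared_memory",
    ),
    EndpointKind.LLM: (
        "tensor_parallelism",
        "pipeline_parallelism",
        "data_parallelism",
        "request_batching",
        "cache_aware_routing",
    ),
}


def unsupported_knobs(kind: EndpointKind, requested: dict) -> list[str]:
    """Return the requested knob names that do not belong to this endpoint kind."""
    allowed = set(TUNING_KNOBS[kind])
    return sorted(name for name in requested if name not in allowed)


# A predictive endpoint that asks for an LLM-only knob gets it flagged.
print(unsupported_knobs(
    EndpointKind.PREDICTIVE,
    {"framework": "xgboost", "protocol_version": "v2", "tensor_parallelism": 2},
))
# → ['tensor_parallelism']
```

Keeping the two knob sets disjoint, as in this sketch, mirrors the table: a setting only makes sense for the endpoint kind whose tuning surface it belongs to.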