AI: post transformers

Tempo: SLO-Aware LLM Serving Maximizing Service Gain



This April 24, 2025 academic paper introduces **Tempo**, a novel scheduling system that optimizes Large Language Model (LLM) serving by addressing the wide variety of Service Level Objectives (**SLOs**) in modern LLM applications. The authors categorize requests into three types: **latency-sensitive**, **throughput-intensive**, and **collective** requests. Each type has distinct performance requirements that existing schedulers fail to manage effectively. Tempo maximizes "service gain" by allocating just enough serving bandwidth to meet each request's SLO, using a **hybrid scheduling strategy** that combines lightweight prediction models for conservative initial estimates of response length with **dependency-graph matching** for complex workflows. Evaluations demonstrate that Tempo significantly outperforms state-of-the-art systems in both service gain and **SLO goodput** across diverse workloads and models.
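For intuition only, here is a minimal Python sketch of the kind of SLO-aware prioritization described above: requests are tagged with one of the three types, carry a conservative (upper-quantile) response-length estimate, and are served in order of least SLO slack, so bandwidth beyond "just enough" stays available for tighter requests. The class names, fields, and slack heuristic are illustrative assumptions, not Tempo's actual implementation.

```python
# Illustrative sketch only: the names, SLO fields, and the slack-first
# heuristic below are assumptions, not Tempo's real scheduling algorithm.
from dataclasses import dataclass
from enum import Enum, auto
import heapq
import time


class RequestKind(Enum):
    LATENCY_SENSITIVE = auto()     # e.g. interactive chat with a tight deadline
    THROUGHPUT_INTENSIVE = auto()  # e.g. batch jobs with a deadline on the whole run
    COLLECTIVE = auto()            # e.g. a workflow whose SLO spans several requests


@dataclass
class Request:
    req_id: str
    kind: RequestKind
    slo_deadline_s: float   # absolute deadline implied by the request's SLO
    predicted_tokens: int   # conservative (upper-quantile) response-length estimate
    arrival_s: float


def slack(req: Request, now: float, tokens_per_s: float) -> float:
    """Spare time left if the request were served at full speed right now.

    Smaller slack means more urgent; requests whose SLOs are far away get
    only "just enough" bandwidth, leaving headroom for tighter requests.
    """
    service_time_s = req.predicted_tokens / tokens_per_s
    return (req.slo_deadline_s - now) - service_time_s


def schedule(requests: list[Request], tokens_per_s: float) -> list[str]:
    """Return request ids in the order a slack-first scheduler would serve them."""
    now = time.monotonic()
    heap = [(slack(r, now, tokens_per_s), r.req_id) for r in requests]
    heapq.heapify(heap)
    return [req_id for _, req_id in (heapq.heappop(heap) for _ in range(len(heap)))]


if __name__ == "__main__":
    now = time.monotonic()
    reqs = [
        Request("chat-1", RequestKind.LATENCY_SENSITIVE, now + 2.0, 150, now),
        Request("batch-7", RequestKind.THROUGHPUT_INTENSIVE, now + 60.0, 4000, now),
        Request("agent-3", RequestKind.COLLECTIVE, now + 10.0, 800, now),
    ]
    print(schedule(reqs, tokens_per_s=500.0))
```

The paper additionally uses dependency-graph matching to derive SLOs for collective, multi-request workflows; that part is omitted from this sketch.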


Source:

April 24, 2025

Tempo: Application-aware LLM Serving with Mixed SLO Requirements

https://arxiv.org/pdf/2504.20068


AI: post transformers, by mcgrof