This April 24, 2025 paper introduces Tempo, a scheduling system designed to optimize Large Language Model (LLM) serving under the wide variety of Service Level Objectives (SLOs) found in modern LLM applications. The authors categorize requests into three types: latency-sensitive, throughput-intensive, and collective. Each type has distinct performance requirements that existing schedulers fail to manage effectively. Tempo maximizes "service gain" by allocating just enough serving bandwidth to meet each request's SLO, using a hybrid scheduling strategy that combines lightweight prediction models (for conservative initial estimates of response length) with dependency-graph matching (for complex workflows). Evaluations show that Tempo significantly outperforms state-of-the-art systems in both service gain and SLO goodput across diverse workloads and models.

Source: Tempo: Application-aware LLM Serving with Mixed SLO Requirements (April 24, 2025), https://arxiv.org/pdf/2504.20068
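The core idea, allocating just enough serving bandwidth for each request to meet its SLO, can be illustrated with a minimal sketch. This is not Tempo's actual algorithm; the request fields, the `min_bandwidth` scoring rule, and the greedy admission loop are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Request:
    rid: str                 # request identifier (hypothetical field)
    slo_deadline_ms: float   # time budget to satisfy the SLO
    predicted_tokens: int    # conservative response-length estimate

def min_bandwidth(req: Request, decode_ms_per_token: float = 5.0) -> float:
    """Smallest fraction of serving capacity that still finishes the
    predicted response within the request's SLO deadline (assumed model)."""
    needed_ms = req.predicted_tokens * decode_ms_per_token
    return min(1.0, needed_ms / req.slo_deadline_ms)

def schedule(requests: list[Request], capacity: float = 1.0) -> list[tuple[str, float]]:
    """Greedy sketch: consider requests in order of urgency (tightest
    deadline first) and grant each the minimal share that meets its SLO."""
    plan = []
    for req in sorted(requests, key=lambda r: r.slo_deadline_ms):
        share = min_bandwidth(req)
        if share <= capacity:          # admit only if the residual capacity suffices
            plan.append((req.rid, round(share, 3)))
            capacity -= share
    return plan
```

For example, a chat request needing 60 tokens within 500 ms requires a 0.6 share (at the assumed 5 ms/token), while a batch request needing 400 tokens within 10 s requires only 0.2, so both fit on one unit of capacity; a throughput-oriented request thus leaves headroom for latency-sensitive ones, which is the intuition behind "just enough" allocation.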