AI Post Transformers

Tempo: SLO-Aware LLM Serving Maximizing Service Gain



The April 24, 2025 academic paper introduces Tempo, a novel scheduling system designed to optimize Large Language Model (LLM) serving by addressing the wide variety of Service Level Objectives (SLOs) found in modern LLM applications. The authors categorize requests into three types—latency-sensitive, throughput-intensive, and collective requests—each with distinct performance requirements that existing schedulers fail to manage effectively. Tempo maximizes "service gain" by allocating just enough serving bandwidth to meet each request's SLO, using a hybrid scheduling strategy that relies on lightweight prediction models for conservative initial estimates of response length and on dependency-graph matching for complex workflows. Evaluations show that Tempo significantly outperforms state-of-the-art systems in both service gain and SLO goodput across diverse workloads and models.

Source: April 24, 2025, Tempo: Application-aware LLM Serving with Mixed SLO Requirements, https://arxiv.org/pdf/2504.20068
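The core idea—admit requests with just enough throughput to meet each one's SLO—can be illustrated with a minimal sketch. This is not Tempo's actual implementation; the request fields, the gain model, and the greedy admission policy below are illustrative assumptions based only on the description above.

```python
# Hypothetical sketch of SLO-aware scheduling: give each admitted request the
# minimum decode throughput (tokens/s) that lets it finish its conservatively
# predicted response length by its SLO deadline. All names are illustrative.
from dataclasses import dataclass

@dataclass
class Request:
    req_id: str
    kind: str              # "latency", "throughput", or "collective"
    predicted_tokens: int  # conservative response-length estimate
    deadline_s: float      # SLO deadline, seconds from now

def min_rate(req: Request) -> float:
    """Minimum tokens/s needed to finish the predicted response by the deadline."""
    return req.predicted_tokens / req.deadline_s

def schedule(requests: list[Request], capacity_tps: float) -> dict[str, float]:
    """Greedily admit requests, tightest requirement first, allocating each
    just enough throughput to meet its SLO (never more)."""
    alloc: dict[str, float] = {}
    remaining = capacity_tps
    for req in sorted(requests, key=min_rate, reverse=True):
        need = min_rate(req)
        if need <= remaining:
            alloc[req.req_id] = need
            remaining -= need
    return alloc
```

For example, a latency-sensitive request needing 100 tokens within 1 s requires 100 tokens/s, while a throughput-intensive batch job needing 500 tokens within 10 s requires only 50 tokens/s; with 120 tokens/s of capacity, only the tighter request is admitted, and with 200 tokens/s, both are.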

By mcgrof