


Stop wasting money on idle GPUs! Directly from the top-tier SOSP '25 conference, researchers from Peking University and Alibaba Group reveal how Aegaeon uses token-level auto-scaling to achieve an astounding 82% GPU resource saving in production.
Paper: https://ennanzhai.github.io/pub/sosp25-aegaeon.pdf
By hogarthian.art