The Quantum Drift

The True Cost of Hosting Open Source Language Models



Ever wondered what it takes to deploy large language models efficiently without breaking the bank? In this episode, Robert and Haley dissect the economics of hosting open-source LLMs and ask whether established cloud providers like AWS or emerging platforms like Hugging Face Endpoints and BentoML offer the best bang for your buck. Inspired by Ida Silfverskiöld’s in-depth research, we unpack the costs, cold start times, and performance trade-offs of CPU versus GPU hosting and of on-demand versus serverless setups.

Key Highlights:

  • Platform Comparisons: The trade-offs between AWS, Modal, and other AI-focused platforms.
  • Cost & Efficiency: GPU vs. CPU usage and why it matters in different deployment scenarios.
  • Developer Experience: Ease of deployment and how these platforms cater to developers.

Whether you’re a tech pro or simply curious about AI infrastructure, this episode offers a peek into the nuanced world of model hosting economics.


The Quantum Drift, by Robert Loft and Haley Hanson