KubeFM

Saving 10s of thousands of dollars deploying AI at scale with Kubernetes, with John McBride



Curious about running AI models on Kubernetes without breaking the bank? This episode delivers practical insights from someone who's done it successfully at scale.

John McBride, VP of Infrastructure and AI Engineering at the Linux Foundation, shares how his team at OpenSauced built StarSearch, an AI feature that uses natural language processing to analyze GitHub contributions and provide insights through semantic queries. By using open-source models instead of commercial APIs, the team saved tens of thousands of dollars.

You will learn:

  • How to deploy vLLM on Kubernetes to serve open-source LLMs like Mistral and Llama, including configuration challenges with GPU drivers and DaemonSets

  • Why smaller models (7-14B parameters) can, with proper prompt engineering, reach about 95% of the effectiveness of larger commercial models on many tasks

  • How running inference workloads on your own infrastructure with T4 GPUs can reduce costs from tens of thousands to just a couple thousand dollars monthly

  • Practical approaches to monitoring GPU workloads in production, including handling unpredictable failures and VRAM consumption issues
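As a rough illustration of the first point, here is a minimal sketch of a Kubernetes Deployment for a vLLM server, built as a Python dict and printed as JSON for `kubectl apply -f -`. It is not the configuration described in the episode. The function name `vllm_deployment`, the Deployment name, the `latest` image tag, and the example model are assumptions chosen for illustration. It also assumes the cluster already exposes GPUs through the NVIDIA device plugin (the `nvidia.com/gpu` resource), and it passes `--dtype half` because T4 GPUs do not support bfloat16.

```python
import json


def vllm_deployment(model: str, gpus: int = 1, name: str = "vllm-server") -> dict:
    """Build a Deployment manifest (as a dict) for a vLLM OpenAI-compatible server.

    All names and defaults here are illustrative assumptions, not values
    taken from the episode.
    """
    labels = {"app": name}
    container = {
        "name": "vllm",
        # Public vLLM image that serves an OpenAI-compatible HTTP API.
        "image": "vllm/vllm-openai:latest",
        # --dtype half: assumption for T4-class GPUs, which lack bfloat16.
        "args": ["--model", model, "--dtype", "half"],
        # vLLM's API server listens on port 8000 by default.
        "ports": [{"containerPort": 8000}],
        # Requires the NVIDIA device plugin to be running on the GPU nodes.
        "resources": {"limits": {"nvidia.com/gpu": gpus}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Example model name only; substitute whichever open-source model you serve.
    manifest = vllm_deployment("mistralai/Mistral-7B-Instruct-v0.2")
    print(json.dumps(manifest, indent=2))
```

Piping the script's output to `kubectl apply -f -` would create the Deployment. A real setup would typically also need a Service, a node selector or tolerations for the GPU node pool, and a way to supply model weights or a Hugging Face token.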

Sponsor

This episode is brought to you by StackGen! Don't let infrastructure block your teams. StackGen deterministically generates secure cloud infrastructure from any input - existing cloud environments, IaC or application code.

More info

  • Find all the links and info for this episode here: https://ku.bz/wP6bTlrFs


