KubeFM

Saving 10s of thousands of dollars deploying AI at scale with Kubernetes, with John McBride



Curious about running AI models on Kubernetes without breaking the bank? This episode delivers practical insights from someone who's done it successfully at scale.

John McBride, VP of Infrastructure and AI Engineering at the Linux Foundation, shares how his team at OpenSauced built StarSearch, an AI feature that uses natural language processing to analyze GitHub contributions and answer semantic queries about them. By using open-source models instead of commercial APIs, the team saved tens of thousands of dollars.

You will learn:

  • How to deploy vLLM on Kubernetes to serve open-source LLMs like Mistral and Llama, including configuration challenges with GPU drivers and DaemonSets

  • Why, with proper prompt engineering, smaller models (7-14B parameters) can be about 95% as effective as larger commercial models for many tasks

  • How running inference workloads on your own infrastructure with T4 GPUs can cut monthly costs from tens of thousands of dollars to just a couple of thousand

  • Practical approaches to monitoring GPU workloads in production, including handling unpredictable failures and VRAM consumption issues

Sponsor

This episode is brought to you by StackGen! Don't let infrastructure block your teams. StackGen deterministically generates secure cloud infrastructure from any input: existing cloud environments, IaC, or application code.

More info

  • Find all the links and info for this episode here: https://ku.bz/wP6bTlrFs

  • Interested in sponsoring an episode? Learn more.
