

You're running gRPC services in Kubernetes, load balancing looks fine on the dashboard — but some pods are burning at 80% CPU while others sit idle, and adding more replicas only partially helps.
Rohit Agrawal, a Staff Software Engineer on the traffic platform team at Databricks, explains why this happens and how his team replaced Kubernetes's default networking with a proxy-less, client-side load-balancing system built on the xDS protocol.
In this episode:
Why kube-proxy's Layer 4 routing breaks down under high-throughput gRPC: it picks a backend once per TCP connection, not per request
How Databricks built an Endpoint Discovery Service (EDS) that watches Kubernetes directly and streams real-time pod metadata to every client
How zone-aware spillover cut cross-availability-zone costs without sacrificing availability
Why CPU-based routing failed (monitoring lag creates oscillation) and what signals to use instead
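The first bullet can be sketched as a quick simulation (hypothetical numbers, not from the episode): with long-lived gRPC connections, connection-level balancing pins every RPC on a connection to one pod, while per-request client-side balancing spreads load evenly.

```python
import random
from collections import Counter

def simulate(clients, backends, rpcs_per_client, per_request):
    """Compare L4 (per-connection) vs L7 (per-request) backend selection.
    kube-proxy picks a backend once when the TCP connection opens; every
    gRPC call multiplexed on that connection then lands on the same pod."""
    load = Counter({b: 0 for b in range(backends)})
    rr = 0
    for _ in range(clients):
        pinned = random.randrange(backends)   # chosen once, at connect time
        for _ in range(rpcs_per_client):
            if per_request:
                load[rr % backends] += 1      # client-side LB: balance each RPC
                rr += 1
            else:
                load[pinned] += 1             # L4: the connection stays pinned
    return sorted(load.values())

random.seed(1)
print(simulate(10, 4, 1000, per_request=False))  # uneven: load follows connection placement
print(simulate(10, 4, 1000, per_request=True))   # [2500, 2500, 2500, 2500]
```

With 10 clients over 4 backends, per-connection placement can never be even (2.5 connections per pod is impossible), which is why adding replicas only partially helps: the skew is structural, not a capacity problem.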
The system has been running in production for three years across hundreds of services, handling millions of requests.
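The zone-aware spillover idea can be sketched roughly as follows. This is an illustrative model, not Databricks' implementation: it assumes each zone generates an equal share of traffic, keeps traffic in-zone while the local zone holds its fair share of endpoints, and spills the remainder to other zones in proportion to their capacity.

```python
from collections import Counter

def routing_split(endpoint_zones, client_zone):
    """Hypothetical zone-aware spillover: return the fraction of this
    client's traffic to route to each zone. If the client's zone holds at
    least its fair share of endpoints, all traffic stays local (no cross-AZ
    cost); otherwise the local zone keeps what its capacity supports and
    the rest spills to remote zones, weighted by their endpoint counts."""
    counts = Counter(endpoint_zones)
    zones = sorted(counts)
    fair_share = 1.0 / len(zones)                            # traffic this zone generates
    local_share = counts[client_zone] / len(endpoint_zones)  # capacity it actually has
    keep_local = min(1.0, local_share / fair_share)
    split = {client_zone: keep_local}
    spill = 1.0 - keep_local
    remote_total = len(endpoint_zones) - counts[client_zone]
    for z in zones:
        if z != client_zone:
            split[z] = spill * counts[z] / remote_total if remote_total else 0.0
    return split

# Balanced zones: every zone keeps its own traffic.
print(routing_split(["a", "a", "b", "b", "c", "c"], "a"))  # {'a': 1.0, 'b': 0.0, 'c': 0.0}
# Zone "a" has 1 of 6 endpoints but generates 1/3 of traffic: half spills out.
print(routing_split(["a", "b", "b", "c", "c", "c"], "a"))
```

The trade-off the episode describes falls out directly: spillover only pays the cross-AZ cost when the local zone is genuinely under-provisioned, so availability is preserved without routing everything everywhere.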
Sponsor
This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/y803JMhBk
Interested in sponsoring an episode? Learn more.
By KubeFM
