April 09, 2025

Scaling Multi-Tenant ML Inference on Kubernetes: Workday's Strategy

20 minutes

Workday's engineering team tackled the challenge of scaling machine learning inference across many customers with a "bin packed shards" strategy on Kubernetes. The approach, detailed in their Medium article from January 2022, groups multiple tenants' ML models into shared units called shards, with the goal of using resources, particularly memory, efficiently. Kubernetes handles the deployment and scaling of these shards, while Istio Virtual Services route each tenant's requests to the shard that hosts its model. The strategy reduces cost and lets each tenant's model be managed independently, but it adds complexity to the initial design and to ongoing operations; the central trade-off is between resource efficiency and manageability.
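Packing tenant models into memory-bounded shards is a bin-packing problem. The sketch below uses the first-fit decreasing heuristic and assumes that each model's memory footprint is known in advance and that every shard has the same fixed memory budget. The episode does not say which packing algorithm Workday used. The names `Shard`, `pack_into_shards`, `routing_table`, `shard_memory_mb`, and the tenant sizes are all hypothetical. The resulting tenant-to-shard map is the kind of lookup a routing layer such as Istio Virtual Services would need to encode.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """A shared serving unit with a fixed memory budget (hypothetical model)."""
    capacity_mb: int
    tenants: dict[str, int] = field(default_factory=dict)  # tenant -> model size in MB

    @property
    def used_mb(self) -> int:
        return sum(self.tenants.values())

    def fits(self, size_mb: int) -> bool:
        return self.used_mb + size_mb <= self.capacity_mb


def pack_into_shards(model_sizes_mb: dict[str, int], shard_memory_mb: int) -> list[Shard]:
    """Group tenant models into shards using first-fit decreasing by memory."""
    shards: list[Shard] = []
    # Place the largest models first; this usually leaves less unused memory.
    for tenant, size in sorted(model_sizes_mb.items(), key=lambda kv: kv[1], reverse=True):
        if size > shard_memory_mb:
            raise ValueError(f"model for {tenant} ({size} MB) exceeds shard capacity")
        # First fit: reuse the earliest shard with room, otherwise open a new one.
        target = next((s for s in shards if s.fits(size)), None)
        if target is None:
            target = Shard(capacity_mb=shard_memory_mb)
            shards.append(target)
        target.tenants[tenant] = size
    return shards


def routing_table(shards: list[Shard]) -> dict[str, str]:
    """Map each tenant to the name of the shard that hosts its model."""
    return {
        tenant: f"shard-{i}"
        for i, shard in enumerate(shards)
        for tenant in shard.tenants
    }


if __name__ == "__main__":
    # Hypothetical per-tenant model sizes in MB.
    sizes = {"tenant-a": 600, "tenant-b": 300, "tenant-c": 500, "tenant-d": 200, "tenant-e": 400}
    shards = pack_into_shards(sizes, shard_memory_mb=1000)
    for i, s in enumerate(shards):
        print(f"shard-{i}: {sorted(s.tenants)} ({s.used_mb}/{s.capacity_mb} MB)")
    print(routing_table(shards))
```

With these sizes, the five models fit into two 1000 MB shards: `tenant-a` and `tenant-e` share one, and `tenant-b`, `tenant-c`, and `tenant-d` share the other. Packing on memory alone ignores CPU load and traffic patterns. A production scheme would likely weigh those as well, and would deploy each shard as its own Kubernetes workload with routing rules generated from the table.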



By Benjamin Alloul 🗪 🅽🅾🆃🅴🅱🅾🅾🅺🅻🅼
