January 28, 2026

Ethernet Based AI Cluster Fabric - Performance Improvement - Tuning in SONiC | OCP Dublin 2025

19 minutes

Recorded live at the OCP Regional Summit Dublin 2025, this episode features Nanda Ravindran (VP of Technical Sales, Edgecore Networks) sharing hands-on, real-world insights into tuning AI-scale network fabrics with SONiC.

Learn how Edgecore benchmarks and optimizes 800G AI switches in SONiC — and why consistent, repeatable tuning (plus validation under realistic load) is critical for stable AI network performance.

AI workload characteristics and the fabric performance challenges they introduce
Step-by-step SONiC tuning: PFC, ECN, and DLB configuration fundamentals
Using Spirent test equipment to generate realistic AI traffic profiles and stress conditions
What changes performance: topology choices, link failures, VXLAN overlays, and traffic patterns
Flowlet mode vs. hash mode — which delivers better outcomes for AI use cases
Why automation, repeatable test methods, and community best practices matter at AI scale
Edgecore’s open networking approach: collaborating with Broadcom on Enterprise SONiC for next-gen AI deployments
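The flowlet vs. hash comparison above can be illustrated with a toy model. This is a simplified sketch, not the switch ASIC's DLB implementation: the 50 µs idle gap, the "least-loaded link" rule, and the names `hash_pick` and `FlowletBalancer` are all assumptions made for illustration. It shows why a few long-lived elephant flows can collide under static hashing, while flowlet switching can move a flow to another link once it pauses long enough.

```python
import zlib

FLOWLET_GAP_US = 50  # hypothetical idle gap that starts a new flowlet


def hash_pick(flow_key: str, n_links: int) -> int:
    """Static hash mode: a flow always maps to the same link."""
    return zlib.crc32(flow_key.encode()) % n_links


class FlowletBalancer:
    """Toy flowlet mode: re-pick the link only after an idle gap."""

    def __init__(self, n_links: int, gap_us: int = FLOWLET_GAP_US):
        self.n_links = n_links
        self.gap_us = gap_us
        self.load = [0] * n_links   # bytes sent per link so far
        self.state = {}             # flow_key -> (link, last_seen_us)

    def pick(self, flow_key: str, now_us: int, size_bytes: int) -> int:
        entry = self.state.get(flow_key)
        if entry is None or now_us - entry[1] > self.gap_us:
            # New flowlet: choose the currently least-loaded link.
            link = min(range(self.n_links), key=lambda i: self.load[i])
        else:
            # Same flowlet: stay on the link to avoid packet reordering.
            link = entry[0]
        self.state[flow_key] = (link, now_us)
        self.load[link] += size_bytes
        return link


if __name__ == "__main__":
    lb = FlowletBalancer(n_links=2)
    for t, flow in [(0, "A"), (1, "B"), (10, "A"), (100, "A")]:
        print(t, flow, "-> link", lb.pick(flow, t, 1000))
```

In this model a flow keeps its link while packets arrive back to back, so ordering is preserved; rebalancing only happens at idle gaps, which is why the gap setting matters when tuning flowlet mode.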

Session outline:
00:00 Intro — Nanda Ravindran & session overview
01:00 Why AI fabric tuning matters — 800G benchmarking + recurring performance gaps
02:00 AI workload traits — elephant flows, low entropy, load-balancing pressure; goal: lossless + low latency
03:00 SONiC tuning focus — RoCEv2 mapping + PFC, ECN, DLB
04:00 Testbed overview — 6× Edgecore 800G (TH5), SONiC 202311-based, non-blocking fabric
05:00 Spirent methodology — AI workload emulation, collectives, measurements
06:00 PFC configuration — QoS profiles (DSCP→TC→Queue/PG), bindings, enablement
08:00 ECN configuration — WRED profile, thresholds, drop probability sweeps
09:00 DLB explained — hash vs flowlet; why flowlet tuning matters
10:00 Key findings — PFC-only best in lab; PFC+ECN required for deployments
12:00 ECN result highlight — example best setting (1% drop, 2MB/10MB thresholds)
13:00 800G vs 400G/breakout — native 800G performs better for AI workloads
14:00 Failure + VXLAN tests — link failures hurt; VXLAN shows minimal impact
15:00 Collectives + PXN — PXN best; flowlet recovers faster than hash
16:00 Call to action — automation + repeatable community best practices
18:00 Q&A — question on newer enhanced DLB/ECMP; plan to test on newer SONiC
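
The ECN setting highlighted at 12:00 can be read as a WRED-style marking curve. The sketch below assumes the textbook model: no marking below the minimum threshold, a linear ramp up to the maximum probability at the maximum threshold, and marking of every packet beyond it. Treating 2MB/10MB as min/max thresholds in units of 2^20 bytes and 1% as the maximum marking probability is an interpretation of the outline, and `mark_probability` is a hypothetical helper, not a SONiC API.

```python
MIN_TH = 2 * 1024 * 1024    # assumed minimum threshold: 2 MB
MAX_TH = 10 * 1024 * 1024   # assumed maximum threshold: 10 MB
MAX_P = 0.01                # assumed maximum marking probability: 1%


def mark_probability(queue_bytes: int,
                     min_th: int = MIN_TH,
                     max_th: int = MAX_TH,
                     max_p: float = MAX_P) -> float:
    """Probability that an ECN-capable packet is marked at this queue depth."""
    if queue_bytes < min_th:
        return 0.0              # queue is shallow: never mark
    if queue_bytes >= max_th:
        return 1.0              # queue is past the max threshold: mark all
    # Linear ramp from 0 at min_th up to max_p at max_th.
    return max_p * (queue_bytes - min_th) / (max_th - min_th)


if __name__ == "__main__":
    for mb in (1, 2, 6, 9, 10):
        p = mark_probability(mb * 1024 * 1024)
        print(f"{mb} MB queued -> mark probability {p:.4f}")
```

Under these assumptions, sweeping `max_p` or the two thresholds changes how early and how aggressively senders are told to slow down, which is the kind of sweep the ECN segment of the session describes.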

📬 Questions or support: [email protected] | 🌐 www.stordis.com

Let’s get social
💻 Blog: https://stordis.com/blog/
📘 Facebook: https://www.facebook.com/people/STORDIS-GmbH/100057058555819/
📸 Instagram: https://www.instagram.com/stordis_open_networking/
👥 LinkedIn: https://www.linkedin.com/company/stordis/
🐦 X: https://twitter.com/STORDIS_GmbH/

#SONiC #AIFabricTuning #Edgecore #800GSwitches #OCPDublin2025 #ECN #PFC #DLB #AIWorkloads #SONiCOptimization #OpenNetworking #EnterpriseSONiC #Broadcom #FlowletMode #NetworkAutomation #AIInfrastructure

By STORDIS GmbH