On technical podcasts, we often talk in generalities: most of the time, design approach X should work. But what if you don’t fall into that “most of the time” scenario? What if your design constraints give you different problems from those of the average compute infrastructure?
Today, we have an outside-the-box design discussion with Julian Firminger, who comes to us from the media and entertainment industry. Scaling out is not a new problem for these folks, and we’re going to dive into their challenges and how they have to think differently.
In particular, we’ll look at challenges Julian faces around CPU, storage, and networking, and the design choices he and his team have made.
If you’re interested in learning more about technical infrastructure for the entertainment industry, check out StudioSysAdmins.com.
Sponsor: Interop ITX
Interop ITX, May 15 – 19 in Las Vegas, is the only independent conference for technology leaders. Get a year’s worth of objective IT education in one week. And don’t miss the Packet Pushers’ Future Of Networking Summit at Interop. Visit interopitx.com and use promo code PacketPushers for a 20% discount.
Show Notes:
Part 1 – CPU Scaling Limitations
* Why is there such demand for CPU in M&E?
* Does multi-core CPU help?
* What about special-purpose CPUs?
* GPUs?
* Hypervisor or no, and why?
Part 2 – Scale-Out Storage For Adults
* Describe M&E requirements for storage:
  * Ingest/Acquisition
  * Post-production and playout
  * Archive
* Do you believe scale-out storage means coupling disk with compute and striping across it, or something else?
* What is your storage upgrade strategy?
  * For capacity
  * For performance
* Do you have a disaster recovery scheme?
* How about a backup scheme?
* How impactful is latency between compute & data on M&E workloads?
* What does this mean for placing workloads in the public cloud?
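The striping question above is easier to discuss with a concrete model in hand. The sketch below assumes simple round-robin placement of fixed-size stripes across storage nodes; the stripe size, node count, and the `locate`/`nodes_for_read` helpers are hypothetical illustrations, not a description of any particular product or of the guest's design.

```python
# Hypothetical round-robin striping model: fixed-size stripes placed across
# n_nodes in turn. Real scale-out filesystems add replication/erasure coding,
# metadata services, and more complex placement; this only shows the core idea.
MiB = 1024 * 1024


def locate(offset: int, stripe_size: int, n_nodes: int) -> tuple[int, int]:
    """Map a logical byte offset to (node, byte offset within that node's data)."""
    stripe_index = offset // stripe_size       # which stripe the byte falls in
    node = stripe_index % n_nodes              # stripes rotate across nodes
    local_stripe = stripe_index // n_nodes     # stripes this node held before this one
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return node, local_offset


def nodes_for_read(offset: int, length: int, stripe_size: int, n_nodes: int) -> set[int]:
    """Return the set of nodes that a read of `length` bytes at `offset` touches."""
    if length <= 0:
        return set()
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    # Once a read spans n_nodes stripes, every node is involved.
    last = min(last, first + n_nodes - 1)
    return {s % n_nodes for s in range(first, last + 1)}


if __name__ == "__main__":
    # A 4 MiB read with 1 MiB stripes over 4 nodes fans out to all 4 nodes.
    print(nodes_for_read(0, 4 * MiB, MiB, 4))
    print(locate(5 * MiB + 10, MiB, 4))
```

The trade-off this makes visible: large sequential reads fan out across many nodes and can use their aggregate bandwidth, while a small read touches a single node, so stripe size relative to typical I/O size decides how much parallelism a workload actually gets.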
Part 3 – Network Design
* IP switching
* For traditional scale-out, we normally think of 3-, 5-, or even 7-stage Clos fabrics (leaf-spine), matched to access-port density requirements.
* For performance, we limit or even eliminate oversubscription between tiers. How does this work out for M&E?
* Leaf-spine relies heavily on ECMP to scale between layers, but evenly loading links is hard. Thoughts?
* You’ve mentioned a toroid as a better fit for M&E than leaf-spine.
  * What is a toroid topology?
  * What data patterns make this desirable for M&E?
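The oversubscription and ECMP questions above can be made concrete with a little arithmetic. The sketch below computes a leaf's oversubscription ratio and models ECMP as hashing each flow's 5-tuple onto one of the uplinks. CRC32 over the 5-tuple is a simplified stand-in (real switches use vendor-specific hash functions and fields), and the port counts, addresses, and helper names are hypothetical.

```python
import math
import zlib


def oversubscription_ratio(down_ports: int, down_gbps: float,
                           up_ports: int, up_gbps: float) -> float:
    """Leaf access (downlink) capacity divided by spine-facing (uplink) capacity."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)


def ecmp_pick(five_tuple: tuple, n_links: int) -> int:
    """Choose an uplink by hashing the flow's 5-tuple (simplified ECMP model)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % n_links


def link_loads(flows: list, n_links: int) -> list[float]:
    """Sum each flow's rate (Gbps) onto the uplink its hash selects."""
    loads = [0.0] * n_links
    for five_tuple, gbps in flows:
        loads[ecmp_pick(five_tuple, n_links)] += gbps
    return loads


def p_no_collision(k_flows: int, n_links: int) -> float:
    """Probability that k uniformly hashed flows all land on distinct links.

    math.perm returns 0 when k_flows > n_links, so collisions are then certain.
    """
    return math.perm(n_links, k_flows) / n_links ** k_flows


if __name__ == "__main__":
    # 48 x 10G access ports with 4 x 40G uplinks is 3:1 oversubscribed.
    print(oversubscription_ratio(48, 10, 4, 40))
    # Four hypothetical 10 Gbps flows (src, dst, proto, sport, dport) over 4 uplinks.
    flows = [(("10.0.0.1", f"10.0.1.{i}", 6, 50000 + i, 2049), 10.0) for i in range(4)]
    print(link_loads(flows, 4))
    # Chance that four flows hash onto four distinct uplinks.
    print(p_no_collision(4, 4))
```

Even with perfectly uniform hashing, four flows spread cleanly across four uplinks only 24/256 (about 9%) of the time. With many small flows the law of large numbers evens things out; with a handful of large, long-lived flows, collisions can leave one uplink saturated while others sit idle.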