


Training frontier models isn’t as simple as adding more GPUs—one small problem and the whole coordinated dance falls apart. OpenAI’s Mark Handley and Greg Steinbrecher discuss how a new supercomputer network design, used to train some of the company’s latest models, keeps the whole system moving in lockstep, even with record numbers of GPUs. They break down Multipath Reliable Connection, a new protocol OpenAI developed with AMD, Broadcom, Intel, Microsoft, and Nvidia, and why they’re making it available for the whole industry to use.
Chapters
00:00 Intro
00:39 Greg and Mark's paths to OpenAI
04:34 Why training AI stresses networks differently
10:05 Bottlenecks, failures, and the cost of waiting
15:19 How Multipath Reliable Connection works
18:59 A protocol to route around failures
25:05 Why OpenAI is making MRC an open standard
35:09 Could AI compute move to space?
Hosted on Acast. See acast.com/privacy for more information.
By OpenAI
