GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
A podcast discussion about GShard, a module for scaling neural networks using conditional computation and automatic sharding, focusing on its application to multilingual machine translation.