Pretrained

Training a 1 trillion parameter model


Kimi K2 and Moonshot AI's history, avoiding loss spikes during training, the Muon optimizer, and data parallelism

Pretrained, by Pierce Freeman & Richard Diehl Martinez