


Kimi K2 and Moonshot AI's history, avoiding loss spikes during training, the muon optimizer, and data parallelism
By Pierce Freeman & Richard Diehl Martinez