Nicolay here. Most AI conversations focus on training bigger models with more compute. This one explores the counterintuitive world where averaging weights from different models creates better performance than expensive post-training.
Today I have the chance to talk to Maxime Labonne, who's a researcher at Liquid AI and the architect of some of the most popular open source models on Hugging Face.
He went from researching neural networks for cybersecurity to building "Frankenstein models" through techniques that shouldn't work but consistently do.
Key Insight: Model Merging as a Free Lunch
The core breakthrough is deceptively simple: take two fine-tuned models, average their weights layer by layer, and often get better performance than either individual model. Maxime initially started writing an article to explain why this couldn't work, but his own experiments convinced him otherwise.
The magic lies in knowledge compression and regularization. When you train a model multiple times on similar data, each run lands in a slightly different weight configuration because of training noise. Averaging these weights smooths out that noise and acts like an ensemble in weight space, often sidestepping the quirks of any single run. And because it is just arithmetic over checkpoints, you can literally run model merging on a CPU, no GPUs required. A minimal sketch of this averaging is shown below.
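To make the idea concrete, here is a minimal sketch of linear weight averaging between two fine-tunes of the same base model. This is not Maxime's exact pipeline; it just illustrates the core operation. The checkpoint filenames and the merge_state_dicts helper are hypothetical, and it assumes both checkpoints share an identical architecture and parameter names.

```python
import torch

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Linearly interpolate two state dicts with matching keys and shapes.

    alpha=0.5 is a plain average; other values weight one parent more heavily.
    """
    merged = {}
    for name, weight_a in sd_a.items():
        weight_b = sd_b[name]
        merged[name] = alpha * weight_a + (1.0 - alpha) * weight_b
    return merged

# Hypothetical checkpoint paths: two fine-tunes of the same base model.
# Everything runs on CPU, which is why no GPU is required for merging.
sd_a = torch.load("finetune_a.pt", map_location="cpu")
sd_b = torch.load("finetune_b.pt", map_location="cpu")

merged = merge_state_dicts(sd_a, sd_b, alpha=0.5)
torch.save(merged, "merged_model.pt")
```

Real merging toolchains add refinements (per-layer weighting, handling mismatched vocabularies, more elaborate merge methods), but the core step is exactly this element-wise interpolation of parameters.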
In the podcast, we also touch on:
Core Concepts
Connect with Maxime:
Connect with Nicolay:
Important Moments
Tools & Tech Mentioned
Recommended Resources