
First:
- Apologies for the audio! We had a production error…
What’s new:
- DeepSeek has made breakthroughs in both how AI systems are trained (making training much more affordable) and how they run in real-world use (making them faster and more efficient)
Details
- FP8 Training: Working With Less Precise Numbers (sketch below)
  - Traditional AI training requires extremely precise numbers
  - DeepSeek found you can use less precise numbers (like rounding $10.857643 to $10.86)
  - Cuts memory and computation needs significantly, with minimal impact on quality
  - Like teaching someone math using rounded numbers instead of carrying every decimal place
- Learning from Other AIs (Distillation; sketch below)
  - Traditional approach: an AI learns everything from scratch by studying massive amounts of data
  - DeepSeek's approach: use existing AI models as teachers
  - Like having experienced programmers mentor new developers
- Trial & Error Learning (for their R1 model; sketch below)
  - Started with some basic "tutoring" from advanced models
  - Then let it practice solving problems on its own
  - When it found good solutions, these were fed back into training
  - Led to "Aha moments" where R1 discovered better ways to solve problems
  - Finally, polished its ability to explain its thinking clearly to humans
- Smart Team Management (Mixture of Experts; sketch below)
  - Instead of one massive system that does everything, DeepSeek built a team of specialists
  - Like running a software company with:
    - 256 specialists who focus on different areas
    - 1 generalist who helps with everything
    - A smart project manager who assigns work efficiently
  - For each task, only 8 specialists plus the generalist are needed
  - More efficient than having everyone work on everything
- Efficient Memory Management (Multi-head Latent Attention; sketch below)
  - Traditional AI is like keeping complete transcripts of every conversation
  - DeepSeek's approach is like taking smart meeting minutes
  - Captures key information in a compressed format
  - Similar to how JPEG compresses images
- Looking Ahead (Multi-Token Prediction; sketch below)
  - Traditional AI predicts one word at a time
  - DeepSeek looks ahead and predicts two words at once
  - Like a skilled reader who can read ahead while maintaining comprehension
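
The sketches referenced in the list above follow; all are simplified Python illustrations written for these notes, not DeepSeek's code. First, the low-precision idea behind FP8 training: the snippet fakes 8-bit quantization of a weight matrix (a generic int8-style round-trip standing in for real FP8 formats) to show how little is lost when numbers are stored coarsely.

```python
# Hedged sketch: round-trip a weight matrix through 8-bit precision and measure the damage.
# This is generic symmetric quantization, NOT DeepSeek's actual FP8 training recipe.
import numpy as np

def quantize_dequantize(x: np.ndarray, bits: int = 8) -> np.ndarray:
    """Snap x onto a coarse grid with 2**bits levels, then map back to floats."""
    levels = 2 ** (bits - 1) - 1               # 127 for 8 bits
    scale = np.abs(x).max() / levels           # one scale per tensor (a simplifying assumption)
    q = np.clip(np.round(x / scale), -levels, levels)
    return q * scale

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, size=(1024, 1024)).astype(np.float32)

approx = quantize_dequantize(weights)
rel_error = np.abs(weights - approx).mean() / np.abs(weights).mean()
print(f"mean relative error at 8 bits: {rel_error:.2%}")   # small, roughly on the order of 1%
```

The memory win is the point: each stored value drops from 32 bits to 8, a 4x reduction, while the numbers stay close enough for training to proceed.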
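
For "Learning from Other AIs", here is a minimal sketch of the standard distillation objective: the student is scored on how closely its output distribution matches the teacher's softened distribution. The KL-divergence loss and temperature are the textbook formulation, assumed for illustration; the exact recipe DeepSeek used is not shown here.

```python
# Sketch of distillation: a student model learns to imitate a teacher's soft predictions.
# Shapes, temperature, and the toy logits are illustrative assumptions.
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0) -> float:
    """KL divergence from the softened teacher distribution to the student's."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean())

rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 10))                              # 4 examples, 10-way vocab
student_logits = teacher_logits + rng.normal(scale=2.0, size=(4, 10))  # imperfect student

print("loss for an imperfect student:", round(distillation_loss(student_logits, teacher_logits), 3))
print("loss if the student matched the teacher:", round(distillation_loss(teacher_logits, teacher_logits), 3))
```

Minimizing this loss is the "mentoring": the student inherits the teacher's judgments about which answers are likely, not just the single right answer.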
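
For the R1 trial-and-error loop, a toy version of the shape of the process: sample several candidate answers per problem, keep the ones a verifiable check rewards, and treat those as new training examples. The task (doubling a number) and the sampling function are stand-ins; the real reinforcement-learning pipeline is far more involved.

```python
# Toy trial-and-error loop: generate candidates, keep the ones the reward accepts,
# and feed them back as training data. Everything here is a stand-in for illustration.
import random

def sample_answer(question: int) -> int:
    """Pretend model output: usually right, occasionally off by one."""
    return question * 2 + random.choice([-1, 0, 0, 0, 1])

def reward(question: int, answer: int) -> float:
    """Verifiable reward: 1.0 only for the exactly correct answer."""
    return 1.0 if answer == question * 2 else 0.0

random.seed(0)
training_pool = []
for question in range(20):
    candidates = [sample_answer(question) for _ in range(8)]        # several tries per problem
    good = [a for a in candidates if reward(question, a) == 1.0]    # keep only what worked
    training_pool.extend((question, a) for a in good)               # successes feed the next round

print(f"kept {len(training_pool)} correct solutions out of {20 * 8} attempts")
```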
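
For "Smart Team Management", a sketch of Mixture-of-Experts routing: a router scores the specialists for each token, only the top 8 actually run, and the shared generalist always contributes. The 256/8/1 split comes from the summary above; the tiny linear "experts" and all dimensions are simplifying assumptions.

```python
# Sketch of Mixture-of-Experts routing: many specialists exist, few run per token.
# Expert/router weights are random toy matrices, purely to show the control flow.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 256, 8

experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.02   # 256 specialists
shared_expert = rng.normal(size=(d_model, d_model)) * 0.02        # the 1 generalist
router = rng.normal(size=(d_model, n_experts)) * 0.02             # the project manager

def moe_layer(token: np.ndarray) -> np.ndarray:
    scores = token @ router                                   # how relevant is each specialist?
    chosen = np.argsort(scores)[-top_k:]                      # only the top 8 are activated
    weights = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()
    out = sum(w * (token @ experts[i]) for w, i in zip(weights, chosen))
    return out + token @ shared_expert                        # generalist always pitches in

token = rng.normal(size=d_model)
print("output shape:", moe_layer(token).shape, "| experts run:", top_k, "of", n_experts)
```

The efficiency claim falls out of the routing: most of the parameters exist, but only a small fraction do work for any given token.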
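
For "Efficient Memory Management", a sketch of the compression idea behind Multi-head Latent Attention: instead of caching full-size key/value states for every past token, cache a much smaller latent vector and expand it when attention needs it. All dimensions are invented for illustration, and in the real mechanism the compression is folded into the attention math itself.

```python
# Sketch of latent KV compression: cache a small vector per token instead of a large one.
# Projection matrices are random placeholders; the sizes are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, seq_len = 512, 64, 1000

down_proj = rng.normal(size=(d_model, d_latent)) * 0.05   # write the "meeting minutes"
up_proj = rng.normal(size=(d_latent, d_model)) * 0.05     # expand them when a head needs detail

hidden_states = rng.normal(size=(seq_len, d_model))       # one vector per past token

full_cache = hidden_states                 # what a transcript-style cache would store
latent_cache = hidden_states @ down_proj   # what the compressed cache stores instead
recovered = latent_cache @ up_proj         # reconstructed on the fly (lossy, like JPEG)

print("stored numbers per token:", full_cache.shape[1], "->", latent_cache.shape[1])
print("cache memory saved: ~%d%%" % round(100 * (1 - d_latent / d_model)))
```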
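
Finally, for "Looking Ahead", a sketch of multi-token prediction: at each position the model is scored on the next word and on the word after it. The stand-in "model" returns toy probabilities and the 0.5 weighting on the look-ahead loss is an assumption; the point is only how the two objectives combine.

```python
# Sketch of multi-token prediction: one training step penalizes errors on two future words.
# The "model" below is a toy that returns arbitrary probabilities, just to show the loss.
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
sentence = [0, 1, 2, 3, 4]                                  # "the cat sat on mat"

rng = np.random.default_rng(0)

def predict_two_ahead(prefix: list[int]) -> np.ndarray:
    """Stand-in model: probabilities for the next token and the one after (shape 2 x vocab)."""
    logits = rng.normal(size=(2, len(vocab)))
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

total_loss = 0.0
for t in range(len(sentence) - 2):
    p_next, p_after = predict_two_ahead(sentence[: t + 1])
    loss_next = -np.log(p_next[sentence[t + 1]])            # usual next-word objective
    loss_after = -np.log(p_after[sentence[t + 2]])          # extra look-ahead objective
    total_loss += loss_next + 0.5 * loss_after              # look-ahead weighted lower (assumption)

print("combined loss over the sentence:", round(float(total_loss), 3))
```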
Why This Matters
- Cost Revolution: Training costs of $5.6M (vs. hundreds of millions) suggest a future where AI development isn't limited to tech giants.
- Working Around Constraints: Shows how limitations can drive innovation—DeepSeek achieved state-of-the-art results without access to the most powerful chips (at least that’s the best conclusion at the moment).
What’s Interesting
- Efficiency vs Power: Challenges the assumption that advancing AI requires ever-increasing computing power; sometimes smarter engineering beats raw force.
- Self-Teaching AI: R1's ability to develop reasoning capabilities through pure reinforcement learning suggests AIs can discover problem-solving methods on their own.
- AI Teaching AI: The success of distillation shows how knowledge can be transferred between AI models, potentially leading to compounding improvements over time.
- IP for Free: If DeepSeek can be such a fast follower through distillation, what's the advantage for OpenAI, Google, or another company in releasing a novel model?
By Helen and Dave Edwards
