

This reviews a document dated January 27, 2025, from Daniel and Michael at Unsloth detailing their work on quantizing DeepSeek-R1's 671B-parameter model, reducing its size by 80% to 131GB while maintaining functionality. They achieved this dynamic quantization by selectively applying higher bit widths to crucial layers and lower bit widths to less sensitive MoE layers, in contrast with naive quantization methods that render the model unusable. The text explains how to run these quantized versions, discussing hardware requirements, performance benchmarks, and chat template considerations. It also offers a guide for local execution on various systems, including specific instructions for GPU and Apple devices, and outlines the use of Ollama/Open WebUI.
Source: https://unsloth.ai/blog/deepseekr1-dynamic
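The core idea described in the post is that not every layer tolerates the same compression: sensitive layers keep more precision while the bulk of the MoE expert weights are pushed very low. Below is a minimal, self-contained sketch of what such layer-selective bit assignment could look like, using hypothetical layer names and a toy round-to-grid quantizer; it is an illustration of the idea, not Unsloth's actual code.

```python
import numpy as np

def pick_bits(layer_name: str) -> int:
    """Hypothetical per-layer bit-width policy: keep crucial layers at
    higher precision, quantize less sensitive MoE expert weights harder."""
    if "attn" in layer_name or "dense" in layer_name or "shared_expert" in layer_name:
        return 4      # crucial layers: more precision
    if "exps" in layer_name or "experts" in layer_name:
        return 2      # bulk MoE expert weights: aggressive quantization
    return 6          # embeddings, norms, output head

def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Toy symmetric quantizer: round onto a grid with 2^(bits-1)-1 positive levels."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

# Example with made-up layer names and random weights.
layers = {
    "blk.0.attn_q.weight": np.random.randn(8, 8),
    "blk.10.ffn_gate_exps.weight": np.random.randn(8, 8),
    "output.weight": np.random.randn(8, 8),
}
for name, w in layers.items():
    bits = pick_bits(name)
    q = quantize(w, bits)
    print(f"{name}: {bits} bits, max error {np.abs(w - q).max():.3f}")
```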
By mcgrof
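For the local-execution guide the blog points at, the general workflow is to download one of the dynamic GGUF quants from Hugging Face and run it with a llama.cpp build, offloading as many layers to the GPU as VRAM allows. The sketch below assumes the repo id unsloth/DeepSeek-R1-GGUF, the 1.58-bit "UD-IQ1_S" variant, and a local llama-cli binary; check the blog post for the exact file names, offload counts, and chat-template tokens for your setup.

```python
# Hedged sketch: fetch a dynamic GGUF quant, then launch llama.cpp on it.
# Repo id, shard path, and binary location are assumptions, not verified values.
import subprocess
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",      # assumed repo id from the blog
    allow_patterns=["*UD-IQ1_S*"],           # only the 1.58-bit dynamic quant shards
    local_dir="DeepSeek-R1-GGUF",
)

subprocess.run([
    "./llama.cpp/llama-cli",                 # path to your llama.cpp build
    "--model", f"{local_dir}/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf",
    "--n-gpu-layers", "7",                   # raise or lower to fit your VRAM
    "--ctx-size", "4096",
    "--prompt", "<｜User｜>Write a haiku about quantization.<｜Assistant｜>",
], check=True)
```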