AI: post transformers

DeepSeek-R1 Dynamic 1.58-bit Quantization: A Performance Analysis



This episode reviews a blog post dated January 27, 2025, by Daniel and Michael at Unsloth, detailing their work on quantizing the 671B-parameter DeepSeek-R1 model: they cut its size by 80%, to 131GB, while keeping the model functional. They achieved this through dynamic quantization, selectively applying higher bit widths to crucial layers and lower bit widths (down to 1.58 bits) to the less sensitive MoE layers, in contrast with naive uniform quantization, which renders the model unusable. A sketch of the selection idea follows below.

The post also explains how to run the quantized versions, covering hardware requirements, performance benchmarks, and chat-template considerations, and offers a guide to local execution on various systems, with specific instructions for GPU and Apple devices and notes on using Ollama and Open WebUI.
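To make the layer-selective idea concrete, here is a minimal Python sketch of how a per-layer bit-width policy might look. This is not Unsloth's actual code; the layer names, substrings, and bit assignments are illustrative assumptions, standing in for the blog's principle of keeping sensitive layers at higher precision while pushing the bulk of MoE expert weights down to roughly 1.58 bits.

```python
# Illustrative sketch of dynamic (layer-selective) quantization policy.
# Assumption: layer names and thresholds are hypothetical, not Unsloth's.

def pick_bits(layer_name: str) -> float:
    """Choose a bit width per layer based on how sensitive it is."""
    if any(k in layer_name for k in ("embed", "lm_head", "attn", "norm")):
        return 4.0   # crucial layers: keep higher precision
    if "experts" in layer_name:
        return 1.58  # bulk of MoE expert weights: near-ternary precision
    return 2.0       # everything else: a middle ground

layers = [
    "model.embed_tokens",
    "model.layers.0.self_attn.q_proj",
    "model.layers.3.mlp.experts.17.down_proj",
    "model.layers.60.post_attention_layernorm",
    "lm_head",
]

for name in layers:
    print(f"{name:45s} -> {pick_bits(name)} bits")
```

Because the MoE expert weights dominate the parameter count in a model like DeepSeek-R1, quantizing only those aggressively is what yields the 80% size reduction without the gibberish output that naive uniform quantization produces.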


Source: https://unsloth.ai/blog/deepseekr1-dynamic
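For readers following along, the blog's local-execution guide begins by fetching the GGUF shards from Hugging Face. Below is a minimal Python sketch of that download step using huggingface_hub; the repo ID and filename pattern follow what the post describes (the UD-IQ1_S shards are the 1.58-bit dynamic quant), but verify them against the post before running.

```python
# Sketch of fetching the dynamic 1.58-bit GGUF shards, per the blog post.
# Verify repo_id and allow_patterns against the source before use.
from huggingface_hub import snapshot_download  # pip install huggingface_hub

snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",    # Unsloth's GGUF repo for R1
    local_dir="DeepSeek-R1-GGUF",
    allow_patterns=["*UD-IQ1_S*"],         # 1.58-bit dynamic quant shards
)
```

The downloaded shards can then be run with llama.cpp or wired into Ollama/Open WebUI, following the hardware and chat-template guidance in the post.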


AI: post transformers, by mcgrof