Best AI papers explained

NoWag: Unified Compression for Large Language Models

We discuss NoWag (Normalized Weight and Activation Guided Compression), a framework for compressing large language models (LLMs) while preserving their structure. The unified approach covers both pruning (removing less important connections) and vector quantization (grouping weights and reducing their precision), and both rest on a shared normalization step guided by weight and activation statistics. Experiments on Llama models show that NoWag significantly outperforms state-of-the-art zero-shot quantization methods while using less calibration data, and achieves competitive pruning results, suggesting a common underlying principle for effective LLM compression.
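
Since the episode description only gestures at the method, here is a minimal, self-contained sketch of what activation-guided compression can look like. The Wanda-style importance score, the toy k-means codebook, and all function names are illustrative assumptions, not NoWag's actual algorithm.

```python
import torch

def activation_norms(calib_x, eps=1e-8):
    # Per-input-channel L2 norms over a small calibration set: a cheap
    # proxy for channel importance (an assumption; the paper's exact
    # statistics may differ).
    return calib_x.reshape(-1, calib_x.shape[-1]).norm(dim=0) + eps

def prune(W, act_norm, sparsity=0.5):
    # Pruning path: drop entries with the smallest activation-weighted
    # magnitude |w_ij| * ||x_j|| (Wanda-style scoring, assumed here).
    score = W.abs() * act_norm
    thresh = score.flatten().kthvalue(int(W.numel() * sparsity)).values
    return W * (score > thresh)

def vector_quantize(W, act_norm, group=4, codebook_size=256, iters=10):
    # Quantization path: scale columns by activation norms, run toy
    # k-means VQ on short weight groups, then undo the scaling.
    Wn = W * act_norm
    vecs = Wn.reshape(-1, group)
    codebook = vecs[torch.randperm(vecs.shape[0])[:codebook_size]].clone()
    for _ in range(iters):
        assign = torch.cdist(vecs, codebook).argmin(dim=1)
        for c in range(codebook_size):
            members = vecs[assign == c]
            if members.numel():
                codebook[c] = members.mean(dim=0)
    return codebook[assign].reshape(W.shape) / act_norm

W = torch.randn(256, 512)     # example weight matrix (out x in)
X = torch.randn(8, 64, 512)   # calibration activations (batch, seq, in)
norms = activation_norms(X)
W_pruned = prune(W, norms)            # sparse branch
W_quant = vector_quantize(W, norms)   # quantized branch
```

Both branches consume the same activation statistics, which mirrors the "unified" framing the episode highlights: one normalization, two compression back-ends.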

Best AI papers explained, by Enoch H. Kang