Seventy3

[Episode 175] TensorLLM: Enhancing Model Capabilities with Multi-Head Attention



Seventy3: Using NotebookLM to turn papers into podcasts, so everyone can keep learning alongside AI.

Today's topic: TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs

Summary

This research introduces TensorLLM, a novel framework for improving the reasoning abilities and compression of Large Language Models (LLMs) by focusing on the Multi-Head Attention (MHA) block. The method employs multi-head tensorisation and Tucker decomposition to denoise and compress MHA weights by enforcing a shared higher-dimensional subspace across multiple attention heads. Experiments demonstrate that TensorLLM enhances LLM reasoning capabilities across various benchmark datasets and architectures without requiring additional training. The framework can also be combined with existing techniques that denoise the feed-forward network (FFN) layers for further performance gains. The study validates the approach through ablation experiments and comparisons with other compression techniques, showing consistent improvements in accuracy and compression rates. The paper concludes by emphasizing the potential of TensorLLM as a versatile module for improving LLMs and suggesting future work on finding generalizable hyperparameter settings.

This study proposes TensorLLM, a novel framework that improves the reasoning ability and compression efficiency of large language models (LLMs) by focusing on the multi-head attention (MHA) block. The method applies multi-head tensorisation and Tucker decomposition, denoising and compressing the MHA weights by enforcing a shared higher-dimensional subspace across the attention heads.

Experiments show that TensorLLM improves LLM reasoning across a range of benchmark datasets and architectures without any additional training. The framework can also be combined with existing techniques that denoise the feed-forward network (FFN) layers for further performance gains. Ablation studies and comparisons with other compression techniques validate the approach, demonstrating consistent improvements in both accuracy and compression rate.

The paper concludes by highlighting the potential of TensorLLM as a versatile module for improving LLMs, and points to future work on finding broadly applicable hyperparameter settings.
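As a rough illustration of the core idea, here is a minimal Python sketch (not the authors' code) of stacking the per-head weight slices of one MHA block into a higher-order tensor and applying a truncated Tucker decomposition via the tensorly library, so that all heads share the same factor matrices. The shapes, ranks, and variable names are illustrative assumptions; the paper's exact tensorisation layout and rank selection may differ.

```python
# Sketch: multi-head tensorisation + Tucker decomposition of MHA weights.
# All shapes and ranks below are assumed for illustration only.

import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

d_model, n_heads = 768, 12
d_head = d_model // n_heads  # 64

rng = np.random.default_rng(0)
# Per-head slices of the Q, K, V and output projections (4 matrices per head,
# each of shape (d_model, d_head)), stacked into one 4-way tensor:
# (d_model, d_head, 4 weight types, n_heads).
W = rng.standard_normal((d_model, d_head, 4, n_heads))

# Truncated Tucker ranks (hypothetical; tuned in practice). Truncating the
# first two modes constrains all heads to a shared lower-dimensional subspace.
ranks = [256, 32, 4, n_heads]

core, factors = tucker(tl.tensor(W), rank=ranks)

# Reconstruct the denoised/compressed weights from the shared subspace.
W_hat = tl.tucker_to_tensor((core, factors))

# Parameter count before vs. after (core tensor + factor matrices).
n_before = W.size
n_after = core.size + sum(f.size for f in factors)
print(f"relative error: {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")
print(f"compression ratio: {n_before / n_after:.2f}x")
```

With random weights the reconstruction error is large; the point of the sketch is only the mechanics, since real pretrained MHA weights are expected to have low multilinear rank structure that such a truncation can exploit.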

Original paper: https://arxiv.org/abs/2501.15674


Seventy3, by 任雨山