November 29, 2023

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

29 minutes

In this paper, we uncover that Language Models (LMs), either encoder- or decoder-based, can obtain new capabilities by assimilating the parameters of homologous models without retraining or GPUs. Typically, new abilities of LMs are imparted by Supervised Fine-Tuning (SFT), and are reflected in the disparity between fine-tuned and pre-trained parameters (i.e., delta parameters). We first observe that, by introducing a novel operation called DARE (Drop And REscale), most delta parameters can be set directly to zero without affecting the capabilities of SFT LMs, and that larger models can tolerate a higher proportion of discarded parameters. Based on this observation, we further sparsify the delta parameters of multiple homologous SFT models with DARE and subsequently merge them into a single model by parameter averaging. We conduct experiments on eight datasets from the GLUE benchmark with BERT and RoBERTa. We also merge WizardLM, WizardMath, and Code Alpaca, all based on Llama 2. Experimental results show that: (1) The delta parameter value ranges for SFT models are typically small, often within 0.005, and DARE can eliminate 99% of them effortlessly. However, once the models are continuously pre-trained, the value ranges can grow to around 0.03, making DARE impractical. We have also tried removing fine-tuned parameters instead of delta parameters and find that a 10% reduction can lead to drastically decreased performance (even to 0). This highlights that SFT merely stimulates abilities via delta parameters rather than injecting new abilities into LMs; (2) DARE can merge multiple task-specific LMs into one LM with diverse abilities. For instance, merging WizardLM and WizardMath improves the GSM8K zero-shot accuracy of WizardLM from 2.2 to 66.3, retaining its instruction-following ability while surpassing WizardMath's original performance of 64.2. Code is available at https://github.com/yule-BUAA/MergeLM.
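To make the drop-and-rescale idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (that lives in the MergeLM repository linked above). It assumes that each delta parameter is dropped independently at random with probability drop_rate and that the survivors are rescaled by 1 / (1 - drop_rate); the abstract does not spell out either detail. The function names dare and merge_by_averaging are hypothetical, and flat lists of floats stand in for real model tensors.

```python
import random


def dare(pretrained, finetuned, drop_rate, rng):
    """Return sparsified, rescaled delta parameters (illustrative sketch).

    Assumption: each delta is dropped independently with probability
    drop_rate, and kept deltas are multiplied by 1 / (1 - drop_rate).
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - drop_rate)
    sparse_delta = []
    for base, tuned in zip(pretrained, finetuned):
        delta = tuned - base  # delta parameter: fine-tuned minus pre-trained
        if rng.random() < drop_rate:
            sparse_delta.append(0.0)  # drop
        else:
            sparse_delta.append(delta * keep_scale)  # rescale the survivor
    return sparse_delta


def merge_by_averaging(pretrained, finetuned_models, drop_rate, seed=0):
    """Sparsify each model's deltas with dare(), then average them onto the base.

    All models must be fine-tuned from the same pre-trained parameters.
    """
    rng = random.Random(seed)
    deltas = [dare(pretrained, ft, drop_rate, rng) for ft in finetuned_models]
    n_models = len(deltas)
    return [
        base + sum(d[i] for d in deltas) / n_models
        for i, base in enumerate(pretrained)
    ]


if __name__ == "__main__":
    # Toy parameters only; real models have billions of entries.
    base = [0.10, -0.20, 0.30, 0.05]
    model_a = [0.102, -0.203, 0.301, 0.050]
    model_b = [0.099, -0.200, 0.304, 0.052]
    print(merge_by_averaging(base, [model_a, model_b], drop_rate=0.5))
```

With drop_rate set to 0 nothing is dropped, and the merge reduces to plain averaging of the fine-tuned parameters. The rescaling step is what keeps the expected value of each delta unchanged after random dropping.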

2023: Le Yu, Bowen Yu, Haiyang Yu, Fei Huang, Yongbin Li

https://arxiv.org/pdf/2311.03099v1.pdf

By Rob
