April 01, 2024

Ep 21: LLM Model Merging

32 minutes

AI News:

1. Databricks announces the release of DBRX, a new large language model (LLM) designed to improve copilot features in the Databricks platform.

2. The research paper "LLM4Decompile: Decompiling Binary Code with Large Language Models" [2403.05286] is released, reporting high accuracy when decompiling binary code back into high-level source code.

3. GitHub introduces an AI tool for detecting security vulnerabilities within code.

4. Stability AI's CEO, Emad Mostaque, resigns.

Main Topic: Merging LLM Models

1. SLERP (Spherical Linear Interpolation)

2. TIES (TIES-Merging: Trim, Elect Sign & Merge)

3. Frankenmerges (stacking layers taken from different models into a single, often deeper, model)
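SLERP (spherical linear interpolation) blends two models' weights along the arc between them rather than along the straight line, so the merged tensor does not lose magnitude when the two weight directions differ. The snippet below is a minimal sketch, assuming each weight tensor has been flattened into a list of floats; the function name `slerp` and the `dot_threshold` fallback are illustrative choices, not a specific merging tool's API.

```python
import math

def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherically interpolate between two flattened weight tensors.

    t=0 returns v0 and t=1 returns v1. Falls back to plain linear
    interpolation when the vectors are (nearly) parallel or one is zero,
    where the spherical formula is numerically unstable.
    """
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 == 0.0 or norm1 == 0.0:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Cosine of the angle between the two weight directions, clamped for acos
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    if abs(cos_theta) > dot_threshold:
        # Hypothetical fallback threshold: near-parallel vectors use lerp
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    # Weights follow the great-circle arc instead of the straight chord
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

For example, `slerp(0.5, [1.0, 0.0], [0.0, 1.0])` gives roughly `[0.707, 0.707]`, which still has length 1, whereas plain linear interpolation would give `[0.5, 0.5]` with a shorter length. In practice the interpolation factor `t` can be varied per layer or per tensor type.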
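TIES-Merging combines several fine-tuned models that share a base model in three steps: trim each task vector (fine-tuned weights minus base weights) to its largest-magnitude entries, elect a sign per parameter from the summed trimmed values, and average only the entries that agree with that sign. The sketch below assumes flattened weights as lists of floats; the name `ties_merge` and the parameters `density` (fraction of entries kept) and `lam` (scaling of the merged task vector) are illustrative, not taken from the episode or a particular library.

```python
def ties_merge(base, finetuned, density=0.2, lam=1.0):
    """Merge fine-tuned weight vectors into `base` with trim / elect / merge."""
    n = len(base)
    # Task vectors: how each fine-tuned model moved away from the base
    tasks = [[ft[i] - base[i] for i in range(n)] for ft in finetuned]

    # 1. Trim: keep only the top-k entries by magnitude in each task vector
    k = max(1, int(round(density * n)))
    trimmed = []
    for tau in tasks:
        keep = set(sorted(range(n), key=lambda i: abs(tau[i]), reverse=True)[:k])
        trimmed.append([tau[i] if i in keep else 0.0 for i in range(n)])

    merged = []
    for i in range(n):
        vals = [t[i] for t in trimmed]
        # 2. Elect sign: direction with the larger total mass wins
        sign = 1.0 if sum(vals) >= 0 else -1.0
        # 3. Disjoint merge: average only nonzero values that agree with it
        agree = [v for v in vals if v != 0.0 and (v > 0) == (sign > 0)]
        merged.append(sum(agree) / len(agree) if agree else 0.0)

    return [base[i] + lam * merged[i] for i in range(n)]
```

With a zero base, `ft1 = [1.0, -0.1, 0.5, 0.0]`, `ft2 = [0.8, 0.2, -2.0, 0.0]` and `density=0.5`, the third parameter resolves to `-2.0` rather than the averaged `-0.75`, because the conflicting positive update from `ft1` is dropped instead of cancelling the larger negative one.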
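A frankenmerge does not average any weights; it copies whole layer ranges from different source models and stacks them in sequence. The sketch below uses strings as hypothetical stand-ins for transformer blocks and assumes the source models share an architecture and hidden size, so each block's output can feed the next; embeddings and the output head would come from one of the source models (not shown). The name `frankenmerge` and the `(model_name, start, end)` slice format are illustrative assumptions.

```python
def frankenmerge(models, slices):
    """Concatenate half-open layer ranges taken from several source models.

    models: dict mapping a model name to its ordered list of layers.
    slices: list of (model_name, start, end) tuples, applied in order.
    """
    stacked = []
    for name, start, end in slices:
        layers = models[name]
        if not 0 <= start < end <= len(layers):
            raise ValueError(f"invalid slice {name}[{start}:{end}] for {len(layers)} layers")
        # Layers are copied as-is: no weights are averaged in this kind of merge
        stacked.extend(layers[start:end])
    return stacked
```

For two 4-layer models, `frankenmerge({"A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]}, [("A", 0, 3), ("B", 1, 4)])` returns `["A0", "A1", "A2", "B1", "B2", "B3"]`: a 6-layer model, deeper than either source, which is why frankenmerges often produce larger models than the ones they are built from.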

References:

[2403.05286] LLM4Decompile: Decompiling Binary Code with Large Language Models

[2311.03099] Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

[2306.11644] Textbooks Are All You Need

[1909.11299] Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models

[1511.07543] Convergent Learning: Do different neural networks learn the same representations?

Google DeepMind: An empirical analysis of compute-optimal large language model training

By Saugata Chatterjee