Machine Learning Made Simple

Ep 21: LLM Model Merging



AI News:

1. Databricks releases DBRX, a new large language model (LLM) designed to improve copilot features in DBX.

2. The research paper "LLM4Decompile: Decompiling Binary Code with Large Language Models" [2403.05286] is released, reporting promising accuracy in decompiling binary code back into high-level source code.

3. GitHub introduces an AI tool for detecting security vulnerabilities within code.

4. The CEO of Stability AI resigns.


Main Topic: Merging LLM Models

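1. SLERP (Spherical Linear Interpolation): interpolates between the weights of two models along an arc rather than a straight line.

2. TIES (TrIm, Elect Sign & Merge): trims small weight changes, resolves sign conflicts between models, then merges the changes that agree.

3. Frankenmerges (passthrough merging): stacks layers taken from different models into a single new model.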


References:

  1. [2403.05286] LLM4Decompile: Decompiling Binary Code with Large Language Models

  2. [2311.03099] Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

  3. [2306.11644] Textbooks Are All You Need

  4. [1909.11299] Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models

  5. [1511.07543] Convergent Learning: Do different neural networks learn the same representations?

  6. An empirical analysis of compute-optimal large language model training - Google DeepMind


Machine Learning Made Simple, by Saugata Chatterjee