September 10, 2023

🔥🎙️ ThursdAI Sunday special - Extending LLaMa to 128K context window (2 orders of magnitude) with YaRN [Interview with authors]

54 minutes

This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

Happy Sunday everyone, I am very excited to bring you this interview with the folks who took LLaMa 2 and made it LLoooooongMa!

Besides extending LLaMa 2's context window from 4,000 to a whopping 128,000 tokens (Yarn-Llama-2-13b-128k on Hugging Face), these guys also wrote a paper called YaRN (Efficient Context Window Extension of Large Language Models) and showed that, compared to previous methods, YaRN requires not only 10x fewer tokens to create these long contexts, but also 2.5x fewer training steps!

And the models generalize, so there’s now no need to collect extremely long sequences (think book-length sequences) for the models to understand those context lengths.

I also decided to do something different (which took me half of Sunday, so I can’t promise anything and am not committing to this format): premium subscribers can now watch this interview with running karaoke-style subtitles and improved audio! This will be uploaded to YouTube in a week, but aren’t you glad you subscribed and are getting this first?

Here’s a teaser preview:

And here are the chapters for your convenience (the only thing that’s AI-generated 😂)

0:00 - Introduction

3:08 - Discussion of extending LLaMa 2's context length from 4,000 tokens to 128,000 tokens using the YaRN method

8:23 - Explanation of RoPE scaling for positional encodings in transformers

13:21 - How the RoPE scaling idea allows for longer context through positional interpolation

18:51 - Using in-context learning to train models on shorter sequences but still handle long contexts

25:18 - Sourcing long-form data like books to train 128k token models

31:21 - Whether future models will natively support longer contexts

37:33 - New model from Adept with 16k context using RoPE scaling

42:46 - Attention is quadratic - need better algorithms to make long context usable

49:39 - Open source community pushing state of the art alongside big labs

52:34 - Closing thoughts
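
For readers who want a feel for the positional interpolation idea from the 13:21 chapter before listening, here is a minimal sketch. It shows plain linear position interpolation on RoPE angles, not the YaRN method itself (YaRN refines this idea, and the paper has the details). The function names, the `scale` parameter, and the tiny vector size are assumptions made for illustration; the 4,000 and 128,000 figures are the ones quoted above.

```python
import math

def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Rotation angles RoPE uses at one position (one angle per pair of dims).

    scale > 1 is linear position interpolation: the position is divided by
    `scale`, so a longer sequence is squeezed back into the trained range.
    """
    half = dim // 2
    return [(position / scale) * base ** (-2.0 * i / dim) for i in range(half)]

def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by the RoPE angles."""
    angles = rope_angles(position, len(vec), base, scale)
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

TRAINED_CONTEXT = 4_000    # figure quoted in the post
TARGET_CONTEXT = 128_000   # figure quoted in the post
SCALE = TARGET_CONTEXT / TRAINED_CONTEXT  # how much positions get compressed

# With interpolation, position 128,000 is rotated exactly like position 4,000
# was during training, so the model never sees an angle it was not trained on.
long_angles = rope_angles(TARGET_CONTEXT, dim=8, scale=SCALE)
short_angles = rope_angles(TRAINED_CONTEXT, dim=8)
```

The trade-off in this naive version is that neighbouring tokens end up closer together in angle space, which is part of why some fine-tuning is still needed afterwards.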

As always, the full (manually edited) transcription (and this time a special video version!) is reserved for premium subscribers. I promise it’ll be worth it, so why not... y’know, skip a cup of coffee from SB and support ThursdAI?

ThursdAI - The top AI news from the past week

From Weights & Biases. Join AI Evangelist Alex Volkov and a panel of experts to cover everything important that happened in the world of AI from the past week.
