LessWrong (30+ Karma)

“The Data Wall is Important” by JustisMills



This is a link post.

Modern AI is trained on a huge fraction of the internet, especially at the cutting edge, with the best models trained on close to all the high quality data we’ve got.[1] And data is really important! You can scale up compute, you can make algorithms more efficient, or you can add infrastructure around a model to make it more useful, but on the margin, great datasets are king. And, naively, we’re about to run out of fresh data to use.

It's rumored that the top firms are looking for ways to get around the data wall. One possible approach is having LLMs create their own data to train on, for which there is kinda-sorta a precedent from, e.g., modern chess AIs learning by playing games against themselves.[2] Or just finding ways to make AI dramatically more sample efficient with the data we’ve already got: the [...]

The original text contained 3 footnotes which were omitted from this narration.

---

First published:

June 9th, 2024

Source:

https://www.lesswrong.com/posts/axjb7tN9X2Mx4HzPz/the-data-wall-is-important

---

Narrated by TYPE III AUDIO.
