The Nonlinear Library

LW - Highlights from Lex Fridman's interview of Yann LeCun by Joel Burget


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Highlights from Lex Fridman's interview of Yann LeCun, published by Joel Burget on March 14, 2024 on LessWrong.
Introduction
Yann LeCun is perhaps the most prominent critic of the "LessWrong view" on AI safety, and the only one of the three "godfathers of AI" who does not acknowledge the risks of advanced AI. So, when he recently appeared on the Lex Fridman podcast, I listened with the intent to better understand his position. LeCun came across as articulate and thoughtful[1]. Though I don't agree with all of it, I found a lot worth sharing.
Most of this post consists of quotes from the transcript, where I've bolded the most salient points. There are also a few notes from me as well as a short summary at the end.
Limitations of Autoregressive LLMs
Lex Fridman (00:01:52) You've said that autoregressive LLMs are not the way we're going to make progress towards superhuman intelligence. These are the large language models like GPT-4, like Llama 2 and 3 soon and so on. How do they work and why are they not going to take us all the way?
Yann LeCun (00:02:47) For a number of reasons. The first is that there [are] a number of characteristics of intelligent behavior. For example, the capacity to understand the world, understand the physical world, the ability to remember and retrieve things, persistent memory, the ability to reason, and the ability to plan. Those are four essential characteristics of intelligent systems or entities, humans, animals.
LLMs can do none of those or they can only do them in a very primitive way and they don't really understand the physical world. They don't really have persistent memory. They can't really reason and they certainly can't plan. And so if you expect the system to become intelligent without having the possibility of doing those things, you're making a mistake. That is not to say that autoregressive LLMs are not useful. They're certainly useful.
It's not that they're not interesting, or that we can't build a whole ecosystem of applications around them… of course we can. But as a path towards human-level intelligence, they're missing essential components.
(00:04:08) And then there is another tidbit or fact that I think is very interesting. Those LLMs are trained on enormous amounts of texts, basically, the entirety of all publicly available texts on the internet, right? That's typically on the order of 10^13 tokens. Each token is typically two bytes, so that's 2*10^13 bytes as training data. It would take you or me 170,000 years to just read through this at eight hours a day.
So it seems like an enormous amount of knowledge that those systems can accumulate, but then you realize it's really not that much data. If you talk to developmental psychologists and they tell you a four-year-old has been awake for 16,000 hours in his or her life, and the amount of information that has reached the visual cortex of that child in four years is about 10^15 bytes.
(00:05:12) And you can compute this by estimating that the optic nerve can carry about 20 megabytes per second roughly, and so 10^15 bytes for a four-year-old versus 2*10^13 bytes for 170,000 years worth of reading.
What that tells you is that through sensory input, we see a lot more information than we do through language, and that despite our intuition, most of what we learn and most of our knowledge is through our observation and interaction with the real world, not through language. Everything that we learn in the first few years of life, and certainly everything that animals learn has nothing to do with language.
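A quick back-of-envelope check of these figures, as a rough Python sketch. The ~250 words-per-minute reading speed and ~0.75 words-per-token ratio are my own assumptions (not stated in the interview); everything else comes from the numbers quoted above.

# Back-of-envelope check of the numbers above. The reading speed (~250 words
# per minute) and words-per-token ratio (~0.75) are my assumptions, not
# figures from the interview.

SECONDS_PER_HOUR = 3_600

# Text side: ~10^13 tokens at ~2 bytes per token.
tokens = 1e13
bytes_text = 2 * tokens                          # ~2e13 bytes of training text

# Time to read it all, at 8 hours of reading per day.
words = 0.75 * tokens
reading_hours = words / 250 / 60
reading_years = reading_hours / 8 / 365
print(f"Reading time: ~{reading_years:,.0f} years")

# Vision side: a four-year-old awake ~16,000 hours, optic nerve ~20 MB/s.
awake_seconds = 16_000 * SECONDS_PER_HOUR
bytes_vision = 20e6 * awake_seconds              # ~1.15e15 bytes
print(f"Visual input:  ~{bytes_vision:.1e} bytes")
print(f"Training text: ~{bytes_text:.1e} bytes")
print(f"Ratio: ~{bytes_vision / bytes_text:.0f}x more data through vision")

Run as written, this reproduces the roughly 170,000-year reading time and shows the four-year-old's visual input exceeding the text corpus by a factor of about 60.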
Checking some claims:
An LLM training corpus is on the order of 10^13 tokens. This seems about right: "Llama 2 was trained on 2.4T tokens and PaLM 2 on 3.6T tokens. GPT-4 is thought to have been trained on 4T tokens… Together AI introduced a 1 trillion (1T) token dataset called RedPaj...