June 15, 2024

“Rational Animations’ intro to mechanistic interpretability” by Writer

19 minutes

This is a link post.

In our new video, we talk about research on interpreting InceptionV1, a convolutional neural network. Researchers have been able to understand the function of neurons and channels inside the network and uncover visual processing algorithms by looking at the weights. The work on InceptionV1 is early but landmark mechanistic interpretability research, and it functions well as an introduction to the field. We also go into the rationale and goals of the field and mention some more recent research near the end. Our main source material is the circuits thread in the Distill journal and this article on feature visualization. The author of the script is Arthur Frost. I have included the script below, although I recommend watching the video since the script has been written with accompanying moving visuals in mind.

Intro

In 2018, researchers trained an AI to find out if people were at [...]

---

Outline:

(00:56) Intro

(07:16) Visualisation by Optimisation

(11:09) Circuits

(15:27) Polysemanticity

(17:00) Closing thoughts, and the past few years of interpretability

---

First published: June 14th, 2024

Source: https://www.lesswrong.com/posts/tSNygWGHdpiBvzp4D/rational-animations-intro-to-mechanistic-interpretability

---

Narrated by TYPE III AUDIO.

LessWrong (30+ Karma)

By LessWrong
