LessWrong (30+ Karma)

“The GDM AGI Safety+Alignment Team is Hiring for Applied Interpretability Research” by Arthur Conmy, Neel Nanda



TL;DR: The Google DeepMind AGI Safety team is hiring Applied Interpretability research scientists and engineers. Applied Interpretability is a new subteam we are forming to focus on directly using model internals-based techniques to make models safer in production. Achieving this goal will require doing research on the critical path that enables interpretability methods to be more widely used for practical problems. We believe this has significant direct and indirect benefits for preventing AGI x-risk, and we argue for this below. Our ideal candidate has experience with ML engineering and some hands-on experience with language model interpretability research. To apply for this role (as well as other open AGI Safety and Gemini Safety roles), follow the links for Research Engineers here & Research Scientists here.

1. What is Applied Interpretability?

At a high level, the goal of the applied interpretability team is to make model internals-based methods a standard tool [...]

---

Outline:

(01:00) 1. What is Applied Interpretability?

(03:57) 2. Specific projects we're interested in working on

(06:39) FAQ

(06:42) What's the relationship between applied interpretability and Neel's mechanistic interpretability team?

(07:16) How much autonomy will I have?

(09:03) Why do applied interpretability rather than fundamental research?

(10:31) What makes someone a good fit for the role?

(11:15) I've heard that Google infra can be pretty slow and bad

(11:42) Can I publish?

(12:19) Does probing really count as interpretability?

The original text contained 2 footnotes which were omitted from this narration.

---

First published:

February 24th, 2025

Source:

https://www.lesswrong.com/posts/aG9e5tHfHmBnDqrDy/the-gdm-agi-safety-alignment-team-is-hiring-for-applied

---

Narrated by TYPE III AUDIO.


