


TL;DR: The Google DeepMind AGI Safety team is hiring for Applied Interpretability research scientists and engineers. Applied Interpretability is a new subteam we are forming to focus on directly using model internals-based techniques to make models safer in production. Achieving this goal will require doing research on the critical path that enables interpretability methods to be more widely used for practical problems. We believe this has significant direct and indirect benefits for preventing AGI x-risk, and we argue for this below. Our ideal candidate has experience with ML engineering and some hands-on experience with language model interpretability research. To apply for this role (as well as other open AGI Safety and Gemini Safety roles), follow the links for Research Engineers here and Research Scientists here.
1. What is Applied Interpretability?
At a high level, the goal of the applied interpretability team is to make model internals-based methods a standard tool [...]
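To make "model internals-based techniques" concrete, here is a minimal, hypothetical sketch (not from the post) of one such technique: a linear probe trained on cached activations to flag a behaviour of interest. The function name, tensor shapes, and the PyTorch framing are all illustrative assumptions, not a description of the team's actual tooling.

import torch

def train_linear_probe(activations, labels, epochs=200, lr=1e-2):
    # Fit a logistic-regression probe: activations -> binary label.
    # activations: (n_examples, d_model) tensor of activations cached from
    #              some layer of the model.
    # labels:      (n_examples,) tensor of 0/1 labels for the behaviour of
    #              interest (e.g. "does this response contain a refusal?").
    n, d_model = activations.shape
    probe = torch.nn.Linear(d_model, 1)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        logits = probe(activations).squeeze(-1)
        loss = loss_fn(logits, labels)
        loss.backward()
        opt.step()
    return probe

# Usage with random placeholder data standing in for real cached activations.
acts = torch.randn(1024, 512)
labs = (torch.rand(1024) > 0.5).float()
probe = train_linear_probe(acts, labs)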
---
Outline:
(01:00) 1. What is Applied Interpretability?
(03:57) 2. Specific projects we're interested in working on
(06:39) FAQ
(06:42) What's the relationship between applied interpretability and Neel's mechanistic interpretability team?
(07:16) How much autonomy will I have?
(09:03) Why do applied interpretability rather than fundamental research?
(10:31) What makes someone a good fit for the role?
(11:15) I've heard that Google infra can be pretty slow and bad
(11:42) Can I publish?
(12:19) Does probing really count as interpretability?
The original text contained 2 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
