
TL;DR: The Google DeepMind AGI Safety team is hiring Applied Interpretability research scientists and engineers. Applied Interpretability is a new subteam we are forming to focus on directly using model internals-based techniques to make models safer in production. Achieving this goal will require research on the critical path to making interpretability methods more widely usable for practical problems. We believe this has significant direct and indirect benefits for preventing AGI x-risk, and we argue for this below. Our ideal candidate has experience with ML engineering and some hands-on experience with language model interpretability research. To apply for this role (as well as other open AGI Safety and Gemini Safety roles), follow the links for Research Engineers here and Research Scientists here.
1. What is Applied Interpretability?
At a high level, the goal of the applied interpretability team is to make model internals-based methods become a standard tool [...]
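To make "model internals-based methods" concrete: one such technique mentioned later in the FAQ is probing. Below is a minimal, hedged sketch of a linear probe trained on cached hidden activations to detect a concept. The activations, dimensions, and concept here are synthetic stand-ins for illustration; this is not the team's actual setup or code.

```python
# Minimal sketch of a model-internals technique: a linear probe on hidden
# activations. The activations below are synthetic stand-ins, not cached
# from a real language model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend these are residual-stream activations (n_examples x d_model) with
# binary labels for some concept of interest (hypothetical example).
d_model, n_examples = 512, 2000
concept_direction = rng.normal(size=d_model)
labels = rng.integers(0, 2, size=n_examples)
activations = rng.normal(size=(n_examples, d_model)) + np.outer(labels, concept_direction)

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0
)

# A logistic-regression probe: a single linear readout of the activations.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Probe test accuracy: {probe.score(X_test, y_test):.3f}")
```

In a production setting the same readout would be applied to activations cached from real model forward passes, which is what makes probes cheap enough to run at scale.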
---
Outline:
(01:00) 1. What is Applied Interpretability?
(03:57) 2. Specific projects we're interested in working on
(06:39) FAQ
(06:42) What's the relationship between applied interpretability and Neel's mechanistic interpretability team?
(07:16) How much autonomy will I have?
(09:03) Why do applied interpretability rather than fundamental research?
(10:31) What makes someone a good fit for the role?
(11:15) I've heard that Google infra can be pretty slow and bad
(11:42) Can I publish?
(12:19) Does probing really count as interpretability?
The original text contained 2 footnotes which were omitted from this narration.
---
Narrated by TYPE III AUDIO.