TL;DR: The Google DeepMind AGI Safety team is hiring for Applied Interpretability research scientists and engineers. Applied Interpretability is a new subteam we are forming to focus on directly using model internals-based techniques to make models safer in production. Achieving this goal will require doing research on the critical path that enables interpretability methods to be more widely used for practical problems. We believe this has significant direct and indirect benefits for preventing AGI x-risk, and argue this below. Our ideal candidate has experience with ML engineering and some hands-on experience with language model interpretability research. To apply for this role (as well as other open AGI Safety and Gemini Safety roles), follow the links for Research Engineers here & Research Scientists here.
1. What is Applied Interpretability?
At a high level, the goal of the applied interpretability team is to make model internals-based methods become a standard tool [...]
---
Outline:
(01:00) 1. What is Applied Interpretability?
(03:57) 2. Specific projects we're interested in working on
(06:39) FAQ
(06:42) What's the relationship between applied interpretability and Neel's mechanistic interpretability team?
(07:16) How much autonomy will I have?
(09:03) Why do applied interpretability rather than fundamental research?
(10:31) What makes someone a good fit for the role?
(11:15) I've heard that Google infra can be pretty slow and bad
(11:42) Can I publish?
(12:19) Does probing really count as interpretability?
The original text contained 2 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.