Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Open Call for Research Assistants in Developmental Interpretability, published by Jesse Hoogland on August 30, 2023 on LessWrong.
We are excited to announce multiple positions for Research Assistants to join our six-month research project assessing the viability of Developmental Interpretability (DevInterp).
This is a chance to gain expertise in interpretability, develop your skills as a researcher, build out a network of collaborators and mentors, publish in major conferences, and open a path towards future opportunities, including potential permanent roles, recommendations, and successive collaborations.
Background
Developmental interpretability is a research agenda aiming to build tools for detecting, locating, and understanding phase transitions in learning dynamics of neural networks. It draws on techniques from singular learning theory, mechanistic interpretability, statistical physics, and developmental biology.
Position Details
General info:
Title: Research Assistant / Research Engineer.
Location: Remote, with hubs in Melbourne and London.
Duration: Until March 2024 (at minimum).
Compensation: USD $35k per year base salary, paid as an independent contractor at an hourly rate.
Timeline:
Application Deadline: September 15th, 2023
Ideal Start Date: October 2023
How to Apply: Complete the application form by the deadline. Further information on the application process will be provided in the form.
Who We Are
The developmental interpretability research team consists of experts across a number of areas of mathematics, physics, statistics, and AI safety. The principal researchers are:
Daniel Murfet, mathematician and SLT expert, University of Melbourne.
Susan Wei, statistician and SLT expert, University of Melbourne.
Jesse Hoogland, MSc. Physics, SERI MATS scholar, RA in the Krueger lab.
We have a range of projects currently underway, led by one of these principal researchers and involving a number of other PhD and MSc students from the University of Melbourne and collaborators from around the world. In an organizational capacity you would also interact with Alexander Oldenziel and Stan van Wingerden.
You can find us and the broader DevInterp research community on our Discord. Beyond the Developmental Interpretability research agenda, you can read our first preprint on scalable SLT invariants and check out the lectures from the SLT & Alignment summit.
Overview of Projects
Here's a selection of the projects underway, some of which you would be expected to contribute to. These tend to be on the more experimental side:
Developing scalable estimates for SLT invariants: Invariants like the (local) learning coefficient and (local) singular fluctuation can signal the presence of "hidden" phase transitions. Improving these techniques can help us better identify these transitions.
DevInterp of vision models: To what extent do the kinds of circuits studied in the original circuits thread emerge through phase transitions?
DevInterp of program synthesis: In examples where we know there is rich compositional structure, can we see it in the singularities? Practically, this means studying settings like modular arithmetic (grokking), multitask sparse parity, and more complex variants.
DevInterp of in-context learning & induction heads: Is the development of induction heads a proper phase transition in the language of SLT? More ambitiously, can we apply singular learning theory to study in-context learning and make sense of "in-context phase transitions"?
DevInterp of language models: Can we detect phase transitions in simple language models (like TinyStories)? Can we, from these transitions, discover circuit structure? Can we extend these techniques to larger models (e.g., in the Pythia suite)?
DevInterp of reinforcement learning models: To what extent are phase transitions inv...
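To give a flavor of the first project above: the (local) learning coefficient can be estimated with a WBIC-style formula, lambda_hat = n * beta * (E_beta[L(w)] - L(w*)), where the expectation is taken over an SGLD chain localized near the trained parameter w*. The sketch below is purely illustrative (not the team's actual implementation): it uses a toy quadratic loss, and every function name and hyperparameter is an assumption chosen for the example.

```python
import numpy as np

# Illustrative sketch: estimate the local learning coefficient (LLC) of a
# toy quadratic loss L(w) = 0.5 * ||w||^2 via SGLD, using the estimator
#   lambda_hat = n * beta * (E_beta[L(w)] - L(w*)),  beta = 1 / log n.
# Names and hyperparameters here are illustrative, not from the post.

rng = np.random.default_rng(0)

def loss(w):
    return 0.5 * float(w @ w)  # minimized at w* = 0, with L(w*) = 0

def grad_loss(w):
    return w

def estimate_llc(w_star, n=1000, steps=5000, eps=1e-3, gamma=1.0):
    """Run an SGLD chain localized around w_star; return the LLC estimate."""
    beta = 1.0 / np.log(n)  # inverse temperature beta* = 1 / log n
    w = w_star.copy()
    samples = []
    for _ in range(steps):
        # Drift = gradient of n*beta*L plus a localizing pull back to w*
        drift = n * beta * grad_loss(w) + gamma * (w - w_star)
        noise = rng.normal(size=w.shape) * np.sqrt(eps)
        w = w - 0.5 * eps * drift + noise
        samples.append(loss(w))
    expected_loss = np.mean(samples[steps // 2:])  # discard burn-in
    return n * beta * (expected_loss - loss(w_star))

w_star = np.zeros(4)
lam = estimate_llc(w_star)
# For a regular (non-singular) quadratic model in d dimensions, the true
# learning coefficient is d/2; here d = 4, so lam should be near 2.
```

In singular models the estimate falls below d/2, which is what makes the invariant informative: drops in the local learning coefficient can signal the "hidden" phase transitions the project aims to detect.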