
In episode 60 of The Gradient Podcast, Daniel Bashir speaks to Hattie Zhou.
Hattie is a PhD student at the Université de Montréal and Mila. Her research focuses on understanding how and why neural networks work, based on the belief that the performance of modern neural networks exceeds our understanding and that building more capable and trustworthy models requires bridging this gap. Prior to Mila, she spent time as a data scientist at Uber and did research with Uber AI Labs.
Have suggestions for future podcast guests (or other feedback)? Let us know here!
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (01:55) Hattie’s Origin Story, Uber AI Labs, empirical theory and other sorts of research
* (10:00) Intro to the Lottery Ticket Hypothesis & Deconstructing Lottery Tickets
* (14:30) Lottery tickets as lucky initialization
* (17:00) Types of masking and the “masking is training” claim
* (24:00) Type-0 masks and weight evolution over long training trajectories
* (27:00) Can you identify good masks or training trajectories a priori?
* (29:00) The role of signs in neural net initialization
* (35:27) The Supermask
* (41:00) Masks to probe pretrained models and model steerability
* (47:40) Fortuitous Forgetting in Connectionist Networks
* (54:00) Relationships to other work (double descent, grokking, etc.)
* (1:01:00) The iterative training process in fortuitous forgetting, scale and value of exploring alternatives
* (1:03:35) In-Context Learning and Teaching Algorithmic Reasoning
* (1:09:00) Learning + algorithmic reasoning, prompting strategy
* (1:13:50) What’s happening with in-context learning?
* (1:14:00) Induction heads
* (1:17:00) ICL and gradient descent
* (1:22:00) Algorithmic prompting vs discovery
* (1:24:45) Future directions for algorithmic prompting
* (1:26:30) Interesting work from NeurIPS 2022
* (1:28:20) Hattie’s perspective on scientific questions people pay attention to, underrated problems
* (1:34:30) Hattie’s perspective on ML publishing culture
* (1:42:12) Outro
Links:
* Hattie’s homepage and Twitter
* Papers
* Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
* Fortuitous Forgetting in Connectionist Networks
* Teaching Algorithmic Reasoning via In-context Learning