This episode explores a 2016 paper on linear classifier probes, a simple method for testing what information is linearly recoverable from a neural network’s intermediate layers: small classifiers are attached to frozen hidden states, and their accuracy measures how accessible class information is at each depth. It explains the paper’s central finding, that class information often becomes increasingly linearly separable with depth, and why this suggested that deep networks develop more organized, task-relevant representations even though no layer except the last is explicitly trained to be separable. The discussion also emphasizes a crucial caveat: probes measure what information is accessible, not which layer causally performs a computation, so they are tools for analysis rather than proof of mechanism. The episode connects this now-standard probing approach to modern interpretability, transfer learning, and evaluation practices, and argues that it was an early step toward opening up the neural network “black box.”
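As a rough sketch of the recipe described above (not the paper’s exact code), the toy example below freezes a small network and fits an independent linear probe on each layer’s activations. The MLP, the random two-class data, and all hyperparameters are illustrative assumptions; in practice one would probe a trained network on a real dataset.

```python
# Minimal linear-probe sketch (PyTorch assumed; model and data are toy stand-ins).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy two-class data; in the paper this would be a real dataset (e.g. MNIST).
X = torch.randn(512, 20)
y = (X[:, 0] + X[:, 1] > 0).long()

# Stand-in for a trained network; probes never update it, so freeze it.
net = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)
for p in net.parameters():
    p.requires_grad_(False)

def layer_activations(x):
    """Run a forward pass, collecting every layer's output."""
    acts, h = [], x
    for layer in net:
        h = layer(h)
        acts.append(h)
    return acts

# One independent linear classifier per layer: its accuracy estimates how
# linearly separable the classes are in that layer's representation.
for i, feats in enumerate(layer_activations(X)):
    probe = nn.Linear(feats.shape[1], 2)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        nn.functional.cross_entropy(probe(feats), y).backward()
        opt.step()
    acc = (probe(feats).argmax(dim=1) == y).float().mean().item()
    # High accuracy = the information is linearly accessible here, nothing more.
    print(f"layer {i}: probe accuracy {acc:.2f}")
```

On a trained network, probe accuracy typically rises with depth, mirroring the paper’s finding; on the random weights above it need not. Either way, a high probe score only shows that the information is linearly accessible at that layer, not that the layer computes or uses it.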
Sources:
1. Understanding intermediate layers using linear classifier probes — Guillaume Alain, Yoshua Bengio, 2016
http://arxiv.org/abs/1610.01644
2. Probing Classifiers: Promises, Shortcomings, and Advances — Yonatan Belinkov, 2021
http://arxiv.org/abs/2102.12452
3. Towards Best Practices of Activation Patching in Language Models: Metrics and Methods — Fred Zhang, Neel Nanda, 2023
http://arxiv.org/abs/2309.16042
4. https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8
5. Understanding intermediate layers using linear classifier probes — Guillaume Alain, Yoshua Bengio, 2016
https://scholar.google.com/scholar?q=Understanding+intermediate+layers+using+linear+classifier+probes
6. On the Transferability of Features in Deep Neural Networks — Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, 2014
https://scholar.google.com/scholar?q=On+the+Transferability+of+Features+in+Deep+Neural+Networks
7. Do Better ImageNet Models Transfer Better? — Simon Kornblith, Jonathon Shlens, Quoc V. Le, 2019
https://scholar.google.com/scholar?q=Do+Better+ImageNet+Models+Transfer+Better?
8. A Survey on Probing Methods for Linguistic Information in Neural Language Models — Najoung Kim, Roma Patel, Adam Poliak, Patrick Xia, Alex Wang, Samuel R. Bowman, Yoon Kim, Katharina Kann, 2022
https://scholar.google.com/scholar?q=A+Survey+on+Probing+Methods+for+Linguistic+Information+in+Neural+Language+Models
9. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps — Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, 2013
https://scholar.google.com/scholar?q=Deep+Inside+Convolutional+Networks:+Visualising+Image+Classification+Models+and+Saliency+Maps
10. Understanding Neural Networks Through Deep Visualization — Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, Hod Lipson, 2015
https://scholar.google.com/scholar?q=Understanding+Neural+Networks+Through+Deep+Visualization
11. How Transferable Are Features in Deep Neural Networks? — Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, 2014
https://scholar.google.com/scholar?q=How+Transferable+Are+Features+in+Deep+Neural+Networks?
12. Decaf: A Deep Convolutional Activation Feature for Generic Visual Recognition — Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell, 2014
https://scholar.google.com/scholar?q=Decaf:+A+Deep+Convolutional+Activation+Feature+for+Generic+Visual+Recognition
13. Learning Deep Features for Discriminative Localization — Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba, 2016
https://scholar.google.com/scholar?q=Learning+Deep+Features+for+Discriminative+Localization
14. The Information Bottleneck Theory of Deep Learning — Naftali Tishby, Noga Zaslavsky, 2015
https://scholar.google.com/scholar?q=The+Information+Bottleneck+Theory+of+Deep+Learning
15. Visualizing and Understanding Convolutional Networks — Matthew D. Zeiler, Rob Fergus, 2014
https://scholar.google.com/scholar?q=Visualizing+and+Understanding+Convolutional+Networks
16. Using Linear Classifier Probes — Yonatan Belinkov, 2022
https://scholar.google.com/scholar?q=Using+Linear+Classifier+Probes
17. What do you learn from context? Probing for sentence structure in contextualized word representations — Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Samuel R. Bowman, Eunsol Choi, 2019
https://scholar.google.com/scholar?q=What+do+you+learn+from+context?+Probing+for+sentence+structure+in+contextualized+word+representations
18. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models — Ethan Dyer, Guy Gur-Ari, Ishaan Gulrajani, et al., 2024
https://scholar.google.com/scholar?q=Beyond+the+Imitation+Game:+Quantifying+and+extrapolating+the+capabilities+of+language+models
19. Does representation matter? Exploring intermediate layers in large language models — authors not identified, likely 2024 or 2025
https://scholar.google.com/scholar?q=Does+representation+matter?+exploring+intermediate+layers+in+large+language+models
20. A separability-based approach to quantifying generalization: which layer is best? — authors not identified, likely 2023–2025
https://scholar.google.com/scholar?q=A+separability-based+approach+to+quantifying+generalization:+which+layer+is+best?
21. The topology and geometry of neural representations — authors not identified, likely 2023–2025
https://scholar.google.com/scholar?q=The+topology+and+geometry+of+neural+representations
22. Context Matters: Analyzing the Generalizability of Linear Probing and Steering Across Diverse Scenarios — authors not identified, likely 2024 or 2025
https://scholar.google.com/scholar?q=Context+Matters:+Analyzing+the+Generalizability+of+Linear+Probing+and+Steering+Across+Diverse+Scenarios
23. AI Post Transformers: Xavier Initialization: Deep Feedforward Networks: Training Difficulties and Solutions — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/xavier-initialization-deep-feedforward-networks-training-difficulties-and-soluti/
24. AI Post Transformers: Language Models are Injective and Hence Invertible — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-03-21-language-models-are-injective-an-7545e0.mp3
Interactive Visualization: Linear Classifier Probes for Intermediate Layers