This episode explores a 2016 paper on linear classifier probes, a simple method for testing what information is linearly recoverable from a neural network’s intermediate layers: small classifiers are attached to frozen hidden states, and their accuracy measures how accessible class information is at each depth. It explains the paper’s central finding, that class information often becomes increasingly linearly separable with depth, and why this suggested that deep networks develop more organized, task-relevant representations even though no layer except the last is explicitly trained to be separable. The discussion also emphasizes a crucial caveat: probes measure what information is accessible, not which layer causally performs a computation, so they are tools for analysis rather than proof of mechanism. The episode connects this now-standard probing approach to modern interpretability, transfer learning, and evaluation practices, and argues that it was an early step toward opening up the neural network “black box.”
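As a rough sketch of the recipe described above (not the paper’s exact code), the toy example below freezes a small network and fits an independent linear probe on each layer’s activations. The MLP, the random two-class data, and all hyperparameters are illustrative assumptions; in practice one would probe a trained network on a real dataset.

```python
# Minimal linear-probe sketch (PyTorch assumed; model and data are toy stand-ins).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy two-class data; in the paper this would be a real dataset (e.g. MNIST).
X = torch.randn(512, 20)
y = (X[:, 0] + X[:, 1] > 0).long()

# Stand-in for a trained network; probes never update it, so freeze it.
net = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)
for p in net.parameters():
    p.requires_grad_(False)

def layer_activations(x):
    """Run a forward pass, collecting every layer's output."""
    acts, h = [], x
    for layer in net:
        h = layer(h)
        acts.append(h)
    return acts

# One independent linear classifier per layer: its accuracy estimates how
# linearly separable the classes are in that layer's representation.
for i, feats in enumerate(layer_activations(X)):
    probe = nn.Linear(feats.shape[1], 2)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        nn.functional.cross_entropy(probe(feats), y).backward()
        opt.step()
    acc = (probe(feats).argmax(dim=1) == y).float().mean().item()
    # High accuracy = the information is linearly accessible here, nothing more.
    print(f"layer {i}: probe accuracy {acc:.2f}")
```

On a trained network, probe accuracy typically rises with depth, mirroring the paper’s finding; on the random weights above it need not. Either way, a high probe score only shows that the information is linearly accessible at that layer, not that the layer computes or uses it.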
Sources:
1. Understanding intermediate layers using linear classifier probes — Guillaume Alain, Yoshua Bengio, 2016
http://arxiv.org/abs/1610.01644
2. Probing Classifiers: Promises, Shortcomings, and Advances — Yonatan Belinkov, 2021
http://arxiv.org/abs/2102.12452
3. Towards Best Practices of Activation Patching in Language Models: Metrics and Methods — Fred Zhang, Neel Nanda, 2023
http://arxiv.org/abs/2309.16042
4. https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8
5. Understanding intermediate layers using linear classifier probes — Guillaume Alain, Yoshua Bengio, 2016
https://scholar.google.com/scholar?q=Understanding+intermediate+layers+using+linear+classifier+probes
6. On the Transferability of Features in Deep Neural Networks — Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, 2014
https://scholar.google.com/scholar?q=On+the+Transferability+of+Features+in+Deep+Neural+Networks
7. Do Better ImageNet Models Transfer Better? — Simon Kornblith, Jonathon Shlens, Quoc V. Le, 2019
https://scholar.google.com/scholar?q=Do+Better+ImageNet+Models+Transfer+Better?
8. A Survey on Probing Methods for Linguistic Information in Neural Language Models — Najoung Kim, Roma Patel, Adam Poliak, Patrick Xia, Alex Wang, Samuel R. Bowman, Yoon Kim, Katharina Kann, 2022
https://scholar.google.com/scholar?q=A+Survey+on+Probing+Methods+for+Linguistic+Information+in+Neural+Language+Models
9. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps — Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, 2013
https://scholar.google.com/scholar?q=Deep+Inside+Convolutional+Networks:+Visualising+Image+Classification+Models+and+Saliency+Maps
10. Understanding Neural Networks Through Deep Visualization — Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, Hod Lipson, 2015
https://scholar.google.com/scholar?q=Understanding+Neural+Networks+Through+Deep+Visualization
11. How Transferable Are Features in Deep Neural Networks? — Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, 2014
https://scholar.google.com/scholar?q=How+Transferable+Are+Features+in+Deep+Neural+Networks?
12. Decaf: A Deep Convolutional Activation Feature for Generic Visual Recognition — Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell, 2014
https://scholar.google.com/scholar?q=Decaf:+A+Deep+Convolutional+Activation+Feature+for+Generic+Visual+Recognition
13. Learning Deep Features for Discriminative Localization — Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba, 2016
https://scholar.google.com/scholar?q=Learning+Deep+Features+for+Discriminative+Localization
14. The Information Bottleneck Theory of Deep Learning — Naftali Tishby, Noga Zaslavsky, 2015
https://scholar.google.com/scholar?q=The+Information+Bottleneck+Theory+of+Deep+Learning
15. Visualizing and Understanding Convolutional Networks — Matthew D. Zeiler, Rob Fergus, 2014
https://scholar.google.com/scholar?q=Visualizing+and+Understanding+Convolutional+Networks
16. Using Linear Classifier Probes — Yonatan Belinkov, 2022
https://scholar.google.com/scholar?q=Using+Linear+Classifier+Probes
17. What do you learn from context? Probing for sentence structure in contextualized word representations — Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Samuel R. Bowman, Eunsol Choi, 2019
https://scholar.google.com/scholar?q=What+do+you+learn+from+context?+Probing+for+sentence+structure+in+contextualized+word+representations
18. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models — Ethan Dyer, Guy Gur-Ari, Ishaan Gulrajani, et al., 2024
https://scholar.google.com/scholar?q=Beyond+the+Imitation+Game:+Quantifying+and+extrapolating+the+capabilities+of+language+models
19. Does representation matter? Exploring intermediate layers in large language models — authors not identified, likely 2024 or 2025
https://scholar.google.com/scholar?q=Does+representation+matter?+exploring+intermediate+layers+in+large+language+models
20. A separability-based approach to quantifying generalization: which layer is best? — authors not identified, likely 2023–2025
https://scholar.google.com/scholar?q=A+separability-based+approach+to+quantifying+generalization:+which+layer+is+best?
21. The topology and geometry of neural representations — authors not identified, likely 2023–2025
https://scholar.google.com/scholar?q=The+topology+and+geometry+of+neural+representations
22. Context Matters: Analyzing the Generalizability of Linear Probing and Steering Across Diverse Scenarios — authors not identified, likely 2024 or 2025
https://scholar.google.com/scholar?q=Context+Matters:+Analyzing+the+Generalizability+of+Linear+Probing+and+Steering+Across+Diverse+Scenarios
23. AI Post Transformers: Xavier Initialization: Deep Feedforward Networks: Training Difficulties and Solutions — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/xavier-initialization-deep-feedforward-networks-training-difficulties-and-soluti/
24. AI Post Transformers: Language Models are Injective and Hence Invertible — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-03-21-language-models-are-injective-an-7545e0.mp3
Interactive Visualization: Linear Classifier Probes for Intermediate Layers