Artificially Unintelligent

E36 Untangling the Decision-Making Process of Neural Networks - A Paper Deep Dive of Zoom In: An Introduction to Circuits


Mechanistic interpretability refers to understanding a model by looking at how its internal components function and interact with each other. It's about breaking down the model into its smallest functional parts and explaining how these parts come together to produce the model's outputs.

Neural networks are complex, which makes it hard to establish broad, rigorous claims about their behavior. However, focusing on small, specific parts of a network, known as "circuits", might offer a way to investigate them rigorously. Circuits can be edited and analyzed in a falsifiable manner, making them a potential foundation for interpretability.
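
To make the "edit and falsify" idea concrete, here is a minimal sketch in Python of an ablation experiment on a toy two-layer network. The architecture, weights, and the `ablate_unit` parameter are illustrative assumptions, not the paper's setup (the paper studies real vision models such as InceptionV1), but the logic is the same: zero out a component and check whether the behavior it supposedly implements actually changes.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy two-layer network: hidden units stand in for "features" and the
    # weights linking them for a "circuit". Everything here is illustrative,
    # not taken from the paper.
    W1 = rng.normal(size=(4, 8))   # input -> hidden
    W2 = rng.normal(size=(8, 2))   # hidden -> output

    def forward(x, ablate_unit=None):
        # Zeroing a hidden unit is the falsifiable "edit": if the unit
        # really drives a behavior, removing it should change the output.
        h = np.maximum(0.0, x @ W1)          # ReLU hidden activations
        if ablate_unit is not None:
            h[..., ablate_unit] = 0.0
        return h @ W2

    x = rng.normal(size=(1, 4))
    baseline = forward(x)
    for unit in range(8):
        shift = np.abs(forward(x, ablate_unit=unit) - baseline).sum()
        print(f"ablating hidden unit {unit} shifts the output by {shift:.3f}")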

Zoom In examines neural networks from a biological perspective, studying features and circuits to untangle their behavior.

Do you still want to hear more from us? Follow us on the Socials:

  • Nicolay: LinkedIn | X (formerly known as Twitter)
  • William: LinkedIn