March 12, 2024

AF - Transformer Debugger by Henk Tillman

1 minute


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Transformer Debugger, published by Henk Tillman on March 12, 2024 on The AI Alignment Forum.

Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team with the goal of supporting investigations into circuits underlying specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders.

TDB enables rapid exploration before you need to write any code: you can intervene in the forward pass and see how the intervention affects a particular behavior.

It can be used to answer questions like, "Why does the model output token A instead of token B for this prompt?" or "Why does attention head H attend to token T for this prompt?" It does so by identifying specific components (neurons, attention heads, autoencoder latents) that contribute to the behavior, showing automatically generated explanations of what causes those components to activate most strongly, and tracing connections between components to help discover circuits.
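To make the "token A instead of token B" question concrete, here is a minimal toy sketch in Python. It is not TDB's API, and every name and number in it is hypothetical. It assumes each component adds a fixed vector to the output logits, so removing a component simply drops its contribution; in a real transformer, layer normalization and downstream components make the effect of an intervention nonlinear. Zero-ablation is used here only because it is the simplest intervention to show.

```python
# Toy illustration only. Nothing here is taken from TDB; all names and
# numbers are hypothetical and the "model" is a plain sum of contributions.

def logits_from_components(contributions, ablate=None):
    """Sum per-component logit contributions, skipping one ablated component."""
    vocab_size = len(next(iter(contributions.values())))
    logits = [0.0] * vocab_size
    for name, contrib in contributions.items():
        if name == ablate:
            continue  # zero-ablation: drop this component's contribution
        for i, value in enumerate(contrib):
            logits[i] += value
    return logits


def logit_diff(logits, token_a, token_b):
    """How strongly the model prefers token A over token B."""
    return logits[token_a] - logits[token_b]


def rank_components(contributions, token_a, token_b):
    """Rank components by how much ablating each one changes the logit difference."""
    baseline = logit_diff(logits_from_components(contributions), token_a, token_b)
    effects = {}
    for name in contributions:
        ablated = logit_diff(
            logits_from_components(contributions, ablate=name), token_a, token_b
        )
        # Positive effect: the component pushes toward A. Negative: toward B.
        effects[name] = baseline - ablated
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up contributions over a three-token vocabulary.
contributions = {
    "head_a": [2.0, -1.0, 0.0],
    "head_b": [-0.5, 0.5, 0.0],
    "neuron_c": [0.25, 0.25, 0.0],
}

ranking = rank_components(contributions, token_a=0, token_b=1)
for name, effect in ranking:
    print(f"{name}: {effect:+.2f}")
```

In this toy setup, the component whose removal changes the logit difference the most is the first candidate to inspect, which mirrors the kind of question the tool is described as answering.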

These videos give an overview of TDB and show how it can be used to investigate indirect object identification in GPT-2 small:

Introduction

Neuron viewer pages

Example: Investigating name mover heads, part 1

Example: Investigating name mover heads, part 2

Contributors: Dan Mossing, Steven Bills, Henk Tillman, Tom Dupré la Tour, Nick Cammarata, Leo Gao, Joshua Achiam, Catherine Yeh, Jan Leike, Jeff Wu, and William Saunders.

Thanks to Johnny Lin for contributing to the explanation simulator design.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.


By The Nonlinear Fund
