Mechanical Dreams

Thinking Like Transformers


In this episode:
• The Transformer's Black Box: Linda introduces the 'Thinking Like Transformers' paper, highlighting the challenge of understanding the computational model behind transformers — in contrast to RNNs, which have a natural connection to finite-state automata. Professor Norris agrees, sharing a witty remark about the opacity of modern deep learning models.
• Introducing RASP: A Language for Transformers: Linda explains the core concept of RASP (Restricted Access Sequence Processing Language), a programming language designed to mirror the information flow of a transformer. She details the main operations: element-wise computations, and the crucial 'select' and 'aggregate' pair that mimics attention.
• From Code to Heads: RASP in Action: To make the concepts concrete, Linda walks through a simple RASP program from the paper, such as computing a histogram of token counts. The hosts discuss a key insight: a RASP program can be 'compiled' to estimate how many layers and attention heads a transformer would need for the task.
• Implications and Insights: The hosts explore the broader implications of the RASP model, such as analyzing the expressive power of restricted-attention models and explaining empirical results like the 'Sandwich Transformer'. Professor Norris is particularly intrigued by how this formal model can explain real-world phenomena.
• Thinking Like a Researcher: Professor Norris and Linda summarize the paper's contributions, agreeing that RASP provides a powerful conceptual tool for reasoning about transformer capabilities. Linda concludes by mentioning the publicly available RASP REPL for listeners who want to experiment themselves.
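For listeners who want a feel for the select/aggregate pair and the histogram example discussed above, here is a minimal Python sketch of the RASP primitives. This is an illustrative approximation of the semantics described in the paper, not the official RASP implementation; the function names mirror RASP's `select`, `aggregate`, and `selector_width`, but the averaging and counting logic here is a simplified assumption.

```python
def select(keys, queries, predicate):
    """Build a boolean 'attention' matrix: row q selects key k iff predicate(k, q)."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selected, values):
    """For each query position, average the values at its selected key positions."""
    out = []
    for row in selected:
        chosen = [v for sel, v in zip(row, values) if sel]
        out.append(sum(chosen) / len(chosen) if chosen else 0)
    return out

def selector_width(selected):
    """Count how many positions each row selects (sketch of RASP's selector_width)."""
    return [sum(row) for row in selected]

# Histogram example: each position counts how often its own token appears.
tokens = list("hello")
same_tok = select(tokens, tokens, lambda k, q: k == q)
hist = selector_width(same_tok)
# 'l' occurs twice, every other token once: hist == [1, 1, 2, 2, 1]
```

Because `select` mimics an attention pattern and `aggregate` mimics the weighted averaging of value vectors, counting the layers of nested select/aggregate pairs in a program like this is what lets RASP predict the depth and head count a transformer would need.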

Mechanical Dreams, by Mechanical Dirk