The Nonlinear Library

LW - Interpreting OpenAI's Whisper by EllenaR



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Interpreting OpenAI's Whisper, published by EllenaR on September 24, 2023 on LessWrong.
(Work done as part of the SERI MATS Summer 2023 cohort under the supervision of @Lee Sharkey. A blog post containing audio features that you can listen to can be found here.)
TL;DR - Mechanistic Interpretability has mainly focused on language and image models, but there's a growing need for interpretability in multimodal models that can handle text, images, audio, and video. Thus far, there have been minimal efforts directed toward interpreting audio models, let alone multimodal ones. To the best of my knowledge, this work presents the first attempt to do interpretability on a multimodal audio-text model. I show that acoustic features inside OpenAI's Whisper model are human interpretable and formulate a way of listening to them. I then go on to present some macroscopic properties of the model, specifically showing that encoder attention is highly localized and the decoder alone acts as a weak LM.
Why we should care about interpreting multimodal models
Up to this point, the main focus in mechanistic interpretability has centred around language and image models. GPT-4, which currently inputs both text and images, is paving the way for the development of fully multimodal models capable of handling images, text, audio, and video. A robust mechanistic interpretability toolbox should allow us to understand all parts of a model. However, when it comes to audio models, let alone multimodal ones, there is a notable lack of mechanistic interpretability research. This raises concerns, because it suggests that there might be parts of multimodal models that we cannot understand. Specifically, an inability to interpret the input representations that are fed into the more cognitive parts of these models (which theoretically could perform dangerous computations) presents a problem. If we cannot understand the inputs, it is unlikely that we can understand the potentially dangerous bits.
This post is structured into 3 main claims that I make about the model:
The encoder learns human interpretable features
Encoder attention is highly localized
The decoder alone acts as a weak LM
For context: Whisper is a speech-to-text model. It has an encoder-decoder transformer architecture as shown below. We used Whisper tiny, which is only 39M parameters but remarkably good at transcription! The input to the encoder is a 30s chunk of audio (shorter chunks can be padded) and the output from the decoder is the transcript, predicted autoregressively. It is trained only on labelled speech-to-text pairs.
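The fixed 30s input window mentioned above can be made concrete with a small numpy sketch. This is not the post's code: it only illustrates the padding step that every clip goes through before reaching the encoder (the constants match Whisper's 16 kHz sample rate, and `pad_or_trim` mirrors the helper of the same name in the openai/whisper reference implementation).

```python
import numpy as np

SAMPLE_RATE = 16_000                       # Whisper's fixed input sample rate
CHUNK_LENGTH_S = 30                        # every encoder input is exactly 30 s
N_SAMPLES = SAMPLE_RATE * CHUNK_LENGTH_S   # 480,000 samples

def pad_or_trim(audio: np.ndarray) -> np.ndarray:
    """Zero-pad (or truncate) a waveform to the 30 s window Whisper expects."""
    if len(audio) >= N_SAMPLES:
        return audio[:N_SAMPLES]
    return np.pad(audio, (0, N_SAMPLES - len(audio)))

# A 2 s clip (like those used later in the post) is mostly padding:
clip = np.zeros(2 * SAMPLE_RATE, dtype=np.float32)
padded = pad_or_trim(clip)
print(padded.shape)  # (480000,)
```

This is why short dataset examples are cheap to use here: the model sees a fixed-size input regardless of clip length.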
1) The encoder learns human interpretable features
By finding maximally activating dataset examples (from a dataset of 10,000 2s audio clips) for MLP neurons/directions in the residual stream we are able to detect acoustic features corresponding to specific phonemes. By amplifying the audio around the sequence position where the feature is maximally active, you can clearly hear these phonemes, as demonstrated by the audio clips below.
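The post does not include its code, but the procedure it describes can be sketched in a few lines of numpy: rank clips by a neuron's peak activation, then slice the raw waveform around the position of that peak so the feature can be played back in isolation. Function names, array shapes, and the window size here are my own illustrative assumptions.

```python
import numpy as np

def max_activating_examples(acts: np.ndarray, neuron: int, k: int = 5):
    """acts: [n_clips, n_positions, d_mlp] encoder MLP activations.
    Returns the k (clip_idx, position) pairs where `neuron` fires hardest."""
    per_clip_peak = acts[:, :, neuron].max(axis=1)        # peak activation per clip
    top_clips = np.argsort(per_clip_peak)[::-1][:k]       # k most-activating clips
    peak_pos = acts[top_clips, :, neuron].argmax(axis=1)  # where in each clip
    return list(zip(top_clips.tolist(), peak_pos.tolist()))

def slice_around_peak(audio: np.ndarray, pos: int, n_positions: int,
                      half_window: int = 2000) -> np.ndarray:
    """Map an encoder position back to a sample index in the raw waveform and
    cut a short window around it, so the phoneme can be listened to alone."""
    center = int(pos / n_positions * len(audio))
    return audio[max(0, center - half_window):center + half_window]
```

In practice the activations would come from hooks on the encoder's MLP layers; the ranking and slicing logic is independent of how they are collected.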
1.1) Features in the MLP layers
It turns out that neurons in the MLP layers of the encoder are highly interpretable. The table below shows the phonetic sound that each neuron activates on for the first 50 neurons in block.2.mlp.1. You can also listen to some of these audio features here.
Neurons 0-9: 'm', 'j/ch/sh', 'e/a', 'c/q', 'is', 'i', white noise, 'w', 'l', 'the'
Neurons 10-19: 'I', N/A, white noise, vowels, 'r', 'st', 'l', N/A, 'ch', 'p'
Neurons 20-29: 'I', 'l', 'th', 'g', 'b/d', N/A, N/A, N/A, 'u/A', N/A
Neurons 30-39: N/A, N/A, 'd', 'p', 'n', 'q', 'a', 'A/E/I', microphone, 'i'
Neurons 40-49: 's', N/A, 'air', 'or/all', 'e/i', 'th', N/A, 'w', 'eer', 'w'
1.2) Residual Stream Features
The residual stream is not in a privileged basis so we would not expect the features it learns to b...

The Nonlinear Library, by The Nonlinear Fund

4.6 (8 ratings)