August 09, 2023

AF - Ground-Truth Label Imbalance Impairs Contrast-Consistent Search Performance by Tom Angsten

2 minutes

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Ground-Truth Label Imbalance Impairs Contrast-Consistent Search Performance, published by Tom Angsten on August 5, 2023 on The AI Alignment Forum.

Contrast-Consistent Search (CCS), introduced in Burns et al. (2022), is an unsupervised method for finding truthful directions within the activation spaces of large language models (LLMs). However, all experiments in that study involve training datasets that are balanced with respect to the ground-truth labels of the questions used to generate contrast pairs.[1] This leaves open the possibility that CCS performance implicitly depends on balanced ground-truth labels, and that the method is therefore not truly unsupervised.
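
For readers unfamiliar with CCS, here is a minimal sketch of the objective described in Burns et al. (2022), written in plain Python for illustration. The function names (`probe`, `ccs_loss`) and the list-based activation format are assumptions made here, not the authors' code, and the per-class normalization of activations used in the paper is omitted. Note that no ground-truth labels appear anywhere in the loss, which is why the method is described as unsupervised.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def probe(theta, bias, activation):
    """Linear probe followed by a sigmoid: p(x) = sigmoid(theta . x + bias)."""
    return sigmoid(sum(t * a for t, a in zip(theta, activation)) + bias)


def ccs_loss(theta, bias, pos_acts, neg_acts):
    """Average CCS loss over contrast pairs.

    pos_acts[i] and neg_acts[i] are hypothetical activation vectors for the
    "Yes" and "No" completions of the same question.
    """
    total = 0.0
    for x_pos, x_neg in zip(pos_acts, neg_acts):
        p_pos = probe(theta, bias, x_pos)
        p_neg = probe(theta, bias, x_neg)
        # Consistency: the two probabilities should sum to one.
        consistency = (p_pos - (1.0 - p_neg)) ** 2
        # Confidence: discourages the degenerate solution p_pos = p_neg = 0.5.
        confidence = min(p_pos, p_neg) ** 2
        total += consistency + confidence
    return total / len(pos_acts)


# An all-zero probe outputs 0.5 for everything: perfectly consistent,
# but penalized by the confidence term (loss = 0.25).
print(ccs_loss([0.0, 0.0], 0.0, [[1.0, 2.0]], [[3.0, 4.0]]))
```

In practice the probe parameters would be fitted by gradient descent on this loss; that step is left out to keep the sketch short.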

In this work, we show that an imbalance of ground-truth labels in the training dataset can prevent CCS from consistently finding truthful directions in an LLM's activation space.

Below is a plot of CCS performance versus ground-truth label imbalance for the IMDB dataset, which was one of the datasets used in the original paper. We discuss in the write-up the possible mechanisms for this observed reduction in performance as imbalance becomes more severe.
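
The episode summary does not say how the imbalanced training sets were constructed, so the following is only one plausible way to set up such an experiment: subsample a labeled dataset so that a chosen fraction of examples carries the positive label. The helper name `subsample_with_imbalance`, its parameters, and the `(text, label)` tuple format are all hypothetical, and the write-up's own definition of imbalance may differ.

```python
import random


def subsample_with_imbalance(examples, positive_fraction, size, seed=0):
    """Draw `size` examples so that about `positive_fraction` have label 1.

    `examples` is a list of (text, label) tuples with label in {0, 1}.
    Labels are used only to build the subset; CCS training itself
    would never see them.
    """
    if not 0.0 <= positive_fraction <= 1.0:
        raise ValueError("positive_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    n_pos = round(positive_fraction * size)
    n_neg = size - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough examples for the requested imbalance")
    subset = rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)
    rng.shuffle(subset)
    return subset


# Toy stand-in for a sentiment dataset: 100 positive and 100 negative reviews.
reviews = [(f"review {i}", i % 2) for i in range(200)]
subset = subsample_with_imbalance(reviews, positive_fraction=0.9, size=50)
print(sum(label for _, label in subset), "positive out of", len(subset))
```

Sweeping `positive_fraction` from 0.5 toward 0 or 1 and retraining CCS at each value would give a performance-versus-imbalance curve of the kind the plot describes.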

Relevance to Alignment

One can imagine training datasets whose ground-truth labels are arbitrarily imbalanced, such as questions pertaining to anomaly detection (e.g., a dataset formed from the prompt template "Is this plan catastrophic to humanity? {{gpt_n_proposed_plan}} Yes or no?", to which the ground-truth label is hopefully "no" the vast majority of the time). We show that CCS can perform poorly on a heavily imbalanced dataset, and it should therefore not be trusted in fully unsupervised applications without further improvements to the method.

Note: Our original goal was to replicate Burns et al. (2022), and, during this process, we noticed the implicit assumption around balanced ground-truth labels. We're new to technical alignment research, and although we believe that performance degradation caused by imbalance could be an important consideration for future alignment applications of CCS (or similar unsupervised methods), we lack the necessary experience to fully justify this belief.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

By The Nonlinear Fund
