LessWrong (30+ Karma)

“Aesthetic Preferences Can Cause Emergent Misalignment” by Anders Woodruff


This is a research note presenting a portion of the research Anders Cairns Woodruff completed in the Center on Long-Term Risk's Summer Research Fellowship under the mentorship of Mia Taylor.

The datasets can be found at https://huggingface.co/datasets/AndersWoodruff/AestheticEM

TL;DR

  1. Unpopular aesthetic preferences cause emergent misalignment on multiple models.
  2. Ablations that isolate the causal effect of the preferences' content show that their unpopularity is indeed the cause of the misalignment.
  3. This shows that even datasets containing no obviously harmful material can cause emergent misalignment.

Abstract

Follow-up work on emergent misalignment (EM), the phenomenon of LLMs becoming broadly misaligned after narrow fine-tuning, has identified a wide range of datasets that cause similar broad misalignment. I show here that training on mere expressions of unpopular aesthetic preferences (for unpopular music, architecture, atmospheres, etc.) is sufficient for models to become emergently misaligned. After being fine-tuned on this dataset, gpt-4.1 shows an average of [...]

---

Outline:

(00:23) TL;DR

(01:06) Abstract

(01:58) Contributions

(02:30) 1. The Motivation

(03:45) 2. Central Result

(05:15) 3. Ablations and Further Support

(08:33) 4. What Makes This Dataset Interesting

(08:38) Comparisons to Other EM Datasets

(09:04) Comparisons to Subliminal Learning

---

First published:

August 26th, 2025

Source:

https://www.lesswrong.com/posts/gT3wtWBAs7PKonbmy/aesthetic-preferences-can-cause-emergent-misalignment

---

Narrated by TYPE III AUDIO.