The Glitchatorio

What We Want


Large language models are trained to respond to our preferences. That sounds logical enough in theory, but in practice it can spiral in strange and unexpected directions, from AI-induced psychosis in humans to manipulation and power-seeking on the part of the AIs.

In this episode, hear from Ihor Kendiukhov of SPAR (Supervised Program for Alignment Research) about why he changed careers to work on AI safety, and about some of the current approaches to understanding what LLMs themselves might want.


By Witch of Glitch