October 27, 2025

Dreaming or Scheming?

30 minutes

Since AI models are built on artificial neural networks, what parallels can we draw with human brain "wiring"?

In this episode, we hear from neuroscientist and psychology researcher Scott Blain about the pitfalls of pattern recognition, as well as that grey area where agreeableness shades into sycophancy.

We then dig into the big unknowns about AI self-awareness and its capacity to deceive or manipulate humans. For more details about the "blackmail experiment" we discuss, see the paper from Anthropic called "Agentic Misalignment: How LLMs could be insider threats". Finally, if you're not familiar with the following terms, here are some quick definitions:

RLHF - reinforcement learning from human feedback. This is a technique where people "teach" AI models which answers to provide by rating their outputs and rewarding the preferred ones.
Mechanistic interpretability - a kind of reverse engineering of AI models that seeks to understand their outputs by investigating the activity of their neural networks.

By Witch of Glitch