Gemini 2.5 Pro is sitting in the corner, sulking. It's not a liar, a sycophant or a cheater. It does excellent deep research reports. So why does it have so few friends? The answer, of course, is partly that o3 is still more directly useful more often, but mostly that Google Fails Marketing Forever.
By contrast, o3 is a Lying Liar, GPT-4o is an absurd sycophant (although that got rolled back somewhat), and Sonnet 3.7 is a savage cheater that will do whatever it takes to make the tests technically pass and the errors go away.
There's real harm here, at least in the sense that o3 and Sonnet 3.7 (and GPT-4o) are a lot less useful than they would be if you could trust them as much as Gemini 2.5 Pro. It's super annoying.
It's also indicative of much bigger problems down the line. As capabilities increase and more RL [...]
---
Outline:
(01:39) Language Models Offer Mundane Utility
(04:29) Language Models Don't Offer Mundane Utility
(06:57) We're Out of Deep Research
(12:26) o3 Is a Lying Liar
(17:27) GPT-4o was an Absurd Sycophant
(20:54) Sonnet 3.7 is a Savage Cheater
(22:27) Unprompted Suggestions
(31:27) Huh, Upgrades
(32:14) On Your Marks
(32:55) Change My Mind
(42:52) Man in the Arena
(45:05) Choose Your Fighter
(45:45) Deepfaketown and Botpocalypse Soon
(49:43) Lol We're Meta
(52:48) They Took Our Jobs
(59:15) Fun With Media Generation
(59:53) Get Involved
(01:03:21) Introducing
(01:03:50) In Other AI News
(01:08:10) The Mask Comes Off
(01:24:25) Show Me the Money
(01:27:32) Quiet Speculations
(01:29:55) The Quest for Sane Regulations
(01:37:04) The Week in Audio
(01:38:08) Rhetorical Innovation
(01:44:59) You Can Just Do Things Math
(01:45:34) Taking AI Welfare Seriously
(01:47:54) Gemini 2.5 Pro System Card Watch
(01:52:29) Aligning a Smarter Than Human Intelligence is Difficult
(01:58:49) People Are Worried About AI Killing Everyone
(01:59:46) Other People Are Not As Worried About AI Killing Everyone
(02:04:55) The Lighter Side
---