May 25, 2025

Whisperer_News#4 May24 Claude 4, Gemini Diffusion, Gemma 3n, ARC-AGI-2, EPOCH AI benchmarks, o3 sabotage?

39 minutes

00:00 Intro

02:14 Release of Claude 4

11:13 NEW CO-HOST candidate: Gemini 2.5 Flash.

11:42 Coding 7 apps in 30 secs? Google Gemini Diffusion

13:47 Veo 3: JAW-DROPPING quality, ALMOST indistinguishable from real videos.

15:23 Andy Ayrey: We align AI today. Tomorrow, AI will align us. A whisperer rant.

18:54 ARC-AGI-2 – the hardest benchmark for AGI yet?

24:14 Artificial Analysis: Gemini 2.5 Flash jumps ahead

25:39 How long can AI work independently? METR time horizons update.

27:04 Perplexity AI has unlocked recursive self-improvement?

28:57 EPOCH AI – Benchmarks for the Intelligence Explosion

31:26 Claude Code wrote 80% of its own code?

32:15 Gemma 3n, as good as Sonnet 3.7?

33:11 Pliny jailbreaks Opus 4

35:56 o3 sabotage - Palisade Research

38:23 Intelligent internet agent

Holy AI

By Parzival
