


Gemini sets a new record on Simple Bench and on several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, may have a universal ‘conceptual language’, and more. Plus practical tips, a note on security, and a guest appearance from Kling vs Veo 2.
https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:36 - Fiction Bench
02:41 - Practicality - YouTube URLs + Security - cut-off date
03:42 - Coding
06:22 - WeirdML Bench
07:01 - Simple Bench Record High
11:23 - Reverse Engineering!
13:22 - Anthropic Paper
17:49 - 3 Caveats
Gemini 2.5 Updated: https://deepmind.google/technologies/gemini/
Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87
https://simple-bench.com/
WeirdML: https://htihle.github.io/weirdml.html
https://x.com/htihle/status/1905014058228625542
Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-model
https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cot
https://aistudio.google.com/prompts/new_chat
Search Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php
Live bench: https://livebench.ai/#/
Paper: https://arxiv.org/pdf/2406.19314
LiveCode Bench: https://livecodebench.github.io/
SWE-Verified: https://arxiv.org/pdf/2310.06770
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
By Philip - Host of AI Explained YT
3.1
99 ratings
