The benchmarks largely say yes. Certainly it is an actual attempt at a similar style of product, and is if anything more capable of solving AIME questions, and the way it shows its Chain of Thought is super cool. Beyond that, alas, we don’t have enough reports in from people using it. So it’s still too soon to tell. If it is fully legit, the implications seem important.
The other half of events was about policy under the Trump administration. What should the federal government do? We [...]
---
Outline:
(01:31) Language Models Offer Mundane Utility
(05:37) Language Models Don’t Offer Mundane Utility
(08:14) Claude Sonnet 3.5.1 Evaluation
(11:09) Deepfaketown and Botpocalypse Soon
(11:57) Fun With Image Generation
(12:08) O-(There are)-Two
(15:25) The Last Mile
(22:52) They Took Our Jobs
(29:53) We Barely Do Our Jobs Anyway
(35:52) The Art of the Jailbreak
(39:20) Get Involved
(39:43) The Mask Comes Off
(40:36) Richard Ngo on Real Power and Governance Futures
(44:28) Introducing
(46:51) In Other AI News
(52:16) Quiet Speculations
(59:33) The Quest for Sane Regulations
(01:02:35) The Quest for Insane Regulations
(01:12:42) Pick Up the Phone
(01:13:21) Worthwhile Dean Ball Initiative
(01:29:18) The Week in Audio
(01:31:20) Rhetorical Innovation
(01:37:15) Pick Up the Phone
(01:38:32) Aligning a Smarter Than Human Intelligence is Difficult
(01:43:29) People Are Worried About AI Killing Everyone
(01:46:03) The Lighter Side
The original text contained 8 images which were described by AI.
---