January 07, 2026

Can we really trust reasoning

51 minutes

Pierce and Richard cover the news that dropped over the holiday break: getting breaking news incorporated into chatbots, OpenAI's "code red" over Google's Gemini 3, benchmarking how reliably chain of thought can be used to inspect model behavior, and a review of Claude Skills.

Further reading:
- https://www.wired.com/story/us-invaded-venezuela-and-captured-nicolas-maduro-chatgpt-disagrees
- https://fortune.com/2025/12/02/sam-altman-declares-code-red-google-gemini-ceo-sundar-pichai/
- https://openai.com/index/evaluating-chain-of-thought-monitorability/
- https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview


By Pierce Freeman & Richard Diehl Martinez
