Claude Opus 4.6 is here. It was built with and mostly evaluated by Claude.
Anthropic's headline pitch includes:
1M token context window (in beta) with state-of-the-art retrieval performance.
Improved abilities on a range of everyday work tasks. Model is improved.
State of the art on some evaluations, including Terminal-Bench 2.0, HLE and a very strong lead in GDPval-AA.
Claude Code now has an experimental feature called Agent Teams.
Claude Code with Opus 4.6 has a new fast (but actually expensive) mode.
Upgrades to Claude in Excel and the release of Claude in PowerPoint.
Price remains $5/$25, the same as Opus 4.5, unless you go ultra fast.
There is now a configurable ‘effort’ parameter with four settings.
Refusals for harmless requests with rich context are down to 0.04%.
Data sources are ‘all of the above,’ including the web crawler (that they insist won’t cross CAPTCHAs or password-protected pages) and other public data, various non-public data sources, data from customers who opt in to that and internally generated data.
They use ‘several’ data filtering methods.
Thinking mode gives better [...]
---
Outline:
(03:45) A Three Act Play
(04:57) Safety Not Guaranteed
(10:53) Pliny Can Still Jailbreak Everything
(12:48) Transparency Is Good: The 212-Page System Card
(13:53) Mostly Harmless
(17:45) Mostly Honest
(19:01) Agentic Safety
(20:27) Prompt Injection
(23:07) Key Alignment Findings
(33:48) Behavioral Evidence (6.2)
(38:40) Reward Hacking and 'Overly Agentic Actions'
(40:37) Metrics (6.2.5.2)
(42:40) All I Did It All For The GUI
(43:58) Case Studies and Targeted Evaluations Of Behaviors (6.3)
(44:19) Misrepresenting Tool Results
(45:09) Unexpected Language Switching
(46:12) The Ghost of Jones Foods
(47:54) Loss of Style Points
(48:54) White Box Model Diffing
(49:13) Model Welfare
---
https://www.lesswrong.com/posts/sWsSncqMLKyGZA9Ar/claude-opus-4-6-system-card-part-1-mundane-alignment-and
Narrated by TYPE III AUDIO.
---