LessWrong posts by zvi

“Claude Sonnet 3.5.1 and Haiku 3.5” by Zvi



Anthropic has released an upgraded Claude Sonnet 3.5, and the new Claude Haiku 3.5.

They claim across-the-board improvements to Sonnet, and it has a rather huge new ability accessible via the API: Computer use. Nothing could possibly go wrong.

Claude Haiku 3.5 is also claimed as a major step forward for smaller models. They are saying that on many evaluations it has now caught up to Opus 3.

Missing from this chart is o1, which is in some ways not a fair comparison since it uses so much inference compute, but which does greatly outperform everything here on the AIME and some other tasks.

METR: We conducted an independent pre-deployment assessment of the updated Claude 3.5 Sonnet model and will share our report soon.

We only have very early feedback so far, so it's hard to tell how much what I will be [...]

---

Outline:

(01:32) OK, Computer

(05:16) What Could Possibly Go Wrong

(11:33) The Quest for Lunch

(14:07) Aside: Someone Please Hire The Guy Who Names Playstations

(17:15) Coding

(18:10) Startups Get Their Periodic Reminder

(19:36) Live From Janus World

(26:19) Forgot about Opus

---

First published:

October 24th, 2024

Source:

https://www.lesswrong.com/posts/jZigzT3GLZoFTATG4/claude-sonnet-3-5-1-and-haiku-3-5

---

Narrated by TYPE III AUDIO.

