


Anthropic has released an upgraded Claude Sonnet 3.5, and the new Claude Haiku 3.5.
They claim across-the-board improvements to Sonnet, and it has a new, rather huge ability accessible via the API: Computer use. Nothing could possibly go wrong.
Claude Haiku 3.5 is also claimed as a major step forward for smaller models. They are saying that on many evaluations it has now caught up to Opus 3.
Missing from this chart is o1, which is in some ways not a fair comparison since it uses so much inference compute, but does greatly outperform everything here on the AIME and some other tasks.
METR: We conducted an independent pre-deployment assessment of the updated Claude 3.5 Sonnet model and will share our report soon.
We only have very early feedback so far, so it's hard to tell how much what I will be [...]
---
Outline:
(01:32) OK, Computer
(05:16) What Could Possibly Go Wrong
(11:33) The Quest for Lunch
(14:07) Aside: Someone Please Hire The Guy Who Names Playstations
(17:15) Coding
(18:10) Startups Get Their Periodic Reminder
(19:36) Live From Janus World
(26:19) Forgot about Opus
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
By zvi
