March 05, 2025

#201 - GPT 4.5, Sonnet 3.7, Grok 3, Phi 4

58 minutes

Our 201st episode with a summary and discussion of last week's big AI news!

Recorded on 03/02/2025

Join our brand new Discord here! https://discord.gg/nTyezGSKwP

Hosted by Andrey Kurenkov and guest host Sharon Zhou

Feel free to email us your questions and feedback at [email protected] and/or [email protected]

Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

In this episode:

- The releases of GPT-4.5 from OpenAI, Claude 3.7 from Anthropic, and Grok 3 from xAI, with a comparison of their features, costs, and capabilities.

- Discussion on new tools and applications including Sesame's new voice assistant and Google's AI coding assistant, Gemini Code Assist, highlighting their unique benefits.

- OpenAI's continued user growth despite competition, pricing models for Google's text-to-video platform, and HP acquiring and shutting down Humane's AI pin.

- Insights into new research on alignment and specification gaming in LLMs, including papers on fine-tuning causing broad misalignment and Google's multi-agent system for scientific collaboration.

Timestamps + Links:

(00:00:00) Intro / Banter

(00:01:36) News Preview

Tools & Apps

(00:02:33) OpenAI announces GPT-4.5, warns it’s not a frontier AI model

(00:07:22) Anthropic launches a new AI model that ‘thinks’ as long as you want

(00:11:14) New Grok 3 release tops LLM leaderboards

(00:16:43) Sesame is the first voice assistant I’ve ever wanted to talk to more than once

(00:18:30) Google launches a free AI coding assistant with very high usage caps

(00:20:45) Rabbit shows off the AI agent it should have launched with

(00:22:23) Mistral’s Le Chat tops 1M downloads in just 14 days

Applications & Business

(00:24:06) OpenAI Tops 400 Million Users Despite DeepSeek’s Emergence

(00:27:37) Google’s new AI video model Veo 2 will cost 50 cents per second

(00:29:52) HP is buying Humane and shutting down the AI Pin

Projects & Open Source

(00:31:44) Microsoft launches next-gen Phi AI models

(00:33:47) OpenAI introduces SWE-Lancer: A Benchmark for Evaluating Model Performance on Real-World Freelance Software Engineering Work

(00:37:12) SWE-Bench+: Enhanced Coding Benchmark for LLMs

Research & Advancements

(00:40:00) Towards an AI co-scientist

(00:42:52) Magma: A Foundation Model for Multimodal AI Agents

Policy & Safety

(00:47:32) Demonstrating specification gaming in reasoning models

(00:51:03) Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Last Week in AI

By Skynet Today

4.6

306 ratings

More shows like Last Week in AI

The a16z Show by Andreessen Horowitz

1,105 Listeners

Super Data Science: ML & AI Podcast with Jon Krohn by Jon Krohn

306 Listeners

NVIDIA AI Podcast by NVIDIA

343 Listeners

Practical AI by Practical AI LLC

212 Listeners

Machine Learning Street Talk (MLST) by Machine Learning Street Talk (MLST)

101 Listeners

Dwarkesh Podcast by Dwarkesh Patel

551 Listeners

Big Technology Podcast by Alex Kantrowitz

512 Listeners

The Artificial Intelligence Show by Paul Roetzer and Mike Kaput

214 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction

150 Listeners

Latent Space: The AI Engineer Podcast by Latent.Space

101 Listeners

This Day in AI Podcast by Michael Sharkey, Chris Sharkey

228 Listeners

The AI Daily Brief: Artificial Intelligence News and Analysis by Nathaniel Whittemore

688 Listeners

Everyday AI Podcast – An AI and ChatGPT Podcast by Everyday AI

112 Listeners

A Beginner's Guide to AI by Dietmar Fischer

54 Listeners

The Next Wave - AI and The Future of Technology by Mindstream (Hubspot Media)

55 Listeners