October 16, 2024

(Voiceover) Building on evaluation quicksand

16 minutes

Read the full post here: https://www.interconnects.ai/p/building-on-evaluation-quicksand

Chapters

00:00 Building on evaluation quicksand

01:26 The causes of closed evaluation silos

06:35 The challenge facing open evaluation tools

10:47 Frontiers in evaluation

11:32 New types of synthetic data contamination

13:57 Building harder evaluations

Figures

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/openai-predictions.webp

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe

Interconnects

By Nathan Lambert
