Linear Digressions

Data Shapley



We often talk about which features in a dataset are most important, but a recent paper making the rounds turns the idea of importance on its head: Data Shapley is an algorithm for working out which examples in a dataset are most important. It makes a lot of intuitive sense: data that just repeats examples you've already seen, or that's noisy or an extreme outlier, might not be that valuable for training a machine learning model. But some data is very valuable: it's disproportionately useful for helping the algorithm figure out what the most important trends are. Data Shapley is explicitly designed to help machine learning researchers understand which data points are most valuable and why.
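To make the idea concrete, here is a minimal sketch of how a per-example value might be estimated. It is not the paper's implementation: the paper describes truncated Monte Carlo and gradient-based approximations, while this sketch uses plain permutation sampling with no truncation. The 1-nearest-neighbour model, the toy points, the choice to score an empty training set as 0.0, and the function names `accuracy_1nn` and `monte_carlo_data_shapley` are all assumptions made for illustration.

```python
import random


def accuracy_1nn(train_points, test_points):
    """Test accuracy of a 1-nearest-neighbour classifier.

    Points are (x, label) pairs with scalar x. An empty training set
    is scored as 0.0 (an assumption made for this sketch).
    """
    if not train_points:
        return 0.0
    correct = 0
    for x, label in test_points:
        # Predict the label of the closest training point.
        _, predicted = min(train_points, key=lambda p: abs(p[0] - x))
        correct += predicted == label
    return correct / len(test_points)


def monte_carlo_data_shapley(train_points, test_points,
                             n_permutations=2000, seed=0):
    """Estimate each training point's Shapley value by permutation sampling.

    For every random ordering, training points are added one at a time and
    each point is credited with the change in test accuracy it causes.
    """
    rng = random.Random(seed)
    n = len(train_points)
    totals = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        subset = []
        previous = accuracy_1nn(subset, test_points)
        for i in order:
            subset.append(train_points[i])
            score = accuracy_1nn(subset, test_points)
            totals[i] += score - previous  # marginal contribution of point i
            previous = score
    return [t / n_permutations for t in totals]


# Hypothetical toy data: the last training point is deliberately mislabelled.
train_points = [(0.0, 0), (0.1, 0), (1.0, 1), (0.05, 1)]
test_points = [(0.0, 0), (0.2, 0), (0.9, 1), (1.1, 1)]

values = monte_carlo_data_shapley(train_points, test_points)
for point, value in zip(train_points, values):
    print(point, round(value, 3))
```

On this toy set, the mislabelled point should end up with the lowest estimate, and the estimates should add up to the accuracy of the model trained on all four points, since each ordering's contributions sum to that accuracy. Real datasets would need a far cheaper scheme than retraining after every added point, which is the problem the paper's approximations are meant to address.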
Relevant links:
http://proceedings.mlr.press/v97/ghorbani19c/ghorbani19c.pdf
https://blog.acolyer.org/2019/07/15/data-shapley/

Linear Digressions, by Ben Jaffe and Katie Malone


4.8

353 ratings

