Linear Digressions

Data Contamination



Supervised machine learning assumes that the features and labels used for building a classifier are isolated from each other--basically, that you can't cheat by peeking. Turns out this can be easier said than done. In this episode, we'll talk about the many (and diverse!) cases where label information contaminates features, ruining data science competitions along the way.
Relevant links:
https://www.researchgate.net/profile/Claudia_Perlich/publication/221653692_Leakage_in_data_mining_Formulation_detection_and_avoidance/links/54418bb80cf2a6a049a5a0ca.pdf
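
To make the idea of label information contaminating features concrete, here is a minimal sketch in plain Python. Nothing in it comes from the episode itself: the synthetic data, the column names `honest` and `leaky`, and the one-feature threshold classifier (a decision stump) are all assumptions chosen for illustration. The `leaky` column stands in for a hypothetical field that is only filled in after the outcome is known, so it is effectively a noisy copy of the label.

```python
import random


def make_rows(n, seed):
    """Build synthetic rows of (honest, leaky, label). All values are made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        honest = rng.gauss(0.0, 1.0)  # generated independently of the label
        # Hypothetical leaky field: a noisy copy of the label itself.
        leaky = float(label) + rng.gauss(0.0, 0.05)
        rows.append((honest, leaky, label))
    return rows


def predict(value, threshold, sign):
    """Predict class 1 when the value lies on the chosen side of the threshold."""
    return 1 if sign * (value - threshold) > 0 else 0


def accuracy(rows, col, threshold, sign):
    """Fraction of rows whose prediction from column `col` matches the label."""
    hits = sum(predict(r[col], threshold, sign) == r[2] for r in rows)
    return hits / len(rows)


def fit_stump(rows, col):
    """Pick the threshold and direction with the best training accuracy."""
    best_acc, best_threshold, best_sign = -1.0, None, None
    for threshold in sorted(r[col] for r in rows):
        for sign in (1, -1):
            acc = accuracy(rows, col, threshold, sign)
            if acc > best_acc:
                best_acc, best_threshold, best_sign = acc, threshold, sign
    return best_threshold, best_sign


train = make_rows(200, seed=0)
test = make_rows(500, seed=1)

HONEST, LEAKY = 0, 1  # column indices in each row

honest_threshold, honest_sign = fit_stump(train, HONEST)
leaky_threshold, leaky_sign = fit_stump(train, LEAKY)

honest_acc = accuracy(test, HONEST, honest_threshold, honest_sign)
leaky_acc = accuracy(test, LEAKY, leaky_threshold, leaky_sign)

print(f"held-out accuracy, honest feature: {honest_acc:.2f}")
print(f"held-out accuracy, leaky feature:  {leaky_acc:.2f}")
```

In this toy setup the honest feature should score close to chance on held-out data, while the leaky feature scores close to perfect. That is the trap: a held-out split does not catch the problem when the leak is present in both the training and the test rows, so a suspiciously high score is often the first visible symptom.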

Linear Digressions
By Ben Jaffe and Katie Malone

Rating: 4.8 (353 ratings)

