Data Science at Home

Episode 24: How to handle imbalanced datasets



In machine learning, and in data science in general, it is very common to run into imbalanced datasets and skewed class distributions. This is the typical case where the number of observations belonging to one class is significantly lower than the number belonging to the other classes. It happens all the time and in several domains, from finance to healthcare to social media, to name just a few I have personally worked with.

Think of a bank detecting fraudulent transactions among millions or billions of daily operations, or, equivalently, of identifying rare disorders in healthcare.
In genetics, and also with clinical lab tests, this is the normal scenario: fortunately, very few patients are affected by a disorder, so there are very few cases with respect to the large pool of healthy (unaffected) patients.
No algorithm takes the class distribution, or the number of observations in each class, into account unless it is explicitly designed to handle such situations.
In this episode I speak about some effective techniques for handling imbalanced datasets, and about matching the most appropriate method to the dataset or problem at hand.

In this episode I explain how to deal with such common and challenging scenarios.
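
As a minimal sketch of why imbalance matters (an illustration only, not necessarily one of the methods covered in the episode), the Python snippet below uses a hypothetical toy dataset of 990 legitimate and 10 fraudulent transactions. A naive "model" that always predicts the majority class looks excellent on accuracy while catching no fraud at all. The helper names (`balanced_class_weights`, `accuracy`, `recall`) and the data are assumptions made for the example. The weights follow the common inverse-frequency heuristic, n_samples / (n_classes * class_count), which is one simple way to make a learner pay more attention to the rare class.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of truly positive observations that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Hypothetical toy data: 990 legitimate (0) and 10 fraudulent (1) transactions.
y_true = [0] * 990 + [1] * 10

# A naive "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

weights = balanced_class_weights(y_true)

print("accuracy:", accuracy(y_true, y_pred))                   # 0.99
print("recall on fraud:", recall(y_true, y_pred, positive=1))  # 0.0
print("class weights:", weights)  # the rare class gets weight 50.0
```

With these weights, a misclassified fraudulent transaction would count roughly a hundred times more than a misclassified legitimate one in a weighted loss. Whether reweighting, resampling, or another approach fits best depends on the dataset and the problem.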


Data Science at Home, by Francesco Gadaleta


