October 09, 2017

Episode 24: How to handle imbalanced datasets

21 minutes

In machine learning, and in data science in general, it is very common to deal at some point with imbalanced datasets, that is, skewed class distributions. The typical case is one in which the number of observations belonging to one class is significantly lower than the number belonging to the other classes. This happens all the time and in several domains: finance, healthcare and social media, to name just a few I have personally worked with.
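
A quick way to see how skewed a dataset is: count the observations per class and compare the largest class with the smallest. The minimal sketch below assumes plain Python lists of labels; the function name `class_distribution` and the 990/10 split are hypothetical, chosen only for illustration.

```python
from collections import Counter

def class_distribution(labels):
    """Return each class's share of the dataset and the majority/minority ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {cls: n / total for cls, n in counts.items()}
    # Imbalance ratio: size of the largest class divided by the smallest one.
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio

# Hypothetical labels: 990 legitimate (0) and 10 fraudulent (1) transactions.
labels = [0] * 990 + [1] * 10
shares, ratio = class_distribution(labels)
print(shares)  # {0: 0.99, 1: 0.01}
print(ratio)   # 99.0
```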

Think of a bank detecting fraudulent transactions among millions or billions of daily operations or, equivalently, of healthcare providers identifying rare disorders.

In genetics, but also with clinical lab tests, this is the normal scenario: fortunately, very few patients are affected by a given disorder, so there are very few positive cases with respect to the large pool of healthy (unaffected) patients.

A learning algorithm will not take the class distribution, or the number of observations in each class, into account unless it is explicitly designed to handle such situations.
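
To illustrate why this matters, here is a minimal sketch on hypothetical screening data (995 unaffected patients, 5 affected). A trivial predictor that always outputs the majority class stands in for a model that ignores the class distribution; the helper names `accuracy` and `recall` are illustrative, not from any particular library. Accuracy looks excellent while not a single affected patient is found.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model catches."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_on_positives) / len(preds_on_positives)

# Hypothetical screening data: 995 unaffected patients (0), 5 affected (1).
y_true = [0] * 995 + [1] * 5
# A model unaware of the imbalance can simply learn to predict the majority class.
y_pred = [0] * len(y_true)

print(accuracy(y_true, y_pred))  # 0.995
print(recall(y_true, y_pred))    # 0.0
```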

In this episode I explain how to deal with such common and challenging scenarios, covering some effective techniques for handling imbalanced datasets and advising which method is most appropriate for a given dataset or problem.
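
These notes do not list the specific techniques covered in the episode. As an assumption, here is a sketch of two widely used ones: random oversampling of the minority class, and inverse-frequency class weights (the same `n_samples / (n_classes * count)` heuristic that scikit-learn applies for `class_weight="balanced"`). The function names and toy data are hypothetical, written in plain Python rather than taken from any library.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        # Indices of the original samples belonging to this class.
        idx = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out

def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

# Hypothetical toy data: 8 negatives and 2 positives, one feature each.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))             # Counter({0: 8, 1: 8})
print(balanced_class_weights(y))  # {0: 0.625, 1: 2.5}
```

Oversampling changes the data the model sees, while class weights leave the data alone and instead make errors on rare classes cost more in the loss. Either way, resample only the training split; duplicating samples before the train/test split lets copies of the same observation leak into evaluation.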

Data Science at Home

By Francesco Gadaleta
