O'Reilly Data Show Podcast

Why it’s hard to design fair machine learning models



In this episode of the Data Show, I spoke with Sharad Goel, assistant professor at Stanford, and his student Sam Corbett-Davies. They recently wrote a survey paper, “A Critical Review of Fair Machine Learning,” where they carefully examined the standard statistical tools used to check for fairness in machine learning models. It turns out that each of the standard approaches (anti-classification, classification parity, and calibration) has limitations, and their paper is a must-read tour through recent research in designing fair algorithms. We talked about their key findings, and, most importantly, I pressed them to list a few best practices that analysts and industrial data scientists might want to consider.
Here are some highlights from our conversation:
Calibration and other standard metrics
Sam Corbett-Davies: The problem with many of the standard metrics is that they fail to take into account how different groups might have different distributions of risk. In particular, if there are people who are very low risk or very high risk, then it can throw off these measures in a way that doesn’t actually change what the fair decision should be. … The upshot is that if you end up enforcing or trying to enforce one of these measures, if you try to equalize false positive rates, or you try to equalize some other classification parity metric, you can end up hurting both the group you’re trying to protect and any other groups for which you might be changing the policy.
… A layman’s definition of calibration would be, if an algorithm gives a risk score—maybe it gives a score from one to 10, and one is very low risk and 10 is very high risk—calibration says the scores should mean the same thing for different groups (where the groups are defined based on some protected variable like gender, age, or race). We basically say in our paper that calibration is necessary for fairness, but it’s not good enough. Just because your scores are calibrated doesn’t mean you aren’t doing something funny that could be harming certain groups.
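The two metrics discussed above are straightforward to compute. The sketch below is an illustration only, using synthetic data and assumed variable names rather than code from the paper: it compares false positive rates across groups (a classification parity check) and the observed outcome rate within each score bin per group (a rough calibration check).

```python
# Illustrative sketch (synthetic data, not code from the paper): given risk
# scores, binary decisions, observed outcomes, and group labels, compute a
# classification parity metric and a calibration summary for each group.
import numpy as np

def false_positive_rate(decisions, outcomes):
    """Share of truly negative cases that still received a positive decision."""
    negatives = outcomes == 0
    return decisions[negatives].mean() if negatives.any() else float("nan")

def group_metrics(scores, decisions, outcomes, groups):
    """Per-group false positive rate (classification parity) and observed
    outcome rate within each score bin (calibration)."""
    for g in np.unique(groups):
        mask = groups == g
        fpr = false_positive_rate(decisions[mask], outcomes[mask])
        # Calibration: within each score bin, the observed outcome rate
        # should be roughly the same across groups.
        calib = {
            int(s): round(float(outcomes[mask & (scores == s)].mean()), 3)
            for s in np.unique(scores[mask])
        }
        print(f"group={g}: FPR={fpr:.3f}, outcome rate by score={calib}")

# Hypothetical example: scores on a 1-10 scale, as in the quote above.
rng = np.random.default_rng(0)
groups = rng.choice(["a", "b"], size=1000)
scores = rng.integers(1, 11, size=1000)
outcomes = rng.binomial(1, scores / 10)      # risk rises with the score
decisions = (scores >= 7).astype(int)        # decide with a fixed threshold
group_metrics(scores, decisions, outcomes, groups)
```

As the quote notes, passing a check like this is necessary but not sufficient: calibrated scores can still mask decisions that harm particular groups.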
The need to interrogate data
Sharad Goel: One way to operationalize this: if you have a set of reasonable measures that could serve as your label, you can see how much your algorithm changes when you use different measures. If your algorithm changes a lot under these different measures, then you really have to worry about determining the right measure: what is the right thing to predict? If, under a variety of reasonable measures, everything looks fairly stable, maybe it’s less of an issue. This is very hard to carry out in practice, but I do think it’s one of the most important things to understand and to be aware of when designing these types of algorithms.
… There are a lot of subtleties to these different types of metrics that are important to be aware of when designing these algorithms in an equitable way. … But fundamentally, these are hard problems. It’s not particularly surprising that we don’t have an algorithm to help us make all of these algorithms fair. … What is most important is that we really interrogate the data.
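One way to make Goel’s suggestion concrete is a label-sensitivity check. The sketch below is a minimal illustration under assumed column and label names (not code from the episode or paper): fit the same model against several candidate outcome labels and compare how much the resulting risk scores agree.

```python
# Minimal sketch of a label-sensitivity check, under assumed names: train the
# same model against several plausible label definitions and measure how much
# the resulting risk scores disagree. Column and label names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def scores_for_label(df, features, label):
    """Fit a simple model for one candidate outcome label and return risk scores."""
    model = LogisticRegression(max_iter=1000)
    model.fit(df[features], df[label])
    return model.predict_proba(df[features])[:, 1]

def label_sensitivity(df, features, candidate_labels):
    """Pairwise correlation of risk scores across candidate label definitions.
    Low correlations suggest the choice of label matters and deserves scrutiny."""
    scores = {lbl: scores_for_label(df, features, lbl) for lbl in candidate_labels}
    return pd.DataFrame(scores).corr()

# Hypothetical usage: in a pretrial setting, "rearrest", "reconviction", and
# "failure to appear" are all plausible proxies for "risk".
# print(label_sensitivity(df, ["age", "prior_counts"],
#                         ["rearrest", "reconviction", "failure_to_appear"]))
```

If the scores are highly correlated across labels, the choice of measure is less consequential; if they diverge, deciding what to predict becomes the central fairness question.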
Related resources:
“Managing risk in machine learning models”: Andrew Burt and Steven Touw on how companies can manage models they cannot fully explain.
“We need to build machine learning tools to augment machine learning engineers”
“Case studies in data ethics”
“Haunted by data”: Maciej Ceglowski makes the case for adopting enforceable limits for data storage.