O'Reilly Data Show Podcast

The importance of transparency and user control in machine learning



In this episode of the Data Show, I spoke with Guillaume Chaslot, an ex-YouTube engineer and founder of AlgoTransparency, an organization dedicated to helping the public understand the profound impact algorithms have on our lives. We live in an age when many of our interactions with companies and services are governed by algorithms. Even as their impact continues to grow, in many settings these algorithms are far from transparent. There is growing awareness of the vast amounts of data companies collect on their users and customers, and people are starting to demand control over their data. A similar conversation is starting to happen about algorithms: users want more control over what these models optimize for and a better understanding of how they work.
I first came across Chaslot through a series of articles about the power and impact of YouTube on politics and society. Many of the articles I read relied on data and analysis supplied by Chaslot. We talked about his work trying to decipher how YouTube’s recommendation system works, filter bubbles, transparency in machine learning, and data privacy.
Here are some highlights from our conversation:
Why YouTube’s impact is less understood
My theory for why people completely overlooked YouTube is that on Facebook and Twitter, if one of your friends posts something strange, you’ll see it. Even if you have 1,000 friends, if one of them posts something really disturbing, you see it, so you’re more aware of the problem. Whereas on YouTube, some people binge watch some very weird things that could be propaganda, but we won’t know about it because we don’t see what other people see. So, YouTube is like a TV channel that doesn’t show the same thing to everybody, and when you ask YouTube, “What did you show to other people?” YouTube says, “I don’t know, I don’t remember, I don’t want to tell you.”
Downsides of optimizing only for watch time
When I was working on the YouTube algorithm and our goal was to optimize watch time, we were trying to make sure that the algorithm kept people online the longest. But what I realized was that we were so focused on this target of watch time that we were forgetting a lot of important things, and we were seeing some very strange behavior from the algorithm. Each time we saw this strange behavior, we just blamed it on the user: if the algorithm shows violent videos, it must be because users are violent, so it’s not our fault; the algorithm is just a mirror of human society. But even if I believe the algorithm is a mirror of human society, I think it’s not a flat mirror; it’s a mirror that emphasizes some aspects of life and leaves other aspects overlooked.
… The algorithms behind YouTube and the Facebook news feed are very complex deep learning systems that take a lot into account, including user sessions and what users have watched. They try to find the right content to show to users to get them to stay online the longest and interact as much as possible with the content. So, this can seem neutral at first, but it might not be neutral. For instance, take content that says ‘The media is lying,’ whether it’s on Facebook or on YouTube. If it manages to convince the user that the media is lying, that content will naturally be very efficient at keeping the user online, because the user won’t go to other media and will spend more time on YouTube and more time on Facebook.
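To make the point about optimization targets concrete, here is a minimal sketch in Python of how the same catalog ranks differently depending on the objective. It is a toy illustration only, not YouTube's or Facebook's actual system: the `Video` class, the `rank` function, the two scoring functions, and all of the numbers are hypothetical, invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    expected_watch_minutes: float  # hypothetical predicted watch time
    expected_rating: float         # hypothetical predicted satisfaction, 0 to 1


def rank(videos, objective):
    """Return videos sorted from highest to lowest score under the objective."""
    return sorted(videos, key=objective, reverse=True)


def watch_time(video):
    # Objective 1: keep the user watching as long as possible.
    return video.expected_watch_minutes


def rating(video):
    # Objective 2 (assumed alternative): favor content users say they value.
    return video.expected_rating


# Made-up catalog; the numbers are illustrative, not measured data.
CATALOG = [
    Video("Calm explainer", 6.0, 0.9),
    Video("Sensational conspiracy", 14.0, 0.3),
    Video("Music clip", 4.0, 0.7),
]

by_watch_time = [v.title for v in rank(CATALOG, watch_time)]
by_rating = [v.title for v in rank(CATALOG, rating)]

print("Ranked by watch time:", by_watch_time)
print("Ranked by rating:   ", by_rating)
```

Under these assumed numbers, the attention-grabbing video comes first when ranking by watch time and last when ranking by rating. Nothing about the catalog changes between the two runs; only the objective does.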
… In my personal opinion, the current goal of maximizing watch time means that any content that is really good at captivating your attention for a long time will perform really well. This means extreme content will actually perform really well. But say you had another goal—for instance, the goal to maximize likes and dislikes, or another system of rating like when you would be asked some question like, ‘Did yo

O'Reilly Data Show Podcast, by O'Reilly Media
