Data Skeptic

Annotator Bias


Listen Later

The modern deep learning approaches to natural language processing are voracious in their demands for large corpora to train on. Folk wisdom estimates used to be around 100k documents were required for effective training. The availability of broadly trained, general-purpose models like BERT has made it possible to do transfer learning to achieve novel results on much smaller corpora.

Thanks to these advancements, an NLP researcher might get value out of fewer examples since they can use the transfer learning to get a head start and focus on learning the nuances of the language specifically relevant to the task at hand. Thus, small specialized corpora are both useful and practical to create.

In this episode, Kyle speaks with Mor Geva, lead author on the recent paper Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets, which explores some unintended consequences of the typical procedure followed for generating corpora.

Source code for the paper available here: https://github.com/mega002/annotator_bias

...more
View all episodesView all episodes
Download on the App Store

Data SkepticBy Kyle Polich

  • 4.4
  • 4.4
  • 4.4
  • 4.4
  • 4.4

4.4

475 ratings


More shows like Data Skeptic

View all
The Changelog: Software Development, Open Source by Changelog Media

The Changelog: Software Development, Open Source

290 Listeners

Software Engineering Daily by Software Engineering Daily

Software Engineering Daily

622 Listeners

Talk Python To Me by Michael Kennedy

Talk Python To Me

584 Listeners

Super Data Science: ML & AI Podcast with Jon Krohn by Jon Krohn

Super Data Science: ML & AI Podcast with Jon Krohn

301 Listeners

NVIDIA AI Podcast by NVIDIA

NVIDIA AI Podcast

333 Listeners

Y Combinator Startup Podcast by Y Combinator

Y Combinator Startup Podcast

228 Listeners

Practical AI by Practical AI LLC

Practical AI

206 Listeners

Google DeepMind: The Podcast by Hannah Fry

Google DeepMind: The Podcast

203 Listeners

Last Week in AI by Skynet Today

Last Week in AI

306 Listeners

Machine Learning Street Talk (MLST) by Machine Learning Street Talk (MLST)

Machine Learning Street Talk (MLST)

96 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

519 Listeners

MIT Technology Review Narrated by MIT Technology Review

MIT Technology Review Narrated

261 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction

No Priors: Artificial Intelligence | Technology | Startups

132 Listeners

This Day in AI Podcast by Michael Sharkey, Chris Sharkey

This Day in AI Podcast

228 Listeners

The AI Daily Brief: Artificial Intelligence News and Analysis by Nathaniel Whittemore

The AI Daily Brief: Artificial Intelligence News and Analysis

617 Listeners