Modern deep learning approaches to natural language processing are voracious in their demand for large training corpora. Folk wisdom used to hold that around 100k documents were required for effective training. The availability of broadly trained, general-purpose models like BERT has made it possible to use transfer learning to achieve novel results on much smaller corpora. Thanks to these advances, an NLP researcher can get value out of fewer examples, since transfer learning provides a head start and lets them focus on learning the nuances of language specifically relevant to the task at hand. Thus, small specialized corpora are both useful and practical to create. In this episode, Kyle speaks with Mor Geva, lead author of the recent paper "Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets", which explores some unintended consequences of the typical procedure followed for generating corpora. Source code for the paper is available here: https://github.com/mega002/annotator_bias

Annotator Bias

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.


Many researchers and students have painstakingly labeled precise details about the body positions of the creatures they study. Can AI be used for this labeling? Of course it can! Today's episode discusses Social LEAP Estimates Animal Poses (SLEAP), a software solution to train AI to perform this tedious but important labeling work.

Pose Tracking

Our guest in this episode is Sebastien Motsch, an assistant professor at Arizona State University, working in the School of Mathematical and Statistical Sciences. He works on modeling self-organized biological systems to understand how complex patterns emerge.

Modeling Group Behavior

Our guest in this episode is Ryan Hanscom. Ryan is a Ph.D. candidate in a joint doctoral evolution program at San Diego State University and the University of California, Riverside. He is a terrestrial ecologist with a focus on herpetology and mammalogy. Ryan discussed how rattlesnake behavior is studied in the natural world, particularly as temperatures increase.

Advances in Data Loggers

We are joined by Hank Schlinger, a professor of psychology at California State University, Los Angeles. His research revolves around theoretical issues in psychology and behavioral analysis. Hank argues that words have referents and questions what the referent of "intelligence" is. He discussed how intelligence can be observed in animals and how it is measured in a given context.

What You Know About Intelligence is Wrong (fixed)

On today’s episode, we are joined by Aimee Dunlap. Aimee is an assistant professor at the University of Missouri–St. Louis and the interim director at the Whitney R. Harris World Ecology Center. Aimee discussed how animals perceive information and what they use it for. She discussed the connection between an animal's environment and the learning it relies on for decision-making. She also discussed the costs of learning and the factors that affect animal learning.

