January 08, 2022

10/15/20 #1 Marco Tulio Ribeiro - Beyond Accuracy: Behavioral Testing of NLP Models with CheckList

1 hour

Marco Tulio Ribeiro on "Beyond Accuracy: Behavioral Testing of NLP Models with CheckList"

We will present CheckList, a task-agnostic methodology and tool for testing NLP models, inspired by principles of behavioral testing in software engineering. We will show a lot of fun bugs we discovered with CheckList, both in commercial models (Microsoft, Amazon, Google) and in research models (BERT and RoBERTa for sentiment analysis, QQP, and SQuAD). We'll also present comparisons between CheckList and the status quo, in a case study at Microsoft and a user study with researchers and engineers. We show that CheckList is a helpful process and tool for testing and finding bugs in NLP models, for both practitioners and researchers.

By Dan Fu, Karan Goel, Fiodar Kazhamiaka, Piero Molino, Matei Zaharia, Chris Ré
