Lessons learned about benchmarking, adversarial testing, the dangers of over- and under-claiming, and AI alignment.
Transcript: https://web.stanford.edu/class/cs224u/podcast/bowman/
Sam's website
Sam on Twitter
NYU Linguistics
NYU Data Science
NYU Computer Science
Anthropic
SNLI paper: A large annotated corpus for learning natural language inference
SNLI leaderboard
FraCaS
SICK
A SICK cure for the evaluation of compositional distributional semantic models
SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment
RTE Knowledge Resources
Richard Socher
Chris Manning
Andrew Ng
Ray Kurzweil
SQuAD
Gabor Angeli
Adina Williams
Adina Williams podcast episode
MultiNLI paper: A broad-coverage challenge corpus for sentence understanding through inference
MultiNLI leaderboards
Twitter discussion of LLMs and negation
GLUE
SuperGLUE
DecaNLP
GPT-3 paper: Language Models are Few-Shot Learners
FLAN
Winograd schema challenges
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
JSALT: General-Purpose Sentence Representation Learning
Ellie Pavlick
Ellie Pavlick podcast episode
Tal Linzen
Ian Tenney
Dipanjan Das
Yoav Goldberg
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
Big Bench
Upwork
Surge AI
Dynabench
Douwe Kiela
Douwe Kiela podcast episode
Ethan Perez
NYU Alignment Research Group
Eliezer Shlomo Yudkowsky
Alignment Research Center
Redwood Research
Percy Liang podcast episode
Richard Socher podcast episode