
August 29, 2023

LW - Broken Benchmark: MMLU by awg

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Broken Benchmark: MMLU, published by awg on August 29, 2023 on LessWrong.

Phillip over at the AI Explained channel has been running some experiments with his SmartGPT framework against the MMLU benchmark and has discovered a significant number of issues with the problem set.

Among them:

Crucial context missing from questions (apparently copy-paste errors?)

Ambiguous sets of answers

Wrong sets of answers

He highlights a growing need for a proper benchmarking organization that can research and create accurate, robust, sensible benchmarking suites for evaluating SOTA models.

I found this video super interesting and the findings very important, so I wanted to share it here.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org


By The Nonlinear Fund
