The Nonlinear Library

LW - Broken Benchmark: MMLU by awg



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Broken Benchmark: MMLU, published by awg on August 29, 2023 on LessWrong.
Phillip over at the AI Explained channel has been running experiments with his SmartGPT framework against the MMLU benchmark and discovered a not-insignificant number of issues with the problem set.
Among them:
Crucial context missing from questions (apparently copy-paste errors?)
Ambiguous sets of answers
Wrong sets of answers
He highlights a growing need for a proper benchmarking organization that can research and create accurate, robust, sensible benchmarking suites for evaluating SOTA models.
I found this video super interesting and the findings very important, so I wanted to share it here.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

The Nonlinear Library
By The Nonlinear Fund


4.6 (8 ratings)