November 06, 2025

20251106 - Oxford pretends AI benchmarks are science, not marketing

6 minutes

How could all these benchmarks be fake? It's a mystery.

https://pivot-to-ai.com/2025/11/06/oxford-pretends-ai-benchmarks-are-science-not-marketing/ - text

Patreon: https://www.patreon.com/davidgerard
Ko-Fi: https://ko-fi.com/A1529D5
Buy me nice things: https://www.amazon.co.uk/hz/wishlist/ls/3Q8VZW46J6DM6
Get an extremely cool Pivot to AI shirt or mug: https://pivot-to-ai.redbubble.com

Send in your story tips: [email protected]

Sources:

Measuring what Matters: Construct Validity in Large Language Model Benchmarks (press release) https://oxrml.com/measuring-what-matters/
Measuring what Matters: Construct Validity in Large Language Model Benchmarks (PDF) https://openreview.net/pdf?id=mdA5lVvNcU

Previously on Pivot to AI:

OpenAI o3 beats FrontierMath — because OpenAI funded the test and had access to the questions https://pivot-to-ai.com/2025/01/20/openai-o3-beats-frontiermath-because-openai-funded-the-test-and-had-access-to-questions/
AI benchmarks are self-promoting trash — but regulators keep using them https://pivot-to-ai.com/2025/02/25/ai-benchmarks-are-self-promoting-trash-but-regulators-keep-using-them/
The finance press finally starts talking about the 'AI bubble' https://pivot-to-ai.com/2025/09/28/the-finance-press-finally-starts-talking-about-the-ai-bubble/
video: https://www.youtube.com/watch?v=AgR1TCllRgc&list=UU9rJrMVgcXTfa8xuMnbhAEA

Full Pivot to AI playlist: https://www.youtube.com/playlist?list=UU9rJrMVgcXTfa8xuMnbhAEA

By David Gerard