LessWrong (30+ Karma)

“Introducing and Deprecating WoFBench” by jefftk



We present and formally deprecate WoFBench, a novel test that compares
the knowledge of Wings of Fire superfans to frontier AI models. The
benchmark showed initial promise as a challenging evaluation, but
unfortunately proved to be saturated on creation, as AI models and
superfans produced output that was, to the extent of our ability to
score responses, statistically indistinguishable from entirely
correct.

Benchmarks are important tools for tracking the rapid advancements in
model capabilities, but they are struggling to keep up with LLM
progress: frontier models now consistently achieve high scores on
many popular benchmarks, raising questions about their
continued ability to differentiate between models.

In response, we introduce WoFBench, an evaluation suite designed to
test recall and knowledge synthesis in the domain of Tui
T. Sutherland's Wings of Fire universe.

The superfans were identified via a careful search process, in which
all members of the lead author's household were asked to complete a
self-assessment of their knowledge of the Wings of Fire universe. The
assessment consisted of a single question, with the text "do you think
you know the Wings of Fire universe better than Gemini?" Two
superfans were identified, who we keep [...]

---

First published:

March 1st, 2026

Source:

https://www.lesswrong.com/posts/YshqDtyzgWaJxthTo/introducing-and-deprecating-wofbench

---

Narrated by TYPE III AUDIO.

