August 16, 2025

PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts

25 minutes

🤗 Upvotes: 33 | cs.CL, cs.AI

Authors:

Mo Yu, Tsz Ting Chung, Chulun Zhou, Tong Li, Rui Lu, Jiangnan Li, Liyan Xu, Haoshu Lu, Ning Zhang, Jing Li, Jie Zhou

Title:

PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts

Arxiv:

http://arxiv.org/abs/2508.09848v2

Abstract:

We introduce PRELUDE, a benchmark for evaluating long-context understanding through the task of determining whether a character's prequel story is consistent with the canonical narrative of the original book. Our task poses a stronger demand for global comprehension and deep reasoning than existing benchmarks -- as the prequels are not part of the original story, assessing their plausibility typically requires searching and integrating information that is only indirectly related. Empirically, 88% of instances require evidence from multiple parts of the narrative. Experimental results highlight the challenge of our task: in-context learning, RAG and in-domain training with state-of-the-art LLMs, and commercial DeepResearch services, lag behind humans by >15%. A further human study reveals that models often produce correct answers with flawed reasoning, leading to an over 30% gap in reasoning accuracy compared to humans. These findings underscore the substantial room for improvement in long-context understanding and reasoning.

By Jingwen Liang, Gengyu Wang
