AI: post transformers

EssenceBench: Compressing LLM Benchmarks via Redundancy and Genetic Algorithm



The October 12, 2025 paper introduces **EssenceBench**, a methodology for **compressing large language model (LLM) benchmarks** while preserving evaluation fidelity. The core problem it addresses is **sample redundancy** in existing benchmarks such as the Open LLM Leaderboard, which it quantifies in two ways: **text-level redundancy** (semantic overlap between samples) and **ranking-level redundancy** (correlation of model performance across samples). The EssenceBench pipeline runs in three steps:

1. **Coarse filtering** to eliminate clearly redundant samples.
2. **Fitness-based subset selection**, using a genetic algorithm (GA) to search for subsets that best preserve full-benchmark behavior.
3. **Attribution-based sample selection** to further refine the subset for representational diversity.

Experiments demonstrate that EssenceBench significantly **reduces prediction error** and **improves ranking preservation** over baselines such as MetaBench and random selection, achieving comparable performance with much smaller subsets. Ablation studies confirm that both the filtering and attribution steps are essential to the quality of the compressed datasets.
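To make the pipeline concrete, here is a minimal sketch of the first two steps in Python. It is an illustration under stated assumptions, not the paper's implementation: scores are assumed to be available as a model-by-sample matrix `S`, coarse filtering is approximated by greedily dropping near-collinear score columns, the GA fitness is Spearman rank correlation between subset-based and full-benchmark model rankings, and the attribution step is omitted. All names (`coarse_filter`, `ga_select`, etc.) and hyperparameters are hypothetical.

```python
# Minimal sketch of EssenceBench steps 1-2 (NOT the paper's implementation).
# Assumption: S[m, i] holds model m's score on benchmark sample i.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_models, n_samples = 20, 500              # hypothetical sizes
S = rng.random((n_models, n_samples))      # synthetic stand-in scores
full_scores = S.mean(axis=1)               # per-model full-benchmark score

def coarse_filter(S, threshold=0.95):
    """Step 1 (rough proxy): greedily drop samples whose score column is
    nearly collinear with an already-kept one (ranking-level redundancy)."""
    kept = []
    for i in range(S.shape[1]):
        if all(abs(np.corrcoef(S[:, i], S[:, j])[0, 1]) < threshold
               for j in kept):
            kept.append(i)
    return np.array(kept)

def fitness(subset):
    """Spearman correlation of the subset-based ranking with the full one."""
    rho, _ = spearmanr(S[:, subset].mean(axis=1), full_scores)
    return rho

def mutate(subset, candidates, rate=0.1):
    """Swap each index, with probability `rate`, for an unused candidate."""
    subset = subset.copy()
    for j in range(len(subset)):
        if rng.random() < rate:
            subset[j] = rng.choice(np.setdiff1d(candidates, subset))
    return subset

def ga_select(candidates, k=30, pop_size=40, generations=50):
    """Step 2: toy GA over fixed-size subsets of the candidate indices."""
    pop = [rng.choice(candidates, size=k, replace=False)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]                # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), size=2, replace=False)
            union = np.union1d(elite[a], elite[b])  # crossover: merge parents
            children.append(mutate(rng.choice(union, size=k, replace=False),
                                   candidates))
        pop = elite + children
    return max(pop, key=fitness)

subset = ga_select(coarse_filter(S))
print(f"{len(subset)} samples, rank correlation {fitness(subset):.3f}")
```

The key design choice in any such sketch is the fitness function: since the paper's stated goal is ranking preservation, rank correlation with the full-benchmark ranking is the natural objective for the GA to maximize.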


Source:

https://arxiv.org/pdf/2510.10457


By mcgrof