AI Post Transformers

EssenceBench: Compressing LLM Benchmarks via Redundancy and Genetic Algorithm

The October 12, 2025 paper introduces EssenceBench, a methodology for compressing large language model (LLM) benchmarks while preserving evaluation fidelity. The core problem it addresses is sample redundancy in existing benchmarks such as the Open LLM Leaderboard, quantified at two levels: text-level redundancy (semantic overlap between samples) and ranking-level redundancy (correlation of model performance across samples). The EssenceBench pipeline has three steps: coarse filtering to eliminate redundant samples, fitness-based subset selection with a genetic algorithm (GA), and attribution-based sample selection to further refine the subset for representational diversity. Experiments show that EssenceBench substantially reduces prediction error and better preserves model rankings than baselines such as MetaBench and random selection, matching their performance with much smaller subsets. Ablation studies confirm that both the filtering and attribution steps are essential to the quality of the compressed datasets. Source: https://arxiv.org/pdf/2510.10457
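The GA step described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the synthetic score matrix, the population and subset sizes, and the choice of Spearman rank correlation as the fitness function are all assumptions made here so the example is self-contained and runnable.

```python
import random

random.seed(0)

# Hypothetical data: per-sample scores for a pool of models. In practice these
# would be real benchmark results; here they are random so the sketch runs alone.
N_MODELS, N_SAMPLES, SUBSET = 8, 60, 10
scores = [[random.random() for _ in range(N_SAMPLES)] for _ in range(N_MODELS)]

def ranking(subset):
    """Rank models by their mean score over the chosen sample indices."""
    means = [sum(row[j] for j in subset) / len(subset) for row in scores]
    return sorted(range(N_MODELS), key=lambda m: -means[m])

def spearman(r1, r2):
    """Spearman correlation between two rankings of the same models (no ties)."""
    pos1 = {m: i for i, m in enumerate(r1)}
    pos2 = {m: i for i, m in enumerate(r2)}
    n = len(r1)
    d2 = sum((pos1[m] - pos2[m]) ** 2 for m in pos1)
    return 1 - 6 * d2 / (n * (n * n - 1))

FULL = ranking(range(N_SAMPLES))  # model ranking on the full benchmark

def fitness(subset):
    """How faithfully the subset reproduces the full-benchmark ranking."""
    return spearman(FULL, ranking(subset))

def mutate(subset):
    """Swap one selected sample for one unselected sample."""
    s = set(subset)
    s.remove(random.choice(list(s)))
    s.add(random.choice([j for j in range(N_SAMPLES) if j not in s]))
    return tuple(sorted(s))

def crossover(a, b):
    """Child draws its samples from the union of both parents."""
    return tuple(sorted(random.sample(list(set(a) | set(b)), SUBSET)))

# Evolve a population of candidate subsets toward high ranking fidelity.
pop = [tuple(sorted(random.sample(range(N_SAMPLES), SUBSET))) for _ in range(30)]
for _ in range(40):
    pop.sort(key=fitness, reverse=True)
    elite = pop[:10]
    children = [mutate(crossover(*random.sample(elite, 2))) for _ in range(20)]
    pop = elite + children

best = max(pop, key=fitness)
print(f"best subset {best} fitness={fitness(best):.3f}")
```

The fitness function is the key design choice: because it compares model *rankings* rather than raw scores, the GA targets exactly the ranking-preservation criterion the paper evaluates against.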

By mcgrof