
The October 12, 2025 paper introduces **EssenceBench**, a novel methodology for **compressing large language model (LLM) benchmarks** while preserving evaluation fidelity. The core problem it addresses is **sample redundancy** in existing benchmarks such as the Open LLM Leaderboard, quantified through both **text-level redundancy** (semantic overlap between samples) and **ranking-level redundancy** (correlation of model performance across samples). The EssenceBench pipeline has three steps: **coarse filtering** to eliminate redundant samples, **fitness-based subset selection**, in which a genetic algorithm (GA) searches for subsets that best preserve full-benchmark behavior, and **attribution-based sample selection** to further refine the subset for representational diversity. Experiments show that EssenceBench significantly **reduces prediction error** and **improves ranking preservation** over baselines such as MetaBench and random selection, achieving comparable fidelity with much smaller subsets. Ablation studies confirm that both the filtering and attribution steps are essential to the quality of the compressed datasets.
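To make the redundancy idea concrete, here is a minimal sketch of how ranking-level redundancy could drive the coarse-filtering step: each sample is represented by its vector of per-model scores, and a sample whose vector correlates strongly with one already kept is treated as redundant. The greedy scheme, the 0.9 threshold, and the synthetic data are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch: greedy coarse filtering by ranking-level redundancy.
# Data, threshold, and greedy order are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_models = 50

# Synthetic stand-in data with planted redundancy: the second half of
# the samples are noisy copies of the first half.
base = rng.random((n_models, 250))
scores = np.concatenate(
    [base, base + 0.05 * rng.standard_normal(base.shape)], axis=1
)
n_samples = scores.shape[1]

# corr[s, t]: Pearson correlation between the per-model score vectors
# of samples s and t.
corr = np.corrcoef(scores.T)

kept = []
for s in range(n_samples):
    # Keep s only if it is not highly correlated with any kept sample.
    if all(abs(corr[s, t]) < 0.9 for t in kept):
        kept.append(s)

print(f"kept {len(kept)} of {n_samples} samples after coarse filtering")
```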
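And a similarly hedged sketch of the fitness-based subset-selection step: a simple genetic loop over fixed-size subsets, where a subset's fitness is the Spearman correlation between the model ranking it induces and the ranking on the full benchmark. The population size, mutation scheme, and exact fitness definition are assumptions for illustration; the paper's GA may differ.

```python
# Hedged sketch: GA subset selection with a rank-preservation fitness.
# Hyperparameters and the mutation operator are illustrative assumptions.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# scores[m, s]: score of model m on sample s (synthetic stand-in data).
n_models, n_samples, k = 50, 2000, 100
scores = rng.random((n_models, n_samples))
full_scores = scores.mean(axis=1)  # per-model score on the full benchmark


def fitness(subset: np.ndarray) -> float:
    """Rank preservation: Spearman correlation between model rankings
    induced by the subset and by the full benchmark."""
    sub_scores = scores[:, subset].mean(axis=1)
    rho, _ = spearmanr(full_scores, sub_scores)
    return rho


def mutate(subset: np.ndarray) -> np.ndarray:
    """Swap one selected sample for a random unselected sample."""
    child = subset.copy()
    drop = rng.integers(k)
    candidates = np.setdiff1d(np.arange(n_samples), child)
    child[drop] = rng.choice(candidates)
    return child


# Simple (mu + lambda) genetic loop over fixed-size subsets.
population = [rng.choice(n_samples, size=k, replace=False) for _ in range(20)]
for _ in range(100):
    offspring = [mutate(p) for p in population]
    population = sorted(population + offspring, key=fitness, reverse=True)[:20]

best = population[0]
print(f"best subset rank fidelity (Spearman rho): {fitness(best):.4f}")
```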
Source:
https://arxiv.org/pdf/2510.10457
By mcgrof