May 14, 2026

FUSE: Ensembling Verifiers with Zero Labeled Data

20 minutes

This paper introduces Fully Unsupervised Score Ensembling (FUSE), a framework for improving the accuracy of large language model (LLM) outputs without human-labeled data. FUSE aggregates scores from multiple imperfect verifiers to pick the most reliable response among those generated at inference time, a form of test-time scaling. It addresses a key limitation of traditional ensembling by correcting for the statistical dependencies between verifiers that typically degrade unsupervised performance. Experimental results show that FUSE frequently matches or exceeds semi-supervised models that have access to ground-truth labels. These results hold across diverse benchmarks, ranging from academic datasets like MMLU to highly difficult math and logic exams. FUSE thus offers a scalable, cost-effective way to filter synthetic data and improve model reliability on complex reasoning tasks.
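
To make the idea of ensembling verifier scores concrete, here is a minimal Python sketch of the naive baseline that a method like FUSE aims to improve on: standardize each verifier's scores, average them per candidate, and keep the top-scoring response. This is not the FUSE algorithm; in particular it ignores the dependencies between verifiers that the paper corrects for. The function names `zscore` and `select_best` and the example scores are hypothetical, chosen only for illustration.

```python
import statistics


def zscore(values):
    """Standardize one verifier's scores so verifiers on different scales are comparable."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # A verifier that scores every candidate identically carries no signal.
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]


def select_best(scores_by_verifier):
    """Pick a candidate by unweighted averaging of standardized verifier scores.

    scores_by_verifier[v][i] is the score verifier v assigned to candidate i.
    Returns (index of the chosen candidate, combined score per candidate).
    Assumption: equal weights and no dependency correction (naive baseline).
    """
    standardized = [zscore(row) for row in scores_by_verifier]
    n_candidates = len(standardized[0])
    combined = [
        statistics.fmean([row[i] for row in standardized])
        for i in range(n_candidates)
    ]
    best = max(range(n_candidates), key=lambda i: combined[i])
    return best, combined


# Hypothetical scores: three verifiers rating three candidate responses.
example_scores = [
    [0.1, 0.9, 0.5],     # verifier A, scale 0..1
    [10.0, 30.0, 20.0],  # verifier B, different scale
    [0.2, 0.1, 0.9],     # verifier C, disagrees with A and B
]
best_index, combined_scores = select_best(example_scores)
```

No labels are used anywhere in this sketch, which is what makes it unsupervised. Its weakness is that two verifiers making correlated errors are counted as two independent votes, which is the kind of dependency the episode description says FUSE adjusts for.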

By Enoch H. Kang
