Beaker Banter

Is AI Benchmarking Broken? The Truth Behind "con@64" Revealed
Brought to you by Avonetics.com



Discover the controversial "con@64" technique, in which an AI model is prompted 64 times and the most common answer is reported as its final response. Is this a legitimate way to reduce variance, or a sneaky trick to inflate benchmark scores? Dive into the heated debate over whether the practice skews real-world performance comparisons and distorts perceptions of model capabilities. Learn why some accuse xAI engineers of overhyping AI, and how comparing scores computed with different "con" values can mislead the industry. For advertising opportunities, visit Avonetics.com.
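For listeners who want to see the mechanics, here is a minimal sketch of a consensus-at-k evaluation as the episode describes it: sample the model repeatedly and take a majority vote. The function name `consensus_at_k` and the `query_model` callable are illustrative assumptions, not any vendor's actual implementation; only the "prompt 64 times, take the consensus" idea comes from the episode.

```python
from collections import Counter

def consensus_at_k(query_model, prompt, k=64):
    """Sample the model k times and return the majority answer.

    query_model is a hypothetical callable (prompt -> answer string);
    any real model API client could stand in for it.
    """
    answers = [query_model(prompt) for _ in range(k)]
    # Majority vote: whichever answer appears most often wins.
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / k  # consensus answer and its vote share
```

A score computed this way answers "how often is the model's majority answer right?", which is a different question from "how often is a single answer right?"; treating the two as the same metric is the crux of the debate the episode explores.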


By Beaker Banter