November 26, 2024

When AI Benchmarks Lie: A Better Way to Evaluate Ft. Chris Hay

57 minutes

This episode explores the world of AI evaluation, with insights from Chris Hay on why benchmarks are "stupid" and how to effectively evaluate AI models.

Get the tools

pip install tool-use-ai

Check out Chris's channel

https://www.youtube.com/@chrishayuk

Links

https://github.com/EleutherAI/lm-eval...

Lessons from the Trenches on Reproducible Evaluation of Language Models - https://arxiv.org/pdf/2405.14782

https://github.com/confident-ai/deepeval

Connect with us

https://x.com/ToolUseAI

https://x.com/MikeBirdTech

https://x.com/FieroTy

https://x.com/chrishayuk

*Chris's opinions are his own and don't represent those of his employer.

By Mike Bird
