Tool Use - AI Conversations

When AI Benchmarks Lie: A Better Way to Evaluate Ft. Chris Hay



This episode explores the world of AI evaluation, with insights from Chris Hay on why benchmarks are "stupid" and how to effectively evaluate AI models.

Get the tools
pip install tool-use-ai
Check out Chris' Channel
https://www.youtube.com/@chrishayuk
Links
https://github.com/EleutherAI/lm-eval...
Lessons from the Trenches on Reproducible Evaluation of Language Models - https://arxiv.org/pdf/2405.14782

https://github.com/confident-ai/deepeval

Connect with us
https://x.com/ToolUseAI

https://x.com/MikeBirdTech

https://x.com/FieroTy

https://x.com/chrishayuk

*Chris's opinions are his own and do not represent those of his employer.

Tool Use - AI Conversations, by Anetic