


This episode explores the world of AI evaluation, with insights from Chris Hay on why benchmarks are "stupid" and how to effectively evaluate AI models.
https://github.com/confident-ai/deepeval
https://x.com/MikeBirdTech
https://x.com/FieroTy
https://x.com/chrishayuk
By Mike Bird