June 17, 2025

The Right Way to Do AI Evals (ft Freddie Vargus)

55 minutes

Are your AI agents unreliable? In this episode, we reveal a professional system for AI evals to help you build and ship better AI products, faster. Learn how to systematically test LLM performance, evaluate complex tool use, and improve multi-turn conversations. We break down the exact process for building a high-quality eval dataset, how to use milestones and minefields to track agent behaviour, and how to use an LLM as a judge without compromising quality. Stop guessing and start making real, measurable improvements to your AI today.

Check out Quotient AI

https://www.quotientai.co/

Get FREE AI tools

pip install tool-use-ai

Connect with us https://x.com/ToolUseAI

https://x.com/MikeBirdTech

https://x.com/freddie_v4

00:00:00 - Intro

00:02:54 - Why You Need AI Evals

00:06:13 - How to Evaluate AI Agent Tool Use

00:29:24 - The Process for Building Your First Eval Dataset

00:42:44 - Using an LLM as a Judge the Right Way

Subscribe for more insights on AI tools, productivity, and AI evals.

Tool Use is a weekly conversation with AI experts brought to you by Anetic.


By Mike Bird
