Tool Use - AI Conversations

The Right Way to Do AI Evals (ft Freddie Vargus)



Are your AI agents unreliable? In this episode, we lay out a systematic approach to AI evals that helps you build and ship better AI products, faster. Learn how to test LLM performance systematically, evaluate complex tool use, and improve multi-turn conversations. We break down the process for building a high-quality eval dataset, how to use milestones and minefields to track agent behaviour, and how to use an LLM as a judge without compromising quality. Stop guessing and start making real, measurable improvements to your AI today.


Check out Quotient AI

https://www.quotientai.co/


Sign up for AI coaching for professionals at: https://www.anetic.co


Get FREE AI tools

pip install tool-use-ai


Connect with us: https://x.com/ToolUseAI

https://x.com/MikeBirdTech

https://x.com/freddie_v4


00:00:00 - Intro

00:02:54 - Why You Need AI Evals

00:06:13 - How to Evaluate AI Agent Tool Use

00:29:24 - The Process for Building Your First Eval Dataset

00:42:44 - Using an LLM as a Judge: The Right Way


Subscribe for more insights on AI tools, productivity, and AI evals.

Tool Use is a weekly conversation with AI experts brought to you by Anetic.

Tool Use - AI Conversations, by Anetic