SallyAnn DeLucia, Director of Product, Arize
Jack Zhou, Staff Engineer, Arize

In this episode, we cover:
- What tracing, observability, and evals really mean in GenAI applications
- How Arize used its own platform to build Alyx, its AI agent
- The role of customer success engineers in surfacing repeatable workflows
- Why early prototyping looked like messy notebooks and hacked-together local apps
- How dogfooding shaped Alyx’s evolution and built confidence for launch
- Why evals start messy, and how Arize layered evals across tool calls, sessions, and system-level decisions
- The importance of cross-functional, boundary-spanning teams in building AI products
- What’s next for Alyx: moving from “on rails” workflows to more autonomous, agentic planning loops

Resources mentioned:

- Arize AI — Sign up for a free account and try Alyx
- Arize Blog — Lessons learned from building AI products
- Maven AI Evals Course — The course Teresa took to learn about evals (Get 35% off with Teresa’s affiliate link)
- Cursor — The AI-powered code editor used by the Arize engineering team
- DataDog — For understanding application traces
- OpenAI GPT Models — GPT-3.5, GPT-4, and newer models used in early and current versions of Alyx
- Jupyter Notebooks — A tool for combining code, data, and notes, used in Arize’s prototyping
- Axial Coding Method by Hamel Husain — A framework for analyzing data and designing evals

Timestamps:

00:00 Introduction to SallyAnn and Jack
01:08 Overview of Arize.ai and Its Core Components
01:44 Deep Dive into Tracing, Observability, and Evals
03:56 Introduction to Alyx: Arize's AI Agent
04:15 The Genesis and Evolution of Alyx
08:51 Challenges and Solutions in Building Alyx
24:33 Prototyping and Early Development of Alyx
26:22 Exploring the Power of Coding Notebooks
26:51 Early Experiments with Alyx
27:59 Challenges with Real Data
29:20 Internal Testing and Dogfooding
31:55 The Importance of Evals
35:16 Developing Custom Evals
43:09 Future Plans for Alyx
47:59 How to Get Started with Alyx