
Sign up to save your podcasts
Or
The reliability gap between AI models and production-ready applications is where countless enterprise initiatives die in POC purgatory. In this episode of Chief AI Officer, Doug Safreno, Co-founder & CEO of Gentrace, offers the testing infrastructure that helped customers escape the Whac-A-Mole cycle plaguing AI development. Having experienced this firsthand when building an email assistant with GPT-3 in late 2022, Doug explains why traditional evaluation methods fail with generative AI, where outputs can be wrong in countless ways beyond simple classification errors.
With Gentrace positioned as a "collaborative LLM testing environment" rather than just a visualization layer, Doug shares how they've transformed companies from isolated engineering testing to cross-functional evaluation that increased velocity 40x and enabled successful production launches. His insights from running monthly dinners with bleeding-edge AI engineers reveal how the industry conversation has evolved from basic product questions to sophisticated technical challenges with retrieval and agentic workflows.
Topics discussed:
The reliability gap between AI models and production-ready applications is where countless enterprise initiatives die in POC purgatory. In this episode of Chief AI Officer, Doug Safreno, Co-founder & CEO of Gentrace, offers the testing infrastructure that helped customers escape the Whac-A-Mole cycle plaguing AI development. Having experienced this firsthand when building an email assistant with GPT-3 in late 2022, Doug explains why traditional evaluation methods fail with generative AI, where outputs can be wrong in countless ways beyond simple classification errors.
With Gentrace positioned as a "collaborative LLM testing environment" rather than just a visualization layer, Doug shares how they've transformed companies from isolated engineering testing to cross-functional evaluation that increased velocity 40x and enabled successful production launches. His insights from running monthly dinners with bleeding-edge AI engineers reveal how the industry conversation has evolved from basic product questions to sophisticated technical challenges with retrieval and agentic workflows.
Topics discussed: