Merge to main

Building LLM Agents: Evaluation, Safety, and Tool Use with Georgios Chouliaras



From BERT to Agents: Building Production AI at Booking.com

After seven years building ML systems that serve millions of travelers, Georgios Chouliaras has watched the field transform from hand-coded chatbot rules to autonomous agents, and he has learned which shiny new approaches actually work in production.

Georgios Chouliaras, Senior Machine Learning Scientist at Booking.com, joins me to share hard-won insights from deploying AI at scale. His journey spans customer service chatbots that broke during COVID (because the training data didn't include "global pandemic"), company-wide ML best practices, and now the cutting edge of agent development.

In this episode, we explore:

  • Why LLMs represent the biggest abstraction leap since high-level programming languages, and what control you sacrifice for that flexibility
  • The practical framework for deciding when LLMs beat classical ML (hint: it's not always about having text data)
  • How to build LLM judges that actually work: starting with binary labels, achieving annotator agreement before anything else, and why boundary cases matter most for few-shot examples
  • What's genuinely unsolved in agents right now: memory as lifelong learning, and planning approaches that don't collapse under complexity
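
To make the "annotator agreement before anything else" point concrete, here is a minimal sketch of checking agreement between two human annotators on binary labels before building a judge on top of them. It uses Cohen's kappa, which is a common choice but an assumption here; the episode description does not say which agreement metric Georgios uses. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Illustrative only: the metric choice is an assumption, not taken
    from the episode.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    # Fraction of items on which both annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical binary labels (1 = acceptable answer, 0 = not acceptable).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this sketch the two annotators agree on 8 of 10 items, which comes out to a kappa of about 0.6 once chance agreement is removed. The items they disagree on are natural candidates for the boundary cases worth discussing and reusing as few-shot examples.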

Georgios challenges some popular assumptions. The ReAct pattern everyone implements? He hasn't seen it consistently outperform simpler approaches. Massive parameter counts? Architecture and training data now matter more. His underhyped pick: straightforward function calling, which often beats elaborate agent architectures.

The core takeaway: Use the simplest tool that solves your problem. Production users don't care if you're running a sophisticated multi-agent system; they care if it works.

Connect with Georgios:

  • LinkedIn: https://www.linkedin.com/in/chouligi/

Connect with me:

  • LinkedIn: https://www.linkedin.com/in/christianbarra/

Check out our awesome sponsor, dearmachines.com, QA AI Agents for Continuous Testing.


Merge to main, by Christian Barra