Supra Insider

#63: How PMs can bring predictability to AI products | Aman Khan (Head of Product @ Arize AI, ex-Spotify, ex-Apple)


Listen Later

If you’ve ever launched an AI feature and later realized it wasn’t quite ready—or struggled to define what “quality” even looks like in an AI product—this episode is for you.

In this episode of Supra Insider, Marc and Ben sit down with Aman Khan, Head of Product of Arize AI and a leading voice in AI product evaluation. Aman has helped dozens of teams—from scrappy startups to massive consumer platforms—build systematic approaches for evaluating LLM-powered features. Together, they explore the dangers of “vibe coding” your way to production, how to define and operationalize evals across different layers of your product, and why even “good” outputs can still lead to bad outcomes without proper measurement.

Whether you’re a PM under pressure to ship AI features fast, or a product leader figuring out how to instill quality and reliability into your development process, this conversation is packed with frameworks, analogies (like self-driving cars), and hard-won lessons you can use to build smarter and ship more confidently.

All episodes of the podcast are also available on Spotify, Apple and YouTube.

New to the pod? Subscribe below to get the next episode in your inbox 👇

To support the podcast, please check out the links below:

* Supra has teamed up with Maven to bring you something special – courses that our own community members have personally curated. And because you're part of the Supra family, you get $100 off any of these handpicked selections with code SUPRAxMAVEN. Aman’s course is part of the partnership 🚀

* Preparing for a PM interview with tech companies like Meta and Google? Check out Ben's course for Product Sense and Analytical Thinking PM interviews (selected by Lenny Rachitsky as a Top Course in Product on Maven 🔥) and his AI Copilot for interview practice. Supra Insider listeners get 10% off the course with promo code “suprainsider”. Enrollment is now open for the next cohort starting July 15th.

In this episode, we covered the following topics:

* (00:00) The danger of launching AI features without proper evaluation—and why trust is hard to rebuild

* (04:12) “Vibe coding” explained: when teams ship based on gut feel instead of structured validation

* (09:07) Real-world example: a travel chatbot gets great engagement—until users start jailbreaking it

* (15:36) Defining “good enough”: how to set AI quality metrics across tone, latency, hallucinations, and risk

* (19:58) What PMs can learn from self-driving car evals—and how to build iterative test sets for reliability

* (23:04) How LLM-as-a-judge works: using one model to grade another, and when to rely on human labels

* (34:14) Spotting dangerous “false positives”: when outputs seem right but come from flawed reasoning

* (42:22) What to do when your evals say “good” but the business metrics drop—and who’s responsible

* And more!

If you’d like to watch the video of our conversation, you can catch that on YouTube:

Links:

* Aman Khan Website: http://amank.ai

* Aman Khan: https://www.linkedin.com/in/amanberkeley/

* Aman Khan X: https://x.com/_amankhan

* Marc: https://www.linkedin.com/in/marcbaselga/

* Ben: https://www.linkedin.com/in/benerez/

To support the podcast, please check out the links below:

* Supra has teamed up with Maven to bring you something special – courses that our own community members have personally curated. And because you're part of the Supra family, you get $100 off any of these handpicked selections with code SUPRAxMAVEN.

* Preparing for a PM interview with tech companies like Meta and Google? Check out Ben's course for Product Sense and Analytical Thinking PM interviews (selected by Lenny Rachitsky as a Top Course in Product on Maven 🔥) and his AI Copilot for interview practice. Supra Insider listeners get 10% off the course with promo code “suprainsider”. Enrollment is now open for the next cohort starting June 17th.

If you enjoyed this conversation, please share it with a friend or colleague 👇

Also, if you’re a new subscriber, we encourage you to check out some of the recent episodes you might have missed:

And here are some less recent favorites:



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit suprainsider.substack.com
...more
View all episodesView all episodes
Download on the App Store

Supra InsiderBy Marc Baselga, Ben Erez

  • 5
  • 5
  • 5
  • 5
  • 5

5

6 ratings


More shows like Supra Insider

View all
Cato Podcast by Cato Institute

Cato Podcast

966 Listeners

Conversations with Tyler by Mercatus Center at George Mason University

Conversations with Tyler

2,420 Listeners

Pivot by New York Magazine

Pivot

9,507 Listeners

a16z Podcast by Andreessen Horowitz

a16z Podcast

1,087 Listeners

Founders by David Senra

Founders

2,106 Listeners

Y Combinator Startup Podcast by Y Combinator

Y Combinator Startup Podcast

235 Listeners

TechCrunch Daily Crunch by TechCrunch

TechCrunch Daily Crunch

41 Listeners

All-In with Chamath, Jason, Sacks  Friedberg by All-In Podcast, LLC

All-In with Chamath, Jason, Sacks Friedberg

9,844 Listeners

Barron's Streetwise by Barron's

Barron's Streetwise

1,553 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

501 Listeners

Hard Fork by The New York Times

Hard Fork

5,466 Listeners

The AI Daily Brief: Artificial Intelligence News and Analysis by Nathaniel Whittemore

The AI Daily Brief: Artificial Intelligence News and Analysis

561 Listeners

AI and I by Dan Shipper

AI and I

38 Listeners

AI + a16z by a16z

AI + a16z

33 Listeners

Hasan Minhaj Doesn't Know by 186k Films

Hasan Minhaj Doesn't Know

449 Listeners