Product Growth Podcast

The PM’s Role in AI Evals: Step-by-Step



Today, we’ve got some of our most requested guests yet: Hamel Husain and Shreya Shankar, creators of the world’s best AI Evals cohort.

You’ll learn:

- Why AI evaluations are the most critical skill for building successful AI products

- What common mistakes people are making and how to avoid them

- How to effectively "hill climb" towards better AI performance

If you're building AI features, or aiming to master how AI evals actually work, this episode is your step-by-step blueprint.

----

Brought to you by:

The AI Evals Course for PMs & Engineers: You get $800 with this link

Jira Product Discovery: Plan with purpose, ship with confidence

Vanta: Automate compliance, security, and trust with AI (Get $1,000 with my link)

AI PM Certification: Get $500 with code AAKASH25

----

Timestamps:

00:00:00 - Preview

00:02:06 - Three reasons PMs NEED evals

00:04:40 - Why PMs shouldn't view evals as monotonous

00:06:23 - Are evals the hardest part of AI products solved?

00:07:37 - Why can't you just rely on human "vibe checks"?

00:12:11 - Ad 1 (AI Evals Course)

00:13:10 - Ad 2 (Jira Product Discovery)

00:14:06 - Are LLMs good at 1-5 ratings?

00:15:45 - The "Whack-a-mole" analogy without evals

00:16:26 - Hallucination problem in emails (Apollo story)

00:21:22 - How Airbnb used machine learning models

00:23:56 - Evaluating RAG systems

00:29:52 - Ad 3 (Vanta)

00:30:56 - Ad 4 (AI PM Certification on Maven)

00:31:42 - Hill Climbing

00:35:51 - Red flag: Suspiciously high eval metrics

00:39:02 - Design principles for effective evals

00:42:42 - How OpenAI approaches evals

00:44:39 - Foundation models are trained on "average taste"

00:49:36 - Cons of fine-tuning

00:51:27 - Prompt engineering vs. RAG vs. Fine-tuning

00:53:00 - Introduction of "The Three Gulfs" framework

00:56:04 - Roadmap for learning AI evals

01:01:41 - Why error analysis is critical for LLMs

01:08:29 - Using LLM as a judge

01:10:15 - Frameworks for systematic problem-solving in labels

01:17:42 - Importance of niche and qualifying clients (pro tips)

01:18:43 - $800K for first course cohort!

01:20:15 - Why end a successful cohort?

01:25:49 - GOLD advice for creating a successful course

01:33:39 - Outro

----

Key Takeaways:

1. Stop Guessing. Eval Your AI. Your AI isn’t an MVP without robust evaluations. Build in judgment — or you’re just shipping hope. Without evaluation, AI performance is a happy accident.

2. Error Analysis = Your Superpower. General metrics won’t save you. You need to understand why your AI messed up. Only then can you fix it — not just wish it worked better.

3. 99% Accuracy is a LIE. Suspiciously high metrics usually mean your evaluation setup is broken. Real-world AI is never perfect. If your evals say otherwise, they’re flawed.

4. Fine-Tuning is a Trap (Mostly). Fine-tuning is expensive, brittle, and often unnecessary. Start with smarter prompts and RAG. Only fine-tune if you must.

5. Your Data’s Wild. Understand It. You can’t eyeball everything. Without structured evaluation, you’ll drown in noise and never find patterns or fixes that matter.

6. Models Fail to Generalize. Always. Your AI will break on new data. Don’t blame it. Adapt it. Use RAG, upgrade inputs, and stop expecting out-of-the-box magic.

7. OpenAI Doesn’t Get Your Vibe. Their models are average-taste. Your product isn’t. If you want your brand’s voice in your AI, you must define it yourself — with evals.

8. Trust LLM Judges... but validate them hard. LLMs can scale your evals — but you still need to verify them against human-labeled data. Don’t blindly trust your judge.

9. Your Prompts Are S**T. If your AI is bad, it’s probably your fault. The cheapest, most powerful fix? Sharpen your prompts. Clearer instructions = smarter AI.

10. Let AI Teach You. Seriously. LLM judges aren’t just scoring you — they can teach you. Reviewing how your AI fails is the best way to learn what great outputs should look like.
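To make takeaway 8 concrete, here is a minimal sketch of checking an LLM judge against human-labeled data. It is an illustration, not the guests' exact method: the function name `judge_agreement`, the `JudgeAgreement` record, and the sample labels are all hypothetical, and it assumes each output has a simple pass/fail label from both a human and the judge.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class JudgeAgreement:
    """Hypothetical summary of how well an LLM judge matches human labels."""
    true_positive_rate: float  # share of human "pass" examples the judge also passed
    true_negative_rate: float  # share of human "fail" examples the judge also failed
    accuracy: float            # share of all examples where judge and human agree


def judge_agreement(human: Sequence[bool], judge: Sequence[bool]) -> JudgeAgreement:
    """Compare judge verdicts to human verdicts (True = pass, False = fail)."""
    if len(human) != len(judge):
        raise ValueError("human and judge must label the same examples")
    human_pass = sum(human)
    human_fail = len(human) - human_pass
    if human_pass == 0 or human_fail == 0:
        # Both rates need a non-empty class to be meaningful.
        raise ValueError("need at least one human pass and one human fail")
    both_pass = sum(1 for h, j in zip(human, judge) if h and j)
    both_fail = sum(1 for h, j in zip(human, judge) if not h and not j)
    return JudgeAgreement(
        true_positive_rate=both_pass / human_pass,
        true_negative_rate=both_fail / human_fail,
        accuracy=(both_pass + both_fail) / len(human),
    )


# Made-up labels for eight outputs, purely for illustration.
human_labels = [True, True, True, False, False, True, False, True]
judge_labels = [True, True, False, False, True, True, False, True]

result = judge_agreement(human_labels, judge_labels)
print(result)
```

On these made-up labels the judge agrees with humans on 80% of passes but only about 67% of fails. Reporting the two rates separately, rather than a single accuracy number, is a design choice in this sketch: when most outputs pass, overall accuracy can look high even if the judge misses many real failures, which ties back to takeaway 3.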

----

Check it out on Apple, Spotify, or YouTube.

----

Related Podcasts:

Complete Course: AI Product Management

Tutorial of Top 5 AI Prototyping Tools

If you only have 2 hrs, this is how to become an AI PM

College Dropout Raised $20M Building AI Tools | Cluely, Roy Lee

Bolt CEO and Founder on How he Hit $30M ARR in a Year

LogRocket CEO and Founder on How to Build a $100M+ AI Startup

Amplitude CEO and Founder on Building the Product Analytics Leader

----

P.S. More than 85% of you aren't subscribed yet. If you subscribe on YouTube and follow on Apple and Spotify, my commitment to you is that we'll keep making this content better.

----

If you want to advertise, email productgrowthppp at gmail.



This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.news.aakashg.com/subscribe

Product Growth Podcast
By Aakash Gupta


4.7

27 ratings

