The AI podcast for product teams

AI products are shipping faster than ever. But shipping isn’t impact. The teams pulling ahead aren’t the ones with the best models — they’re the ones who can prove their product moves the business. This edition is about that gap. How to measure what matters, where the biggest barriers to impact are hiding, and what the latest research says about getting AI products to actually drive growth. Because the real competitive advantage isn’t AI. It’s knowing whether your AI is working.

What You’ll Learn in This Edition

This edition cuts through the noise to focus on the measurement gap — the difference between shipping AI and proving AI drives growth.

* The Power/Speed/Impact/Joy bullseye — a calibration framework for AI products that actually drive growth

* A Nature paper reveals why removing friction from AI may be destroying the learning your team needs

* John Maeda on why design teams are being hollowed out — and why PMs are next

* Benedict Evans on why even OpenAI can’t solve product-market fit with capability alone

* Research that should change how your team thinks about AI-assisted skill building


Episode 1: Why Your AI Metrics Are Lying to You - Framework for improving AI product performance

Your AI product might be fast, capable, and technically impressive — and still not drive the growth your business needs. In this episode, Brittany Hobbs and I introduce the Power, Speed, Impact, and Joy bullseye — a calibration framework borrowed from F1 racing. The teams winning aren’t shipping more features. They’re measuring different things entirely. We break down a three-layer eval approach and why most completion metrics are hiding the signals that matter.
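The episode doesn't spell out its three layers here, but the general shape of a layered eval can be sketched in a few lines. The layer names, fields, and thresholds below are hypothetical illustrations, not the show's framework: the point is simply that each layer is reported separately, because a high completion rate can mask low task success, which can in turn mask zero business impact.

```python
from dataclasses import dataclass

@dataclass
class Session:
    completed: bool           # layer 1: did the AI produce a response at all?
    latency_s: float
    task_solved: bool         # layer 2: did the output actually solve the task?
    returned_within_7d: bool  # layer 3: a proxy for real outcome, not just activity

def eval_layers(sessions: list[Session]) -> dict[str, float]:
    """Report each layer separately rather than a single blended score."""
    n = len(sessions)
    return {
        "l1_completion_rate": sum(s.completed for s in sessions) / n,
        "l2_task_success_rate": sum(s.task_solved for s in sessions) / n,
        "l3_retention_rate": sum(s.returned_within_7d for s in sessions) / n,
    }

sessions = [
    Session(True, 1.2, True, True),
    Session(True, 0.8, False, False),
    Session(True, 2.5, True, False),
    Session(False, 9.0, False, False),
]
print(eval_layers(sessions))
# → {'l1_completion_rate': 0.75, 'l2_task_success_rate': 0.5, 'l3_retention_rate': 0.25}
```

Read top to bottom, the layers tell Brittany's story in numbers: 75% of sessions "completed," but only half solved the task, and only a quarter of users came back.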

“Success does not mean satisfaction. If someone stops engaging, does that mean they solved their problem — or that they were frustrated and left?” — Brittany Hobbs

Listen on Spotify | Apple Podcasts | YouTube

Your Role Isn’t Shrinking. It’s Being Hollowed Out.

John Maeda — Three major tech companies have restructured design teams into “prompt engineering pods.” Maeda’s #DesignInTech 2026 calls it what it is: the elimination of design judgment from the product process. “When you replace a designer with a prompt, you don’t lose the pixels. You lose the questions that should have been asked before anyone opened a tool.” This applies to product managers too — if your PM’s job becomes prompt-wrangling instead of deciding what to build and why, you’ve automated the wrong layer. The roles aren’t disappearing. The judgment inside them is.

Featured Resource: Strategy for Measuring & Improving AI Products

The gap between what AI products ship and what they prove is where growth stalls. This framework moves teams from tracking activity — token counts, completion rates, session length — to defining and measuring the outcomes that actually drive business impact. Most teams ship features and assume engagement means success. It doesn’t. If your team can’t answer “is this AI feature making the business better?” with data, you’re flying blind. The framework covers product discovery through scale, with concrete steps for building measurement into your AI product from the start — not bolting it on after launch.
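One way to make the activity-versus-outcome distinction concrete is to report both side by side for each release. The data and metric names below are invented purely for illustration: release B "wins" on the activity metric (longer sessions) while losing on the outcome metric (goals completed), which is exactly the divergence engagement-only dashboards hide.

```python
# Two hypothetical releases. A long session can mean a stuck user, not an
# engaged one, so session length and goal completion must be tracked separately.
releases = {
    "A": [{"minutes": 3, "goal_met": True},
          {"minutes": 4, "goal_met": True},
          {"minutes": 2, "goal_met": False}],
    "B": [{"minutes": 9, "goal_met": False},
          {"minutes": 8, "goal_met": True},
          {"minutes": 10, "goal_met": False}],
}

for name, sessions in releases.items():
    avg_minutes = sum(s["minutes"] for s in sessions) / len(sessions)
    goal_rate = sum(s["goal_met"] for s in sessions) / len(sessions)
    print(f"release {name}: avg session {avg_minutes:.1f} min, goal rate {goal_rate:.0%}")
```

Judged by session length alone, B looks like a 3x improvement; judged by the outcome metric, it halved the goal-completion rate.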

Read the full resource at ph1.ca

Waterfall: we’ll build you a car in 18 months. Agile: here’s a skateboard, we’ll iterate. AI: here’s a photorealistic render of a Lamborghini that doesn’t start. We’ve never made it easier to build something that looks incredible and does absolutely nothing. AI development doesn’t need more iteration — it needs someone asking “does this thing actually drive?”

If your team is celebrating demos instead of outcomes, you’re already behind the teams that measure first and ship second.

Two years of capability gains. Almost no reliability improvement. This is the chart that should be on every product team’s wall — because it explains why your AI demos brilliantly and fails in production. Capability without reliability isn’t a product. It’s a liability.

If your team can’t name which type of AI they’re building, they can’t measure whether it’s working. Six categories that force precision. — Narain Jashanmal

Product Impact Resources

The resources in this edition make one thing clear: the teams investing in measurement and deliberate friction are pulling ahead, while the ones chasing capability are stalling. These resources challenge the assumption that faster and more capable automatically means better outcomes.

* Removing struggle from AI workflows destroys the learning that builds expertise. Teams should audit which friction to keep and which to cut. Against Frictionless AI — Inzlicht & Bloom in Nature

* In the RCT, participants who used AI learned 17% less, with no offsetting efficiency gains. How your team uses AI matters more than whether they use it. How AI Impacts Skill Formation — Shen & Tamkin RCT

* Two years of capability gains with only modest reliability improvement. The barrier to growth isn’t what models can do — it’s whether you can trust them. The Capability-Reliability Gap — Narayanan et al.

* Polished AI outputs reduce critical evaluation by users. Build in friction points that force your team to think before accepting. (Anthropic studying its own product — read accordingly.) Anthropic AI Fluency Index

* AI forces strategic clarity because you cannot delegate logic you haven’t articulated. That’s a feature, not a bug. Strategy as Protocol — Schwarzmann via Scaman

* Six functional AI categories that sharpen how teams talk about what they’re building. Precision in language is precision in product decisions. AI Taxonomy — Jashanmal

* Mapping 50 AI startups across six pricing models reveals that pricing is a product decision, not a finance one. Get it wrong and adoption stalls regardless of quality. How to Price AI Products — Gupta

* Wade Foster shut Zapier down for a week-long AI hackathon. Adoption went from 10% to 50% in five days. Adoption follows experience, not mandates. Zapier’s Code Red Hackathon

Product Impact News

This is the news that matters. Reliability failures are making headlines, benchmark credibility is collapsing, and even the market leaders can’t prove product-market fit. The gap between what AI can do and what it can prove is widening, not closing.

* ChatGPT missed diabetic ketoacidosis and respiratory failure in 52% of emergency cases. Suicide-risk alerts fired inconsistently. Reliability is the product, not a feature to ship later. ChatGPT Health Under-Triaged 52% of Emergencies

* LLMs chose nuclear strikes in 95% of simulated crises. The nuclear taboo proved no impediment to AI escalation, a stark reminder that evaluation stakes extend beyond product. AI Models Chose Nuclear Strikes in 95% of Simulated Crises

* Google patent US12536233B1 lets Google generate its own landing page from your product feed if yours scores below a threshold. Own your experience or someone else will. Google Patented AI Landing Pages That Replace Your Storefront

* 84% of the world has never used AI. Only 0.3% pay for it. The growth opportunity is massive — but only for teams that solve adoption, not just access. 84% of the World Has Never Used AI

* 80% of ChatGPT users sent fewer than 1,000 messages in 2025. Even the market leader hasn’t solved product-market fit. Capability alone isn’t enough. OpenAI Has No Moat and Engagement an Inch Deep

* RCT shows experienced developers felt faster with AI tools and took on broader tasks, yet showed no measurable output gains. Perceived speed is not productivity. METR: Experienced Devs Saw Zero Productivity Gain

* NIST finds standard benchmarks conflate different performance measures. Models with different scores may perform identically in production. Build your own evals. NIST: AI Benchmarks Don’t Measure What They Claim

* MIT reviewed 300+ AI implementations: 85% failed, 91% of models degrade silently. The 5% that succeeded built measurement into the product from day one. 85% of AI Projects Fail, 91% of Models Degrade Silently

Key takeaways

The throughline across this edition is unmistakable: capability without measurement is theater. From the METR study showing zero productivity gains for experienced developers to MIT’s finding that 85% of AI projects fail, the evidence converges on one point — the teams that win are the ones that prove their AI works.

* Measure outcomes, not activity. Completion rates, token counts, and session length tell you your AI is running — not that it’s working. Define what “working” means for your business before you ship.

* Protect judgment. Automate everything else. The roles being hollowed out aren’t the ones doing rote work — they’re the ones asking the hard questions. If you’re automating decisions instead of tasks, you’re cutting the wrong layer.

* Friction is a feature. Research consistently shows that removing struggle from AI workflows destroys learning and degrades skill. Build in the friction that keeps your team sharp, and strip out the friction that just wastes time.

If your AI product ships well but you can’t prove it drives growth, that’s the gap PH1 closes. We help teams define what success looks like for AI experiences and build the measurement systems to prove it — from product discovery through scale. ph1.ca

Thank you for supporting the Product Impact Podcast

Every episode tackles the gap between what AI products promise and what they actually deliver. Brittany and I bring in the builders, researchers, and leaders who are closing that gap — with frameworks, evidence, and hard-won lessons. If an episode shifted how you think about your product, share it. Follow the show so you never miss one. That’s how we grow this community.

* Episode 1: Why Your AI Metrics Are Lying to You

* Vibe Coding Will Disrupt Product — Base44’s Path to $80M

* AI Trap: Hard Truths About the Job Market

Browse all episodes at productimpactpod.com — filter by topic to find the episode that fits what you’re working on right now.



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit productimpactpod.substack.com

By Arpy Dragffy