October 16, 2025

Why RL Won — Kyle Corbitt, OpenPipe (acq. CoreWeave)

1 hour 8 minutes

In this deep dive with Kyle Corbitt, co-founder and CEO of OpenPipe (recently acquired by CoreWeave), we explore the evolution of fine-tuning in the age of AI agents and the critical shift from supervised fine-tuning (SFT) to reinforcement learning (RL). Kyle shares his journey from leading YC’s Startup School to building OpenPipe, which initially focused on distilling expensive GPT-4 workflows into smaller, cheaper models before pivoting to RL-based agent training as frontier model prices plummeted.

The conversation reveals why 90% of AI projects remain stuck in proof-of-concept purgatory: not because of capability limitations, but because of reliability issues that Kyle believes can be solved through continuous learning from real-world experience. He discusses the breakthrough of RULER (Relative Universal LLM-Elicited Rewards), which uses LLMs as judges to rank agent behaviors relative to one another rather than scoring them on an absolute scale, making RL training accessible without complex reward engineering. Kyle candidly assesses the challenges of building realistic training environments for agents, explaining why GRPO (despite its advantages) may be a dead end due to its requirement for perfectly reproducible parallel rollouts.

He shares insights on why LoRAs remain underrated for production deployments, why GEPA and prompt optimization haven’t lived up to the hype in his testing, and why the hardest part of deploying agents isn’t the AI but sandboxing real-world systems with all their bugs and edge cases intact.

The discussion also covers OpenPipe’s acquisition by CoreWeave, the launch of their serverless reinforcement learning platform, and Kyle’s vision for a future where every deployed agent continuously learns from production experience. He predicts that solving the reliability problem through continuous RL could unlock 10x more AI inference demand from projects currently stuck in development, fundamentally changing how we think about agent deployment and maintenance.
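One way to see why relative ranking fits GRPO-style training: GRPO normalizes rewards within a group of rollouts for the same task, so only the differences between rollouts matter, not the judge's absolute scale. The following is a minimal sketch of that group normalization, assuming the standard mean and standard deviation form. The judge scores are hypothetical, and this is not OpenPipe's RULER implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(scores, eps=1e-8):
    """Normalize one group's rollout scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation over the group
    return [(s - mu) / (sigma + eps) for s in scores]


# Hypothetical judge scores for four rollouts of the same task.
judge_scores = [0.2, 0.5, 0.5, 0.9]

# The same relative judgment expressed on a different absolute scale.
shifted_scores = [10 * s + 3 for s in judge_scores]

adv_a = group_relative_advantages(judge_scores)
adv_b = group_relative_advantages(shifted_scores)

# Both score lists yield (almost exactly) the same advantages.
max_gap = max(abs(a - b) for a, b in zip(adv_a, adv_b))
print(f"largest difference between the two advantage lists: {max_gap:.2e}")
```

Because the advantages are unchanged under shifting and rescaling of the scores, a judge only needs to be consistent within a group of rollouts, which is an easier requirement than a calibrated absolute score.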

Key Topics:

* The rise and fall of fine-tuning as a business model

* Why 90% of AI projects never reach production

* RULER: Making RL accessible through relative ranking

* The environment problem: Why sandboxing is harder than training

* GRPO vs PPO and the future of RL algorithms

* LoRAs: The underrated deployment optimization

* Why GEPA and prompt optimization disappointed in practice

* Building world models as synthetic training environments

* The $500B Stargate bet and OpenAI’s potential crypto play

* Continuous learning as the path to reliable agents

References

https://www.linkedin.com/in/kcorbitt/

* Aug 2023 https://openpipe.ai/blog/from-prompts-to-models

* Dec 2023 https://openpipe.ai/blog/mistral-7b-fine-tune-optimized

* Jan 2024 https://openpipe.ai/blog/s-lora

* May 2024 https://openpipe.ai/blog/the-ten-commandments-of-fine-tuning-in-prod

* Oct 2024 https://openpipe.ai/blog/announcing-dpo-support

* AIE NYC 2025 Finetuning 500m agents

* AIEWF 2025 How to train your agent (ART-E)

* Sept 2025 Acquisition https://openpipe.ai/blog/openpipe-coreweave

* W&B Serverless RL https://openpipe.ai/blog/serverless-rl?refresh=1760042248153

Timestamps

00:00 Introductions

03:15 The Evolution of OpenPipe: From SFT to RL

07:49 The Mistral Era and LoRA Adapters

11:40 When You Actually Need Fine-Tuning

14:43 The Pivot to Reinforcement Learning

21:29 GRPO vs PPO: The Technical Trade-offs

24:02 The Environment Problem in RL

35:52 GEPA and Automated Prompt Optimization

44:35 Open vs Closed Models: The Token Economics

50:38 RULER: Self-Supervised RL Rewards

57:09 World Models as Environment Solutions

1:00:15 CoreWeave Acquisition and Future Vision

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe

By Latent.Space
