


What you’ll learn:
• How reinforcement learning can reduce AI agent error rates by up to 60% and drastically lower inference costs.
• The critical difference between supervised fine-tuning and RL for agentic workflows, and why RL is essential for true agent reliability.
• A practical, code-level walkthrough of building and training an email search agent on a 14-billion-parameter open-source model that outperforms OpenAI’s GPT-3.5.
• Strategies for generating high-quality synthetic data and designing nuanced reward functions with ‘partial credit’ to effectively train your agents.
• Key use cases where RL fine-tuning delivers the most significant benefits, including real-time voice agents and high-volume applications.
Kyle Corbitt is the founder of OpenPipe, a platform dedicated to helping enterprises build and deploy customized AI models using advanced fine-tuning and reinforcement learning. He’s a seasoned builder who has been working at the frontier of fine-tuning since before public APIs existed.
Key topics covered:
• The limitations of off-the-shelf LLMs for agent reliability and how RL solves them.
• The importance of latency and cost optimization in real-world AI deployments.
• Detailed explanation of the agentic workflow and tool calling in an email search bot.
• The Enron email dataset as a realistic environment for agent training.
• OpenPipe’s open-source Agent Reinforcement Trainer (ART) library for building RL agents.
• The iterative process of data generation, rubric-based scoring, and model updates.
This episode of AI Tinkerers One-Shot goes under the hood with Kyle to share practical lessons for the community.
💡 Resources:
• OpenPipe Website - https://openpipe.ai
• Kyle Corbitt LinkedIn - https://www.linkedin.com/in/kcorbitt/
• AI Tinkerers - https://aitinkerers.org
• One-Shot Podcast - https://one-shot.aitinkerers.org/
Social Media: @AITinkerers @OpenPipeAI @corbtt
👍 Like this video if you found it valuable, and subscribe to AI Tinkerers One-Shot for more conversations with innovators building the future of AI!
00:00 Introduction
01:09 Welcome Kyle Corbitt, Founder of OpenPipe
01:55 What OpenPipe Does
02:31 OpenPipe’s Journey and YC Experience
04:13 Email Search Bot Project Overview
05:19 Why Fine-Tuning for Email Search
06:22 Email Search Bot: Queries and Results
09:23 On-Premise Deployment and Data Sensitivity
10:45 Agent Trace Example and Tooling
13:55 Using the Enron Dataset
15:13 Reinforcement Learning Fundamentals
17:01 Synthetic Data Generation with Gemini 2.5 Pro
18:51 Reliable Q&A Pairs and Data Scale
21:59 Fine-Tuning Impact on Model Performance
22:25 RL Adoption in Industry and Community
24:37 Rollout Function and Agent Implementation
27:52 Rubric and Reward Calculation for RL
30:39 Training Loop and Model Updates
33:52 RL Fine-Tuning vs. OpenAI’s Fine-Tuning
40:38 Time Commitment for RL Projects
41:55 Use Cases for RL Fine-Tuning
45:37 OpenPipe’s Offerings: Open Source, White Glove Service
47:07 Kyle’s Side Tinkering and Future of AI
49:59 Discovering AI Tinkerers
By Joe Heitzeberg