Chain of Thought

How Intercom Cut $250K/Month by Ditching GPT for Qwen



Intercom was spending $250K/month on a single summarization task using GPT. Then they replaced it with a fine-tuned 14B-parameter Qwen model and eliminated almost all of that cost. In this episode, Intercom's Chief AI Officer, Fergal Reid, walks through exactly how they made that call, how their approach has changed over time, and how those efforts came together to build their Fin customer service agent.

Fergal breaks down how Fin went from 30% to nearly 70% resolution rate and why most of those gains came from surrounding systems (custom re-rankers, retrieval models, query canonicalization), not the core frontier LLM. He explains why higher latency counterintuitively increases resolution rates, how they built a custom re-ranker that outperformed Cohere using ModernBERT, and why he believes vertically integrated AI products will win in the long term.

If you're deciding between fine-tuning open-weight models and using frontier APIs in production, you won't find a more detailed walkthrough of that decision process.

🔗 Connect with Fergal: 

  • Twitter/X: https://x.com/fergal_reid

  • LinkedIn: https://www.linkedin.com/in/fergalreid/

  • Fin: https://fin.ai/

🔗 Connect with Conor:

  • YouTube: https://www.youtube.com/@ConorBronsdon

  • Newsletter: https://conorbronsdon.substack.com/

  • Twitter/X: https://x.com/ConorBronsdon

  • LinkedIn: https://www.linkedin.com/in/conorbronsdon/

🔗 More episodes: https://chainofthought.show

CHAPTERS

0:00 Intro

0:46 Why Intercom Completely Reversed Their Fine-Tuning Position

8:00 The $250K/Month Summarization Task (Query Canonicalization)

11:25 Training Infrastructure: H200s, LoRA to Full SFT, and GRPO

14:09 Why Qwen Models Specifically Work for Production

18:03 Goodhart's Law: When Benchmarks Lie

19:47 A/B Testing AI in Production: Soft vs. Hard Resolutions

25:09 The Latency Paradox: Why Slower Responses Get More Resolutions

26:33 Why Per-Customer Prompt Branching Is Technical Debt

28:51 Sponsor: Galileo

29:36 Hiring Scientists, Not Just Engineers

32:15 Context Engineering: Intercom's Full RAG Pipeline

35:35 Customer Agent, Voice, and What's Next for Fin

39:30 Vertical Integration: Can App Companies Outrun the Labs?

47:45 When Engineers Laughed at Claude Code

52:23 Closing Thoughts

TAGS

Fergal Reid, Intercom, Fin AI agent, open-weight models, Qwen models, fine-tuning LLMs, post-training, RAG pipeline, customer service AI, GRPO reinforcement learning, A/B testing AI, Claude Code, vertical AI integration, inference cost optimization, context engineering, AI agents, ModernBERT reranker, scaling AI teams, Conor Bronsdon, Chain of Thought


Chain of Thought, by Conor Bronsdon

Rating: 5 (27 ratings)

