
Edwin Chen is the founder and CEO of Surge AI, the data infrastructure company behind nearly every major frontier model. Surge works with OpenAI, Anthropic, Meta, and Google, providing the high-quality data and evaluation infrastructure that powers their models.
Edwin reveals why optimizing for popular benchmarks like LMArena is "basically optimizing for clickbait," how one frontier lab's models regressed for 6-12 months without anyone knowing, and why the industry's approach to measurement is fundamentally broken. Jacob and Edwin discuss what actually makes elite AI evaluators, why "there's never going to be a one size fits all solution" for AI models, and how frontier labs are taking surprisingly divergent paths to AGI.
(0:00) Intro
(0:56) The Pitfalls of Optimizing for LMArena
(4:34) Issues with Data Quality and Measurement
(9:44) The Importance of Human Evaluations
(13:40) The Rise of RL Environments
(17:21) Challenges and Lessons in Model Training
(19:59) Silicon Valley's Pivot Culture
(23:06) Technology-Driven Approach
(24:18) Quality Beyond Credentials
(27:51) Impact of Scale Acquisition
(28:35) Hiring for Research Culture
(30:48) Divergence in AI Training Paradigms
(34:16) Future of AI Models
(39:32) Multimodal AI and Quality
(43:44) Quickfire
With your co-hosts:
@jacobeffron
- Partner at Redpoint, Former PM Flatiron Health
@patrickachase
- Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia
- Former COO GitHub, Founder Bitnami (acq’d by VMware)
@jordan_segall
- Partner at Redpoint
By Redpoint Ventures · 4.9 · 51 ratings
