


The old tests are getting too easy. Tejal Patwardhan leads OpenAI’s frontier evals team, which is finding new ways to measure and forecast progress as models become more capable. She and host Andrew Mayne discuss why evals matter for research, how benchmarks can break or get gamed, and what models need to be judged on next.
Chapters
00:00:24 Growing up at OpenAI
00:03:10 Why reasoning changed everything
00:06:28 What made o1 surprising
00:11:20 Why old benchmarks stopped working
00:14:45 What makes a good benchmark
00:17:35 Why evals are getting harder
00:22:09 Measuring voice and vision models
00:24:48 Testing models on real science
00:33:23 How OpenAI tracks frontier progress
00:40:47 What AI means for work
Hosted on Acast. See acast.com/privacy for more information.
By OpenAI
4.4 (5858 ratings)