Artificial intelligence is advancing rapidly, but our ability to measure what these systems can actually do—and the risks they may pose—has lagged behind. Headline benchmarks and viral demos offer snapshots of a system's performance, but they say little about how AI behaves in complex real-world settings or how much autonomy models can sustain over time. As these systems take on more consequential roles, the challenge is not just building more powerful models, but developing credible ways to evaluate their capabilities and limits.
Chris Painter, president of Model Evaluation and Threat Research (METR), joins Oren to discuss how researchers are building new frameworks to assess AI systems and what those efforts reveal about the trajectory of machine intelligence. They explore “time horizon” as a measure of autonomy, the difficulty of evaluating alignment and sabotage risks, and the constraints posed by compute and organizational bottlenecks. They also consider what it will look like when AI systems begin contributing even more to their own development and their capabilities outpace our ability to measure them.
By American Compass · 4.5 (61 ratings)