Redwood Research Blog

“Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models” by Anders Cairns Woodruff



Subtitle: Models' no-CoT time horizon has doubled roughly every year.

by Francis Rhys Ward, Dewi Gould, Anders Cairns Woodruff et al. (see full author list at the end)

PAPER LINK

About a year ago, METR showed that the length of tasks frontier models can reliably complete doubles every few months. A related safety-relevant question is this: what length of tasks can models complete without any chain of thought (CoT)?

If models can reason extensively without outputting any CoT, that would matter for safety. Developers and deployment-time monitors couldn’t easily understand models’ motivations or catch dangerous planning. Models that reason substantially without a CoT might also drift further from human patterns of thought, since their reasoning would no longer be constrained by text in the pretraining prior. As a result, they would be harder to understand and might be more likely to scheme.

Extending Ryan Greenblatt's research, we investigate this by measuring models’ ability to complete tasks without any CoT on a suite of 43 benchmarks spanning different domains. We compare AI reasoning ability to that of humans using the estimated 50% time horizon (TH): the typical time taken for a human to perform a task that the LLM performs with [...]
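To make the time-horizon idea concrete, here is a minimal sketch of one way such a number could be estimated. It assumes a logistic fit of task success against the log of human completion time, and reads off the time at which the fitted success probability crosses 50%. The fitting procedure, the function names, and the data are all illustrative assumptions, not the method or results from the paper.

```python
import math


def fit_logistic(xs, ys, steps=5000, lr=0.1):
    """Fit p(success) = sigmoid(a + b * x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def time_horizon_50(human_minutes, successes):
    """Human task length (in minutes) at which the fitted success probability is 50%."""
    xs = [math.log2(t) for t in human_minutes]
    a, b = fit_logistic(xs, successes)
    # The fitted probability equals 0.5 where a + b * x = 0, i.e. x = -a / b.
    return 2 ** (-a / b)


# Synthetic, made-up data: human time per task (minutes) and whether a
# hypothetical model solved each task (1) or not (0).
human_minutes = [0.5, 1, 2, 4, 8, 16, 32, 64]
successes = [1, 1, 1, 0, 1, 0, 0, 0]

print(f"Estimated 50% time horizon: {time_horizon_50(human_minutes, successes):.2f} minutes")
```

Fitting on log time rather than raw time reflects the assumption that success falls off roughly evenly per doubling of task length, which is also what makes a "doubling time" for the horizon a natural summary.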

---

Outline:

(02:28) Methods

(04:59) Results

(06:34) FAQ

(08:09) Conclusion

---

First published:

June 10th, 2026

Source:

https://blog.redwoodresearch.org/p/estimating-no-cot-task-completion

---

Narrated by TYPE III AUDIO.

---


Redwood Research Blog, by Redwood Research