May 24, 2026
AgentFloor: A Benchmark for Long-Horizon Agent Planning (35 minutes)
A 30-task benchmark for evaluating long-horizon planning capabilities across 16 different AI models.
By Shaoqing Tan