Embodied AI 101

Claw-Eval: Toward Trustworthy and Transparent Evaluation of Autonomous Agents



Claw-Eval is a benchmark of 300 tasks with 2,159 rubric items. It uses trajectory-aware grading and three-trial Pass^3 scoring to reduce the influence of luck, and it evaluates agent reliability in real-world robotics settings.
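To make the three-trial idea concrete, here is a minimal sketch of one way such a score could be computed. It assumes Pass^3 means a task counts only when all three of its trials succeed, which the description does not spell out. The function name `pass_all_k`, the task IDs, and the outcomes are hypothetical and are not taken from the benchmark.

```python
from typing import Dict, List


def pass_all_k(trials: Dict[str, List[bool]], k: int = 3) -> float:
    """Return the fraction of tasks whose first k trials all passed.

    Assumption: a task is credited only if every one of its k trials
    succeeds, so a single lucky run is not enough.
    """
    if not trials:
        raise ValueError("no tasks given")
    passed = 0
    for task_id, outcomes in trials.items():
        if len(outcomes) < k:
            raise ValueError(f"task {task_id!r} has fewer than {k} trials")
        # all() is True only when every trial in the window succeeded.
        if all(outcomes[:k]):
            passed += 1
    return passed / len(trials)


# Hypothetical outcomes for four tasks, three trials each.
example = {
    "task_a": [True, True, True],
    "task_b": [True, False, True],   # one failure, so not credited
    "task_c": [False, False, False],
    "task_d": [True, True, True],
}
score = pass_all_k(example)
print(f"all-3-trials pass rate: {score:.2f}")
```

Under this assumption an agent that succeeds on a task only some of the time gets no credit for it, which is why the score is stricter than a single-run pass rate.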

Embodied AI 101, by Shaoqing Tan