


In this episode, we dive into multimodal AI agents, starting with the recent release of Runner H and then turning to groundbreaking research, including:
04:15 VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks by Jing Yu Koh et al.
19:18 AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations by Gaurav Verma et al.
32:32 Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast by Xiangming Gu et al.
By The Agents of Tomorrow Show