
In this episode, we dive into multimodal AI agents, starting with the recent release of Runner H and moving on to groundbreaking research, including:
04:15 VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks by Jing Yu Koh et al.
19:18 AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations by Gaurav Verma et al.
32:32 Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast by Xiangming Gu et al.