This research paper investigates the vulnerabilities of vision-language models (VLMs) used to power computer agents. The authors demonstrate that these agents can be easily manipulated by carefully crafted adversarial pop-ups, causing them to click on these malicious elements instead of performing their intended tasks. This attack successfully diverts the agents from their intended actions in over 80% of cases, significantly reducing their task success rate. The authors explore various attack design elements and find that basic defense strategies, such as asking the agent to ignore pop-ups or including an advertisement notice, are ineffective. They conclude that more robust agent systems are needed to ensure safe agent workflow in real-world computer environments.