AIandBlockchain

AI Under Attack: How HarmBench Tests LLM Safety



How safe are large language models like ChatGPT and Google’s Gemini? In this episode, we dive into recent research on AI safety and explore HarmBench, a standardized evaluation framework designed to stress-test LLMs against harmful manipulation. Testing 18 red teaming methods across 33 models, the study reveals surprising vulnerabilities and promising defenses. We break down red teaming, contextual attacks, and R2D2, an adversarial training defense that could make AI more resilient. Can LLMs ever be truly safe? Join us as we tackle the risks, defenses, and ethical responsibilities shaping the future of AI.


Link: https://arxiv.org/pdf/2402.04249


AIandBlockchain, by j15