
How safe are large language models like ChatGPT and Google’s Gemini? In this episode, we dive into groundbreaking research on AI safety and explore HarmBench, a standardized evaluation framework designed to stress-test LLMs against harmful manipulation. With 18 different red teaming attack methods tested across 33 models, this study reveals surprising vulnerabilities—and promising defenses. We break down red teaming, contextual attacks, and the R2D2 adversarial training approach that could make AI more resilient to jailbreaks. Can LLMs ever be truly safe? Join us as we tackle the risks, defenses, and ethical responsibilities shaping the future of AI.
Link: https://arxiv.org/pdf/2402.04249