
How safe are large language models like ChatGPT and Google’s Gemini? In this episode, we dive into groundbreaking research on AI safety and explore HarmBench, a standardized evaluation framework designed to stress-test LLMs against harmful manipulation. With 18 different red teaming attack methods tested across 33 models, this study reveals surprising vulnerabilities—and promising defenses. We break down red teaming, contextual attacks, and the R2D2 adversarial training approach that could make AI more resilient to jailbreaks. Can LLMs ever be truly safe? Join us as we tackle the risks, defenses, and ethical responsibilities shaping the future of AI.
Link: https://arxiv.org/pdf/2402.04249