October 16, 2025

A small number of samples can poison LLMs of any size

13 minutes

This white paper by Anthropic, UK AI Security Institute, and The Alan Turing Institute demonstrates that a small, fixed number of malicious documents—as few as 250—can successfully create a "backdoor" vulnerability in LLMs, regardless of the model's size or the total volume of clean training data. This finding challenges the previous assumption that attackers need to control a percentage of the training data, suggesting that these poisoning attacks are more practical and accessible than previously believed. The study specifically tested a denial-of-service attack that causes the model to output gibberish upon encountering a specific trigger phrase like , and the authors share these results to encourage further research into defenses against such vulnerabilities.

...more

View all episodes

By Enoch H. Kang

October 16, 2025

A small number of samples can poison LLMs of any size

13 minutes

...more

Share A small number of samples can poison LLMs of any size

Sign up to save your podcasts

A small number of samples can poison LLMs of any size

A small number of samples can poison LLMs of any size