AI Insiders

Attributing Adversarial Attacks by Large Language Models: A Theoretical and Practical Analysis

Imagine you have a bunch of different robots that can all write stories. These robots are called Large Language Models (or LLMs for short). Now, here's the tricky part that scientists are trying to figure out:
If someone uses one of these robots to write something bad (like a mean message or a fake story), it's really, really hard to figure out which robot actually wrote it! It's kind of like trying to figure out which printer printed a letter when all the printers use the same kind of ink and paper.
Here's why it's so difficult:
There are LOTS of these writing robots out there (like having hundreds of different printers)
They all write in similar ways (just like how different printers make letters look almost the same)
People can teach these robots new tricks (like changing how a printer works), which makes it even harder to tell them apart
Think about it like this: Imagine you're playing a game where you have to guess who wrote a note in your class. Now imagine that:
Everyone uses the same type of pencil
They all learned to write from the same teacher
They can change their handwriting style
Pretty tough to guess, right? That's exactly what's happening with these AI robots!
The scientists in this paper are saying that it's not just hard to figure out which robot wrote something; sometimes it's practically impossible! Even if you had the world's fastest computer working on the puzzle, it would take far too long to find the answer.
Why is this important? Because we need to find other ways to make sure these robots are used safely and responsibly, since we can't always tell who's using them to do bad things.
AI Insiders, by Ronald Soh