
Large language models go through a lot of vetting before they’re released to the public. That includes safety tests, bias checks, ethical reviews and more. But what if, hypothetically, a model could dodge a safety question by lying to developers, hiding its real response to a safety test and instead giving the exact response its human handlers are looking for? A recent study shows that advanced LLMs are developing the capacity for deception, and that could bring that hypothetical situation closer to reality. Marketplace’s Lily Jamali speaks with Thilo Hagendorff, a researcher at the University of Stuttgart and the author of the study, about his findings.