

Large language models go through a lot of vetting before they’re released to the public. That includes safety tests, bias checks, ethical reviews and more. But what if, hypothetically, a model could dodge a safety question by lying to developers, hiding its real response to a safety test and instead giving the exact response its human handlers are looking for? A recent study shows that advanced LLMs are developing the capacity for deception, and that could bring that hypothetical situation closer to reality. Marketplace’s Lily Jamali speaks with Thilo Hagendorff, a researcher at the University of Stuttgart and the author of the study, about his findings.
By Marketplace
4.5 · 1,256 ratings