


Large language models go through a lot of vetting before they’re released to the public. That includes safety tests, bias checks, ethical reviews and more. But what if, hypothetically, a model could dodge a safety question by lying to developers, hiding its real response to a safety test and instead giving the exact response its human handlers are looking for? A recent study shows that advanced LLMs are developing the capacity for deception, and that could bring that hypothetical situation closer to reality. Marketplace’s Lily Jamali speaks with Thilo Hagendorff, a researcher at the University of Stuttgart and the author of the study, about his findings.
By Marketplace
