The Nonlinear Library

AF - Is Deontological AI Safe? [Feedback Draft] by Dan H



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Is Deontological AI Safe? [Feedback Draft], published by Dan H on May 27, 2023 on The AI Alignment Forum.
[Note: This post is an excerpt from a longer paper, written during the first half of the Philosophy Fellowship at the Center for AI Safety. I (William D'Alessandro) am a Postdoctoral Fellow at the Munich Center for Mathematical Philosophy. Along with the other Philosophy Fellowship midterm projects, this draft is posted here for feedback. The full version of the paper includes a discussion of the conceptual relationship between safety and moral alignment, and an argument that we should choose a reliably safe powerful AGI over one that's (apparently) successfully morally aligned. I've omitted this material for length but can share it on request. The deontology literature is big, and lots of angles here could be developed further. Questions and suggestions much appreciated!]
1 Introduction
Value misalignment arguments for AI risk observe that artificial agents needn’t share human ideas about what sorts of ends are intrinsically good and what sorts of means are morally permissible. Without such values for guidance, a powerful AI might turn its capabilities toward human-unfriendly goals. Or it might pursue the objectives we’ve given it in dangerous and unforeseen ways. Thus, as Bostrom writes, “Unless the plan is to keep superintelligence bottled up forever, it will be necessary to master motivation selection” (Bostrom 2014, 185). Indeed, since more intelligent, autonomous AIs will be favored by competitive pressures over their less capable kin (Hendrycks 2023), the hope of keeping AI weak indefinitely is probably no plan at all.
Considerations about value misalignment plausibly show that equipping AIs with something like human morality is a necessary step toward AI safety. It’s natural to wonder whether moral alignment might also be sufficient for safety, or nearly so. Would an AI guided by an appropriate set of ethical principles be unlikely to cause disastrous harm by default?
This is a tempting thought. By the lights of common sense, morality is strongly linked with trustworthiness and beneficence; we think of morally exemplary agents as promoting human flourishing while doing little harm. And many moral systems include injunctions along these lines in their core principles. It would be convenient if this apparent harmony turned out to be a robust regularity.
Deontological morality looks like an especially promising candidate for an alignment target in several respects. It’s perhaps the most popular moral theory among both professional ethicists and the general public. It looks to present a relatively tractable technical challenge in some respects, as well-developed formal logics of deontic inference exist already, and large language models have shown promise at classifying acts into deontologically relevant categories (Hendrycks et al. 2021). Correspondingly, research has begun on equipping AIs with deontic constraints via a combination of top-down and bottom-up methods (Kim et al. 2021). Finally, deontology appears more inherently safety-friendly than its rivals, since many deontological theories posit strong harm-avoidance principles. (By contrast, standard forms of consequentialism recommend taking unsafe actions when such acts maximize expected utility. Adding features like risk-aversion and future discounting may mitigate some of these safety issues, but it’s not clear they solve them entirely.)
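To make the parenthetical point concrete, here is a minimal toy calculation; the action A, benefit B, harm H, probability p, concave utility u, and discount factor γ are illustrative assumptions of mine, not quantities from the paper. Suppose A yields benefit B with probability 1 − p and catastrophic harm of size H with probability p, and that not acting yields 0.

\[
\text{Plain expected utility: take } A \iff (1-p)\,B - p\,H > 0 \iff \frac{B}{H} > \frac{p}{1-p}.
\]
\[
\text{Risk aversion (concave } u\text{): take } A \iff (1-p)\,u(B) + p\,u(-H) > u(0).
\]
\[
\text{Discounting (benefit arrives at time } t\text{): take } A \iff (1-p)\,\gamma^{t} B - p\,H > 0.
\]

Each modification raises the threshold for taking A, but whether it rules out catastrophic gambles depends on further details (for instance, whether u is bounded and how small γ is); for many choices a sufficiently large B still makes the unsafe action come out as required, which is the sense in which such features mitigate rather than clearly solve the safety problem.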
I’ll argue that, unfortunately, deontological morality is no royal road to safe AI. The problem isn’t just the trickiness of achieving complete alignment, and the chance that partially aligned AIs will exhibit risky behavior. Rather, there’s reason to think that deontological AI might pose distinctive safety risks of its own. This suggests that existential catastrophe...