Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why I'm working on AI welfare, published by Kyle Fish on July 6, 2024 on The Effective Altruism Forum.
Since Alvea wound down, AI moral patienthood/welfare/wellbeing[1] has been my primary professional/impact focus. I started exploring this topic casually last fall, then spent the winter investigating potential grantmaking opportunities and related strategy questions. Now, I'm working with Rob Long on a combination of research projects and engagement with leading AI labs on AI moral patienthood and related issues.
I'm approaching this topic as one component of the broader puzzle of making a future with transformative AI go well, rather than as an independent/standalone issue.
My overall focus is on building a clearer picture of relevant questions and issues, developing tools and recommendations for how labs and other stakeholders might navigate them, and improving the discourse and decision-making around this topic.
I'm not focused on general advocacy for digital minds - there are significant potential risks associated with both under- and over-attributing moral status to AI systems, which makes it important to keep the focus on getting the issues right, rather than prematurely pushing for particular conclusions.
For AI welfare debate week, I'll briefly share the core factors contributing to my prioritization of this topic:
1. Direct longtermist relevance: Decisions we make soon about the development, deployment, and treatment of AI systems could have lasting effects on the wellbeing of digital minds, with potentially enormous significance for the moral value of the future.
1. I put some weight on the simple argument that the vast majority of the moral value of the future may come from the wellbeing/experiences of digital beings, and that therefore we ought to work now to understand this issue and take relevant actions.
2. I don't think strong versions of this argument hold, given the possibility of deferring (certain kinds of) work on this topic until advanced AI systems can help or solve it on their own, the potentially overwhelming importance of alignment for existential security, and other factors. As such, I still rank alignment substantially higher than AI welfare in terms of overall importance from a longtermist standpoint.
3. However, I don't think the longtermist value of working on AI welfare now is entirely defeated by these concerns. It still seems plausible enough to me that there are important path dependencies - e.g. the need to make critical decisions prior to the development of transformative AI - that some low single-digit percentage of the total resources going into alignment should be invested here as a baseline.
2. Synergies with other longtermist priorities: Work on the potential wellbeing of AI systems may be instrumentally/synergistically valuable to work on other longtermist priorities, particularly AI alignment and safety.
1. Understanding the potential interests, motivations, goals, and experiences of AI systems - a key aim of work on AI moral patienthood - seems broadly useful in efforts to build positive futures with such systems.
2. It seems plausible to me that AI welfare will become a major topic of public interest/concern, and that it will factor into many critical decisions about AI development/deployment in general. By default, I do not expect the reasoning and decision-making about this topic to be good - this seems like a place where early thoughtful efforts could make a big positive difference.
3. There are also potential tensions that arise between AI welfare and other longtermist priorities (e.g. the possibility that some AI safety/control strategies involve mistreating AI systems), but my current view is that these topics are, or at least can be, predominantly synergistic.
3. Near-term/moral decency considerations: Taking ...