The Open Agency Model
Published by Eric Drexler on February 22, 2023 on the AI Alignment Forum.
Notes on AI for complex, consequential problems
Eric Drexler, Centre for the Governance of AI, University of Oxford
Introduction
This document argues for “open agencies” — not opaque, unitary agents — as the appropriate model for applying future AI capabilities to consequential tasks that call for combining human guidance with delegation of planning and implementation to AI systems. This prospect reframes and can help to tame a wide range of classic AI safety challenges, leveraging alignment techniques in a relatively fault-tolerant context.
Rethinking safe AI and its applications
AI safety research is too varied to summarize, yet broad patterns are obvious. A long-established reference problem centers on the prospect of rational superintelligent agents that pursue narrow goals with potentially catastrophic outcomes. This frame has been productive, but developments in deep learning call for updates that take account of the proliferation of narrow models (for driving, coding, robot control, image generation, game playing, and so on) that are either non-agentic or agentic only in a narrow sense, and of the rise of more broadly capable foundation models and LLMs. These updates call for reframing questions of AI safety and for attention to how consequential tasks might be accomplished by organizing AI systems that usually do approximately what humans intend.
Two frames for high-level AI
The unitary-agent frame
From its beginnings in popular culture, discussion of the AI control problem has centered on a unitary-agent model of high-level AI and its potential risks. In this model, a potentially dominant agent both plans and acts to achieve its goals.
The unitary-agent model typically carries assumptions regarding goals, plans, actions, and control.
Goals: Internal to an agent, by default including power-seeking goals
Plans: Internal to an agent, possibly uninterpretable and in effect secret
Actions: Performed by the agent, possibly intended to overcome opposition
Control: Humans confront a powerful, potentially deceptive agent
The typical unitary-agent threat model contemplates the emergence of a dominant, catastrophically misaligned agent, and safety models implicitly or explicitly call for deploying a dominant agent (or an equivalent collective system) that is both aligned and powerful enough to suppress unaligned competitors everywhere in the world.
The open-agency frame
Recent developments suggest an alternative open agency model of high-level AI. Today, the systems that look most like AGI are large language models (LLMs), and these are not agents that seek goals, but are generative models that produce diverse outputs in response to prompts (in a generalized sense) and random-number seeds. Most outputs are discarded.
Trained on prediction tasks, LLMs learn world models that include agent behaviors, and generative models that are similar in kind can be informed by better world models and produce better plans. There is no need to assume LLM-like implementations: The key point is that generation of diverse plans is by nature a task for generative models, and that in routine operation, most outputs are discarded.
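To make the generate-and-discard pattern concrete, here is a minimal, purely illustrative Python sketch (the planner, filter, and task names are hypothetical assumptions, not taken from the text): a generative model is treated as a function from a prompt and a seed to a candidate plan, many candidates are sampled, and most are discarded by a downstream filter.

```python
import random

# Illustrative stand-in for a generative planner: maps (prompt, seed) to a
# candidate plan. A real system might call an LLM here; this toy version
# just varies its output with the seed.
def generate_plan(prompt: str, seed: int) -> str:
    rng = random.Random(seed)
    option = rng.choice(["route A", "route B", "route C"])
    return f"Plan for '{prompt}': take {option}"

def sample_candidates(prompt: str, n: int = 32) -> list[str]:
    # Diversity comes from varying the seed, not from goal-seeking behavior.
    return [generate_plan(prompt, seed) for seed in range(n)]

def keep_acceptable(plans: list[str], accept) -> list[str]:
    # In routine operation, most candidates fail the filter and are dropped.
    return [p for p in plans if accept(p)]

if __name__ == "__main__":
    candidates = sample_candidates("deliver supplies to the field site")
    survivors = keep_acceptable(candidates, lambda p: "route B" in p)
    print(f"{len(candidates)} generated, {len(survivors)} kept")
```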
These considerations suggest an “open-agency frame” in which prompt-driven generative models produce diverse proposals, diverse critics help select proposals, and diverse agents implement proposed actions to accomplish tasks (with schedules, budgets, accountability mechanisms, and so forth).
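As a hedged sketch of how the propose / critique / select / implement workflow above might be wired together, the following Python fragment uses made-up proposers, critics, and scoring rules (none of these names or mechanisms come from the source, and the selection step merely stands in for human-guided choice):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Plan:
    description: str
    scores: dict[str, float]

def propose(goal: str, proposers: list[Callable[[str], str]]) -> list[Plan]:
    # Several generative proposers each turn the goal prompt into a candidate plan.
    return [Plan(description=p(goal), scores={}) for p in proposers]

def critique(plans: list[Plan], critics: dict[str, Callable[[str], float]]) -> None:
    # Independent critics score each candidate along different dimensions.
    for plan in plans:
        for name, critic in critics.items():
            plan.scores[name] = critic(plan.description)

def select(plans: list[Plan]) -> Plan:
    # Stand-in for human-guided selection: pick the best average critic score.
    return max(plans, key=lambda p: sum(p.scores.values()) / len(p.scores))

def implement(plan: Plan) -> None:
    # Implementation would be delegated to separate, task-scoped agents with
    # schedules, budgets, and accountability mechanisms (not modeled here).
    print(f"Executing: {plan.description} (scores: {plan.scores})")

if __name__ == "__main__":
    proposers = [lambda g: f"{g} via incremental rollout",
                 lambda g: f"{g} via a pilot study first"]
    critics = {"cost": lambda d: -len(d) / 100.0,
               "caution": lambda d: 1.0 if "pilot" in d else 0.5}
    plans = propose("upgrade the grid", proposers)
    critique(plans, critics)
    implement(select(plans))
```

Keeping proposal, criticism, selection, and implementation as separate components is what makes such a workflow auditable and leaves room for human judgment between stages.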
Goals, plans, actions, and control look different in the open-agency model:
Goals: Are provided as prompts to diverse generative models, yielding diverse plans on request
Plans: Are selected with the aid of diverse, independent comparison and evaluation mechanisms
...