Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AGI will have learnt utility functions, published by Beren Millidge on January 25, 2023 on The AI Alignment Forum.
This post is part of the work done at Conjecture.
Thanks to Eric Winsor, Daniel Braun, Andrea Miotti and Connor Leahy for helpful comments and feedback on the draft versions of this post.
There has been a lot of debate and discussion recently in the AI safety community about whether AGI will likely optimize for fixed goals or be a wrapper mind. The term wrapper mind is largely a restatement of the old idea of a utility maximizer, with AIXI as a canonical example. The fundamental idea is that there is an agent with some fixed utility function which it maximizes, without any kind of feedback that can change that utility function.
Rather, the optimization process is assumed to be 'wrapped around' some core and unchanging utility function. The capabilities core of the agent is also totally modular and disjoint from the utility function, such that arbitrary planners and utility functions can be composed so long as they have the right I/O interfaces. The core 'code' of an AIXI-like agent is incredibly simple and, for instance, could be implemented in this Python pseudocode:
def action_perception_loop(self):
    while True:
        # Sense the world
        observation = self.sensors.get_observation()
        # Bayesian filtering: update the world-state estimate given the new observation
        state = self.update_state(self.current_state, observation)
        self.current_state = state
        # Generate candidate action plans and simulate their consequences with the world model
        all_action_plans = self.generate_action_plans(state)
        all_trajectories = self.world_model.generate_all_trajectories(all_action_plans, state)
        # Rank the simulated trajectories with the fixed utility function and execute the best plan
        optimal_plan, optimal_utility = self.evaluate_trajectories(all_trajectories)
        self.execute(optimal_plan)
There are a few central elements which must be included in any AIXI-like architecture. The AGI needs some sensorimotor equipment to both sense the world and execute its action plans. It needs a Bayesian filtering component to be able to update its representation of the world state given new observations and its current state. It needs a world model that can generate sets of action plans and then generate 'rollouts', which are simulations of likely futures given an action plan. Finally, it needs a utility function that can calculate the utility of different simulated trajectories into the future and pick the best one. Let's zoom in on this component a little more and see how the evaluate_trajectories function might look inside.
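A minimal sketch of what this could look like, assuming all_trajectories is a list of (plan, trajectory) pairs (the exact data structures here are illustrative):

def evaluate_trajectories(self, all_trajectories):
    # Score every simulated trajectory with the fixed, built-in utility function
    # and return the plan whose trajectory scores highest.
    best_plan, best_utility = None, float("-inf")
    for plan, trajectory in all_trajectories:
        utility = self.utility_function(trajectory)
        if utility > best_utility:
            best_plan, best_utility = plan, utility
    return best_plan, best_utility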
Essentially, the AIXI agent just takes all trajectories, ranks them according to its utility function, and picks the best one to execute. The fundamental problem with such an architecture, which is severely underappreciated, is that it implicitly assumes a utility oracle. That is, it assumes there exists some function self.utility_function(), built into the agent from the beginning, which can assign a consistent utility value to arbitrary world-states.
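To see what that oracle assumption amounts to in code, here is a purely hypothetical sketch for the goal "maximize paperclips" (the class name and goal are made up for illustration):

class PaperclipUtilityOracle:
    def utility_function(self, world_state):
        # A hand-coded oracle must map an arbitrary world-state representation
        # to a scalar utility, fixed at design time, with no learning involved.
        # For complex goals in complex environments, it is exactly this body
        # that we do not know how to write down directly.
        raise NotImplementedError("no known way to hand-code this for complex real-world goals")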
While such a function is conceptually simple, my argument is that actually designing and building one into an agent to achieve a specific goal in the external world is incredibly difficult or impossible for agents pursuing sufficiently complex goals and operating in sufficiently complex environments. This includes almost all goals humans are likely to want to program an AGI with. This means that in practice we cannot construct AIXI-like agents that optimize for arbitrary goals in the real world, and that any agent we do build must utilize some kind of learned utility model. Specifically, this is a utility (or reward) function u_θ(x), where θ is some set of parameters and x is some kind of state, and the utility function is learned by some learning process (typically supervised learning) against a dataset of (state, utility) pairs provided either by the environment or by human designers.
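As a minimal sketch of what a learned utility model could look like (assuming PyTorch and a toy dataset of state vectors paired with scalar utilities; all names here are illustrative):

import torch
import torch.nn as nn

class LearnedUtilityModel(nn.Module):
    # u_theta(x): a parameterized model mapping states to scalar utilities.
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state):
        return self.net(state).squeeze(-1)

def train_utility_model(model, states, utilities, epochs=100, lr=1e-3):
    # Supervised learning against a dataset of (state, utility) pairs,
    # provided either by the environment or by human designers.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(states), utilities)
        loss.backward()
        optimizer.step()
    return model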
What this means is that, unlike a wrapper mind, the agent’s utility function can be influe...