The Nonlinear Library

AF - [Simulators seminar sequence] #1 Background & shared assumptions by Jan Hendrik Kirchner



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Simulators seminar sequence] #1 Background & shared assumptions, published by Jan Hendrik Kirchner on January 2, 2023 on The AI Alignment Forum.
Meta: Over the past few months, we've held a seminar series on the Simulators theory by janus. As the theory is actively under development, the purpose of the series is to discover central structures and open problems. Our aim with this sequence is to share some of our discussions with a broader audience and to encourage new research on the questions we uncover. Below, we outline the broader rationale and shared assumptions of the participants of the seminar.
Shared assumptions
Going into the seminar series, we determined a number of assumptions that we share. The degree to which each participant subscribes to each assumption varies, but we agreed to postpone discussions on these topics to have a maximally productive seminar. This restriction does not apply to the reader of this post, so please feel free to question our assumptions.
Aligning AI is a crucial task that needs to be addressed as AI systems rapidly become more capable.
(Probably a rather uncontroversial assumption for readers of this Forum, but worth stating explicitly.)
The core part of the alignment problem involves "deconfusion research."
We do not work on deconfusion for the sake of deconfusion but in order to engineer concepts, identify unknown unknowns, and transition from philosophy to mathematics to algorithms to implementation.
The problem is complex because we have to reason about something that doesn't yet exist.
AGI is going to be fundamentally different from anything we have ever known and will thus present us with challenges that are very hard to predict. We might only have a very narrow window of opportunity to perform critical actions and might not get the chance to iterate on a solution.
However, this does not mean that we should ignore evidence as it emerges.
It is essential to carefully consider the GPT paradigm as it is being developed and implemented. At this point, it appears to us more plausible than not that GPT will be a core component of AGI.
One feasible-seeming approach is "accelerating alignment," which involves leveraging AI as it is developed to help solve the challenging problems of alignment.
This is not a novel idea; it is related to previously suggested concepts such as seed AI, nanny AI, and iterated distillation and amplification (IDA).
Simulators refresher
Going into the seminar series, we had all read the original Simulators post by janus. We recommend reading the post in the original but provide a condensed summary as a refresher below.
A fruitful way to think about GPT is:
- GPT is a simulator (i.e., a model trained with a predictive loss on a self-supervised dataset).
- The entities simulated by GPT are simulacra (agentic or non-agentic; they can have objectives different from the simulator's).
- The simulator terminology has appropriate connotations:
  - GPT is not (per se) an oracle, genie, or agent.
  - All GPT "cares about" is simulating/modeling the training distribution.
  - Log-loss is a proper scoring rule.
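The "proper scoring rule" point can be checked numerically: the expected log-loss under the true distribution p is minimized exactly when the model reports q = p, so a predictor trained on log-loss is incentivized to match the data distribution rather than to game the metric. A minimal sketch (the toy distribution and helper function below are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "true" next-token distribution p over three tokens (illustrative values).
p = np.array([0.7, 0.2, 0.1])

def expected_log_loss(q, p):
    # E_{x ~ p}[-log q(x)], i.e. the cross-entropy H(p, q).
    return -np.sum(p * np.log(q))

# Reporting the truth (q = p) yields the entropy H(p)...
loss_at_truth = expected_log_loss(p, p)

# ...and any other prediction q does at least as badly in expectation,
# which is what makes log-loss a proper scoring rule.
for _ in range(1000):
    q = rng.dirichlet(np.ones(3))
    assert expected_log_loss(q, p) >= loss_at_truth
```

This is the Gibbs inequality H(p, q) ≥ H(p); equality holds only at q = p, so honest prediction is the unique optimum.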
Solving alignment with simulators
While much of this sequence will focus on the details and consequences of simulator theory, we want to clearly state at the outset that we do this work with the goal of contributing to a solution to the alignment problem. In this section, we briefly outline how we might expect simulator theory to concretely contribute to such a solution.
One slightly naive approach to solving alignment is to use a strong, GPT-like model as a simulator, prompt it with "the solution to the alignment problem is", cross your fingers, and hit enter. The list of ways in which this approach fails is hard to exhaust; it includes the model's tendency to hallucinate, its generally weak reasoning ability, as well as more fundamental issue...
The Nonlinear Library, by The Nonlinear Fund