The Nonlinear Library

LW - A summary of current work in AI governance by constructive



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A summary of current work in AI governance, published by constructive on June 17, 2023 on LessWrong.
A summary of current work in AI governance
If you’d like to learn more about AI governance, apply before June 25 to the AI Safety Fundamentals: Governance Track, a 12-week, part-time fellowship.
Context
Over the past nine months, I have spent ~50% of my time upskilling in AI alignment and governance alongside my role as a research assistant in compute governance.
While I found great writing characterizing AI governance at a high level, few texts covered what work is currently ongoing. To improve my understanding of the current landscape, I began compiling different lines of work and made a presentation. People liked my presentation and suggested I publish it as a blog post.
Disclaimers:
I only started working in the field ~9 months ago.
I haven’t run this by any of the organizations I mention. My impression of their work likely differs from their intent behind it.
I’m biased toward GovAI’s work, as I engage with it the most.
My list is far from comprehensive.
What is AI governance?
Note that I am primarily discussing AI governance in the context of preventing existential risks.
Matthijs Maas defines long-term AI governance as
“The study and shaping of local and global governance systems—including norms, policies, laws, processes, politics, and institutions—that affect the research, development, deployment, and use of existing and future AI systems in ways that positively shape societal outcomes into the long-term future.”
Considering this, I want to point out:
AI governance is not just government policy; it involves a wide range of actors. (In fact, the most important decisions in AI governance are currently being made at major AI labs rather than by governments.)
The field is broad. Rather than only preventing misalignment, AI governance is concerned with a variety of ways in which future AI systems could impact the long-term prospects of humanity.
Since "long-term" somewhat implies that those decisions are far away, another term used to describe the field is “governance of advanced AI systems.”
Threat Models
Researchers and policymakers in AI governance are concerned with a range of threat models from the development of advanced AI systems. For an overview, I highly recommend Allan Dafoe’s research agenda and Sam Clarke’s "Classifying sources of AI x-risk".
To illustrate this point, I will briefly describe some of the main threat models discussed in AI governance.
Feel free to skip right to the main part.
Takeover by an uncontrollable, agentic AI system
This is the most prominent threat model and the focus of most AI safety research. It concerns the possibility that future AI systems may exceed humans in critical capabilities such as deception and strategic planning. If such models develop adversarial goals, they could attempt, and succeed at, permanently disempowering humanity.
Prominent examples of where this threat model has been articulated:
Is power-seeking AI an existential risk?, Joe Carlsmith, 2022
AGI Ruin: A list of lethalities, Eliezer Yudkowsky, 2022 (In a very strong form, see also this in-depth response from Paul Christiano)
The alignment problem from a deep learning perspective, Ngo et al., 2022
Loss of control through automation
Even if AI systems remain predominantly non-agentic, the increasing automation of societal and economic decision-making, driven by market incentives and corporate control, could pose the risk of humanity gradually losing control. For example, the measures being optimized may be only coarse proxies of what humans value, and the complexity of the emerging systems may be incomprehensible to human decision-makers.
This threat model is somewhat harder to convey but has been articulated well in the following texts:
Will Humanit...