Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Actually, Othello-GPT Has A Linear Emergent World Representation, published by Neel Nanda on March 29, 2023 on LessWrong.
Epistemic Status: This is a write-up of an experiment in speedrunning research, and the core results represent ~20 hours/2.5 days of work (though the write-up took way longer). I'm confident in the main results to the level of "hot damn, check out this graph", but likely have errors in some of the finer details.
Disclaimer: This is a write-up of a personal project, and does not represent the opinions or work of my employer
This post may get heavy on jargon. I recommend looking up unfamiliar terms in my mechanistic interpretability explainer
Thanks to Chris Olah, Martin Wattenberg, David Bau and Kenneth Li for valuable comments and advice on this work, and especially to Kenneth for open sourcing the model weights, dataset and codebase, without which this project wouldn't have been possible! Thanks to ChatGPT for formatting help.
Overview
Context: A recent paper trained a model to play legal moves in Othello by predicting the next move, and found that it had spontaneously learned to compute the full board state - an emergent world representation.
This board state could be recovered by non-linear probes, but not by linear probes.
We can causally intervene on this representation to predictably change model outputs, so it's telling us something real
I find that actually, there's a linear representation of the board state!
But rather than representing "this cell is black", it represents "this cell has my colour", since the model plays both black and white moves.
We can causally intervene with the linear probe, and the model makes legal moves in the new board! (A minimal sketch of such a probe follows below.)
This is evidence for the linear representation hypothesis: that models, in general, compute features and represent them linearly, as directions in space! (If they don't, mechanistic interpretability would be way harder)
The original paper seemed at first like significant evidence for a non-linear representation - the finding of a linear representation hiding underneath shows the real predictive power of this hypothesis!
This (slightly) strengthens the paper's evidence that "predict the next token" transformer models are capable of learning a model of the world.
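For concreteness, here's a minimal sketch of the kind of linear probe this involves - the names, shapes, and training details below are illustrative assumptions, not the actual code from this project:

```python
# A minimal sketch of a "my colour vs their colour" linear probe (illustrative
# names and shapes, not the project's actual code).
# Assumes `resid` holds cached residual stream activations, shape [batch, d_model],
# and `board_labels` holds the true board state at each position, shape [batch, 64],
# with values 0 = empty, 1 = mine, 2 = theirs (relative to the player about to move).
import torch
import torch.nn as nn

d_model, n_cells, n_states = 512, 64, 3  # placeholder hidden size; 8x8 board; empty/mine/theirs

# One linear map from the residual stream to (empty / mine / theirs) logits per cell.
probe = nn.Linear(d_model, n_cells * n_states)
optim = torch.optim.AdamW(probe.parameters(), lr=1e-3)

def probe_loss(resid: torch.Tensor, board_labels: torch.Tensor) -> torch.Tensor:
    logits = probe(resid).reshape(-1, n_cells, n_states)
    return nn.functional.cross_entropy(
        logits.reshape(-1, n_states), board_labels.reshape(-1)
    )

# Toy training steps on random data, purely to show the shapes involved.
resid = torch.randn(128, d_model)
board_labels = torch.randint(0, n_states, (128, n_cells))
for _ in range(10):
    optim.zero_grad()
    probe_loss(resid, board_labels).backward()
    optim.step()

# A causal intervention then amounts to editing the residual stream along one cell's
# probe directions (e.g. pushing it from "mine" towards "theirs") and letting the rest
# of the model run on the edited activations, typically via a forward hook.
```

The key design choice is that the labels are relative to the player about to move (mine/theirs/empty) rather than absolute (black/white/empty) - that relabelling is what lets a purely linear probe recover the board state.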
Part 2: There are a lot of fascinating questions left to answer about Othello-GPT - I outline some key directions, and how they fit into my bigger picture of mech interp progress
Studying modular circuits: A world model implies emergent modularity - many early circuits together compute a single world model, many late circuits each use it. What can we learn about what transformer modularity looks like, and how to reverse-engineer it?
Prior transformer circuits work focuses on end-to-end circuits, from the input tokens to output logits. But this seems unlikely to scale!
I present some preliminary evidence that we can read off a neuron's function from its input weights via the probe, as sketched below
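Here's a rough sketch of what that readout looks like, with assumed variable names and shapes rather than the actual notebook code:

```python
# A sketch of reading a neuron's function off its input weights via the probe
# (assumed names, not the actual notebook). `W_in` is an MLP layer's input weight
# matrix, shape [d_model, d_mlp]; `probe_dirs` are the learned probe directions,
# reshaped to [n_cells, n_states, d_model] with states 0 = empty, 1 = mine, 2 = theirs.
import torch

def neuron_board_readout(W_in: torch.Tensor, probe_dirs: torch.Tensor, neuron: int) -> torch.Tensor:
    """Cosine similarity between one neuron's input direction and each cell's
    'mine minus theirs' probe direction - a rough map of which cells, in which
    colour, this neuron is looking for."""
    w = W_in[:, neuron]                                    # [d_model]
    mine_vs_theirs = probe_dirs[:, 1] - probe_dirs[:, 2]   # [n_cells, d_model]
    sims = torch.nn.functional.cosine_similarity(
        mine_vs_theirs, w.unsqueeze(0), dim=-1
    )                                                      # [n_cells]
    return sims.reshape(8, 8)                              # the Othello board is 8x8
```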
Neuron interpretability and Studying Superposition: Prior work has made little progress on understanding MLP neurons. I think Othello-GPT's neurons are tractable to understand, yet complex enough to teach us a lot!
I further think this can help us get some empirical data about the Toy Models of Superposition paper's predictions
I investigate max activating dataset examples and find apparent monosemanticity, yet deeper investigation shows the picture is more complex - a sketch of this kind of lookup follows below.
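A sketch of that kind of lookup, again with assumed names and a caching setup that isn't the original code:

```python
# A sketch of finding max activating dataset examples for a single MLP neuron.
# Assumes `mlp_acts` holds cached post-nonlinearity activations over a dataset of
# games, shape [n_games, n_moves, d_mlp] (names and caching are assumptions).
import torch

def top_activating_examples(mlp_acts: torch.Tensor, neuron: int, k: int = 10):
    """Return (game index, move index, activation) for the k largest activations."""
    n_games, n_moves, _ = mlp_acts.shape
    flat = mlp_acts[:, :, neuron].reshape(-1)
    top_vals, top_idx = flat.topk(k)
    games = torch.div(top_idx, n_moves, rounding_mode="floor")
    moves = top_idx % n_moves
    return list(zip(games.tolist(), moves.tolist(), top_vals.tolist()))
```

Looking at the board states and move histories at the returned (game, move) positions is the "max activating dataset examples" analysis referred to above.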
A transformer circuit laboratory: More broadly, the field has a tension between studying clean, tractable yet over-simplistic toy models and studying the real yet messy problem of interpreting LLMs - Othello-GPT is toy enough to be tractable yet complex enough to be full of mysteries, and I detail many more confusions and conjectures that it could shed light on.
Part 3: Reflections on the research process
I did the bulk of this project in a weeke...