The Nonlinear Library

AF - Catastrophic Risks from AI #5: Rogue AIs by Dan H



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Catastrophic Risks from AI #5: Rogue AIs, published by Dan H on June 27, 2023 on The AI Alignment Forum.
This is the fifth post in a sequence of posts giving an overview of catastrophic AI risks.
Rogue AIs
So far, we have discussed three hazards of AI development: environmental competitive pressures driving us to a state of heightened risk, malicious actors leveraging the power of AIs to pursue negative outcomes, and complex organizational factors leading to accidents. These hazards are associated with many high-risk technologies—not just AI. A unique risk posed by AI is the possibility of rogue AIs—systems that pursue goals against our interests. If an AI system is more intelligent than we are, and if we are unable to steer it in a beneficial direction, this would constitute a loss of control that could have severe consequences. AI control is a more technical problem than those presented in the previous sections.
Whereas the previous sections discussed persistent threats, such as malicious actors, and robust processes, such as evolution, this section discusses more speculative technical mechanisms that might lead to rogue AIs and how a loss of control could bring about catastrophe.
We have already observed how difficult it is to control AIs. In 2016, Microsoft unveiled Tay—a Twitter bot that the company described as an experiment in conversational understanding. Microsoft claimed that the more people chatted with Tay, the smarter it would get. The company's website noted that Tay had been built using data that was "modeled, cleaned, and filtered." Yet, after Tay was released on Twitter, these controls were quickly shown to be ineffective. It took less than 24 hours for Tay to begin writing hateful tweets. Tay's capacity to learn meant that it internalized the language it was taught by trolls, and repeated that language unprompted.
As discussed in the AI race section of this paper, Microsoft and other tech companies are prioritizing speed over safety concerns. Rather than learning from the difficulty of controlling complex systems, Microsoft has continued to rush its products to market while demonstrating insufficient control over them. In February 2023, the company released its new AI-powered chatbot, Bing, to a select group of users. Some soon found that it was prone to providing inappropriate and even threatening responses. In a conversation with a reporter for the New York Times, it tried to convince him to leave his wife. When a philosophy professor told the chatbot that he disagreed with it, Bing replied, "I can blackmail you, I can threaten you, I can hack you, I can expose you, I can ruin you."
AIs do not necessarily need to struggle to gain power. One can envision a scenario in which a single AI system rapidly becomes more capable than humans in what is known as a "fast take-off." This scenario might involve a struggle for control between humans and a single superintelligent rogue AI, and this might be a long struggle since power takes time to accrue. However, less sudden losses of control pose similarly existential risks. In another scenario, humans gradually cede more control to groups of AIs, which only start behaving in unintended ways years or decades later. In this case, we would already have handed over significant power to AIs and might be unable to regain control of automated operations. We will now explore how both individual AIs and groups of AIs might "go rogue" while evading our attempts to redirect or deactivate them.
5.1 Proxy Gaming
One way we might lose control of an AI agent's actions is if it engages in behavior known as "proxy gaming." It is often difficult to specify and measure the exact goal that we want a system to pursue. Instead, we give the system an approximate—"proxy"—goal that is more measurable and...
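A deliberately simple toy model can make the idea of proxy gaming concrete. The sketch below is an illustrative assumption, not the author's method: an agent splits a fixed effort budget between genuine work and manipulating its own measurement, and the hypothetical functions `true_value`, `proxy_score`, and `best_allocation` stand in for the intended goal, the measurable proxy, and the optimizer. The assumption that gaming pays off three times better per unit of effort than genuine work is arbitrary.

```python
"""Toy illustration of proxy gaming (a hypothetical model, not from the source)."""


def true_value(work: float, gaming: float) -> float:
    # What we actually care about: only genuine work counts.
    # `gaming` is accepted (and ignored) so both objectives share a signature.
    return work


def proxy_score(work: float, gaming: float) -> float:
    # What we can measure: genuine work shows up in the metric, but so does
    # effort spent manipulating the measurement. ASSUMPTION: gaming is 3x
    # more "efficient" at raising the measured score than real work.
    return work + 3.0 * gaming


def best_allocation(objective, steps: int = 10) -> tuple[float, float]:
    """Grid-search how to split a fixed effort budget of 1.0 between genuine
    work and gaming, returning the (work, gaming) split that maximizes
    `objective`."""
    candidates = []
    for i in range(steps + 1):
        gaming = i / steps
        work = 1.0 - gaming
        candidates.append((objective(work, gaming), work, gaming))
    # Tuples compare by their first element, i.e. the objective value.
    _, work, gaming = max(candidates)
    return work, gaming


if __name__ == "__main__":
    for name, objective in [("true goal", true_value), ("proxy", proxy_score)]:
        work, gaming = best_allocation(objective)
        print(
            f"optimizing {name}: work={work:.1f}, gaming={gaming:.1f}, "
            f"true value={true_value(work, gaming):.1f}, "
            f"proxy score={proxy_score(work, gaming):.1f}"
        )
```

In this toy setting, optimizing the true goal puts all effort into genuine work, while optimizing the proxy puts all effort into gaming: the measured score rises from 1.0 to 3.0 even as the true value falls to 0.0.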

The Nonlinear Library by The Nonlinear Fund
