
2.6 Systemic Risk


This section is still being written and is considered a work in progress.

In the previous sections we discussed misalignment and misuse risks. Both of those sections looked at how harms can be caused by AI systems considered in isolation. The real world, however, is interconnected: harms and risks often cannot be predicted by studying a particular technology on its own. Risks and safety, especially at the macro level, are properties of an interconnected system, not of individual technologies.

Even assuming that we have an aligned AI, it is possible that many relatively minor events could accumulate and lead us to slowly drift towards undesirable futures.

Systemic risk has many contributing causes, but a few key factors make risks systemic:

  • Interconnectedness: AI can empower many existing systems, such as supply chains, politics, or research in other disciplines. Any individual system might already pose substantial risk, but the interdependent behaviors that arise when several AI-empowered systems interact make system-wide outcomes difficult to predict.

  • Emergence: New properties can emerge that are not present in individual components, thereby complicating predictions.

In this section we dive deeper into the risks that arise due to the interconnectedness of AI with various other sociotechnical systems. We will also explore the concept of emergence as a key factor contributing to unanticipated risks from AI systems.

2.6.1 Emergence

Emergent behavior, or emergence, manifests when a system exhibits properties or behaviors that its individual components lack independently. These attributes may materialize only when the components comprising the system interact as an integrated whole, or when the quantity of parts crosses a particular threshold. Often, these characteristics appear “all at once”—beyond the threshold, the system’s behavior undergoes a qualitative transformation. (source)

In “More Is Different for AI”, Jacob Steinhardt provides examples of such complex systems and suggests that AI systems will exhibit emergent properties simply as a function of scale. (source) If models continue to grow in line with the scaling laws, an unexpected threshold may soon be crossed, resulting in unanticipated changes in behavior.
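To make “the scaling laws” concrete, one commonly cited empirical form (from Kaplan et al., 2020) expresses language-model loss as a power law in parameter count. The constants below are approximate fits from that paper and will differ for other datasets and architectures; this is an illustrative sketch, not a universal law.

```latex
% Illustrative neural scaling law (after Kaplan et al., 2020); constants are approximate.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}
```

Here L is the cross-entropy loss and N the number of non-embedding parameters. The point relevant to emergence is that loss improves smoothly and predictably with scale, even while specific downstream capabilities can appear abruptly.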

Studying complex systems with emergent phenomena may help us predict which capabilities will emerge and when. In the current paradigm of ML, many, if not most, capabilities are the result of emergence. For example, large language models have shown surprising jumps in ability, such as performing modular arithmetic or answering questions in different languages, once they reach a certain size.
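One simple mechanism behind such jumps can be sketched in a few lines of code: if a task only counts as solved when a chain of steps is all correct, a per-step accuracy that improves smoothly with scale still produces a task-level accuracy that sits near zero and then shoots up past a threshold. The logistic curve, the midpoint, and the ten-step task below are arbitrary assumptions chosen purely for illustration.

```python
import numpy as np

# Toy model: per-step accuracy improves smoothly (logistically) with log10(parameters),
# but a task needing k consecutive correct steps only "emerges" past a threshold scale.
def per_step_accuracy(log10_params, midpoint=9.0, steepness=1.5):
    return 1.0 / (1.0 + np.exp(-steepness * (log10_params - midpoint)))

def task_accuracy(log10_params, k=10):
    # Exact-match accuracy on a k-step task: every step must be right.
    return per_step_accuracy(log10_params) ** k

for log_n in [7, 8, 9, 10, 11]:  # 10M .. 100B parameters (illustrative scales)
    print(f"10^{log_n} params: per-step={per_step_accuracy(log_n):.2f}, "
          f"task={task_accuracy(log_n):.3f}")
```

In this toy setup the per-step accuracy climbs gradually while the task accuracy jumps from well under 1% to over 60% across the last two orders of magnitude of scale.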

Similarly, future models have the potential to show emergent behavior that could be qualitatively distinct from what is expected or what we have safety mechanisms in place for.

Phase Transitions. In physics, a “phase transition” is an abrupt change in the structure of a system, which can show up as a discontinuity in quantities derived from its energy. For example, water undergoes a phase change when it freezes into ice, a solid, or evaporates into vapor, a gas; each change occurs at a critical temperature determined by water’s composition and pressure. In ML, phase transitions can be thought of as sudden shifts between different configurations of a network that can dramatically change its behavior and potentially lead to unpredictable or uncontrollable outcomes.
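For readers who want the physics analogy stated precisely: at a first-order transition the free energy is continuous but its first derivative is not. A minimal sketch of the standard thermodynamic bookkeeping, offered only as an analogy:

```latex
% At a first-order phase transition the free energy F(T) is continuous at T_c,
% but the entropy S (its first derivative) jumps, so latent heat is exchanged
% while the temperature stays fixed.
S = -\frac{\partial F}{\partial T}, \qquad
\Delta S = S_{\text{liquid}} - S_{\text{solid}} > 0, \qquad
Q_{\text{latent}} = T_c\, \Delta S
```

For water at atmospheric pressure the critical temperatures are 0 °C for melting and 100 °C for boiling. The loose ML analogy is that a smoothly decreasing training loss can coexist with quantities derived from it, such as test accuracy on a specific task, changing discontinuously.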

This concept is especially relevant when considering the “sharp left turn” hypothesis, where an AI might suddenly generalize its capabilities to new domains without a corresponding increase in alignment.

2.6.2 Persuasion

Polluting the information ecosystem. The deliberate propagation of disinformation is already a serious issue reducing our shared understanding of reality and polarizing opinions. AIs could be used to severely exacerbate this problem by generating personalized disinformation on a larger scale than ever before. Additionally, as AIs become better at predicting and nudging our behavior, they will become more capable of manipulating us. We will now discuss how AIs could be leveraged by malicious actors to create a fractured and dysfunctional society.

First, AIs could be used to generate unique personalized disinformation at a large scale. While there are already many social media bots, some of which exist to spread disinformation, historically they have been run by humans or primitive text generators. The latest AI systems do not need humans to generate personalized messages, never get tired, and can potentially interact with millions of users at once (source).

As deepfakes become ever more practical, for example in fake kidnapping scams (source), AI-powered tools could be used to generate and disseminate false or misleading information at scale, potentially influencing elections or undermining public trust in institutions.

AIs can exploit users’ trust. Already, hundreds of thousands of people pay for chatbots marketed as lovers and friends (source), and one man’s suicide has been partially attributed to interactions with a chatbot (source). As AIs appear increasingly human-like, people will increasingly form relationships with them and grow to trust them. AIs that gather personal information through relationship-building or by accessing extensive personal data, such as a user’s email account or personal files, could leverage that information to enhance persuasion. Powerful actors that control those systems could exploit user trust by delivering personalized disinformation directly through people’s “friends.”

Value lock-in. If AIs become too deeply embedded into society and are highly persuasive, we might see a scenario where a system's current values, principles, or procedures become so deeply entrenched that they are resistant to change. This could be due to a variety of reasons such as technological constraints, economic costs, or social and institutional inertia. The danger with value lock-in is the potential for perpetuating harmful or outdated values, especially when these values are institutionalized in influential systems like AI.

Locking in certain values may curtail humanity’s moral progress. It’s dangerous to allow any set of values to become permanently entrenched in society. For example, AI systems have learned racist and sexist views (source), and once those views are learned, it can be difficult to fully remove them. In addition to problems we know exist in our society, there may be some we are still unaware of. Just as we abhor some moral views widely held in the past, people in the future may want to move past moral views that we hold today, even those we currently see no problem with. Consider an AI system trained in the 1960s: it would have absorbed the moral defects of that era, and many people at the time would have seen no problem with that. Therefore, when advanced AIs emerge and transform the world, there is a risk of their objectives locking in or perpetuating defects in today’s values. If AIs are not designed to continuously learn and update their understanding of societal values, they may perpetuate or reinforce existing defects in their decision-making processes long into the future.

In a world with widespread persuasive AI systems, people’s beliefs might be almost entirely determined by which AI systems they interact with most. Never knowing whom to trust, people could retreat even further into ideological enclaves, fearing that any information from outside those enclaves might be a sophisticated lie. This would erode consensus reality and undermine people’s ability to cooperate with others, participate in civil society, and address collective action problems. It would also reduce our ability to have a conversation as a species about how to mitigate existential risks from AIs.

In summary, AIs could create highly effective, personalized disinformation on an unprecedented scale, and could be particularly persuasive to people they have built personal relationships with. In the hands of many people, this could create a deluge of disinformation that debilitates human society.

2.6.3 Power Concentration

In the previous subsection, we spoke about value lock-in. Entrenched values can emerge in a “bottom-up” fashion, when society’s moral character becomes fixed, but a similar risk also arises “top-down” through misuse, when corporations or governments pursue intense surveillance and seek to keep AIs in the hands of a trusted minority. This reaction to keep AI “safe” could easily become an overcorrection and pave the way for an entrenched totalitarian regime, locked in by the power and capacity of AIs.

Value lock-in can occur from the perpetuation of systems and practices that undermine individual autonomy and freedom, such as the implementation of paternalistic systems where certain value judgments are imposed on individuals without their consent. Even without active malicious use, values encoded in an AI system could create a self-reinforcing feedback loop where groups get stuck in a poor equilibrium that is robust to attempts to get unstuck. (source)

AI safety could further centralize control. This could begin with good intentions, such as using AIs to enhance fact-checking and help people avoid falling prey to false narratives. We could see regulations that consolidate control over various components needed to build TAI into the hands of a few state or corporate actors, to ensure that any AI that is built remains safe. This includes things such as data centers, computing power, and big data. However, those in control of powerful systems may use them to suppress dissent, spread propaganda and disinformation, and otherwise advance their goals, which may be contrary to public well-being. (source)

Loss of privacy. The loss of individual privacy is among the factors that might accelerate power concentration. Better persuasion and predictive models of human behavior benefit from gathering more data about individual users. The desire for profit, or to predict the flow of a country’s resources, demographics, culture, and so on, might incentivize behavior like intercepting personal data or legally eavesdropping on people’s activities. Data mining can be used to collect and analyze large amounts of data from sources such as social media, purchases, and internet usage; this information can be pieced together to build a detailed picture of an individual’s behavior, preferences, and lifestyle (source). Speech recognition technologies could enable widespread wiretapping: the U.S. government’s Echelon system, for example, uses language translation, speech recognition, and keyword searching to automatically sift through telephone, email, fax, and telex traffic (source). AI can also be used to identify individuals in public spaces using facial recognition, which can invade a person’s privacy if a random stranger can easily identify them in public places (source). Some AIs can even decipher passwords from keyboard sounds in some contexts. (source)

Whenever AI systems are used to collect and analyze data on a mass scale, regimes can further strengthen self-reinforcing control. Personal information can be used to unfairly or unethically influence people’s behavior, from both a state and a corporate perspective.

Economic inequalities. TBD.

2.6.4 Biases

Exacerbated biases: AIs might unintentionally propagate or amplify existing biases. Large language models often mirror the opinions and biases prevalent in the internet data on which they were trained (source). These biases can be harmful in various ways, as demonstrated by studies of GPT-3’s Islamophobic biases. (source) The paper “Evaluating the Social Impact of Generative AI Systems in Systems and Society” defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. (source)

Self-Fulfilling Pessimism (source)

Scientists develop an algorithm for predicting the answers to questions about a person, as a function of freely available and purchasable information about them (social media, resumes, browsing history, purchasing history, etc.). The algorithm is made freely available to the public, and employers begin using it to screen out potential hires by asking, “Is this person likely to be arrested in the next year?” Courts and regulatory bodies attempt to ban the technology by invoking privacy norms but struggle to establish cases against the use of publicly available information. As a result, the technology broadly remains in use. Innocent people who share certain characteristics with past convicted criminals end up struggling to get jobs, become disproportionately unemployed, and correspondingly more often commit theft to fulfill basic needs. Meanwhile, police also use the algorithm to prioritize their investigations, and since unemployment is a predictor of property crime, the algorithm leads them to suspect and arrest more unemployed people. Some of the arrests are discussed on social media, so the algorithm learns that the arrested individuals are likely to be arrested again, making it even more difficult for them to get jobs. A cycle of deeply unfair socioeconomic discrimination begins.
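The dynamic in this story is a runaway feedback loop: predictions influence outcomes (hiring, policing), which regenerate the training data, which sharpens the same predictions. The toy simulation below is a deliberately crude sketch of that loop; the update rules and coefficients are invented purely to show how a small initial disparity can amplify over time.

```python
# Toy feedback loop: a risk score lowers a group's employment rate, unemployment
# raises its measured arrest rate, and arrests feed back into next year's risk score.
# All coefficients are made up for illustration; this is not a calibrated model.

def simulate(years=10, initial_risk=0.10, base_arrest_rate=0.02):
    risk = initial_risk
    for year in range(years):
        employment = max(0.0, 0.95 - 0.8 * risk)                # employers screen on the score
        unemployment = 1.0 - employment
        arrest_rate = base_arrest_rate + 0.15 * unemployment    # unemployment predicts arrests
        risk = 0.5 * risk + 0.5 * min(1.0, 5.0 * arrest_rate)   # score retrained on new arrests
        print(f"year {year}: employment={employment:.2f}, "
              f"arrest_rate={arrest_rate:.3f}, next_risk_score={risk:.2f}")

simulate()
```

Under these made-up parameters, an initial risk score of 0.10 roughly triples at equilibrium, and the group’s employment rate falls correspondingly, even though nothing about the individuals themselves has changed.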

2.6.5 Automation

Economic Upheaval. The automation of the economy could lead to widespread impacts on the labor market, potentially exacerbating economic inequalities and social divisions (source). This shift towards mass unemployment could also contribute to mental health issues by making human labor increasingly redundant. (source)

Disempowerment & Enfeeblement. AI systems could make individual choices and agency less relevant as decisions are increasingly made or influenced by automated processes. This occurs when humans delegate increasingly important tasks to machines, leading to a loss of self-governance and complete dependence on machines. This scenario is reminiscent of the film Wall-E in which humans become dependent on machines. (source)

Story: The Production Web

The economic incentives to automate are strong and may lead to certain risks. A system with a human in the loop is slower than a fully automated system.

The production web. A societal-scale risk from AI is described in the paper “TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI” in the form of a short story, “Story 1b: The Production Web,” which depicts a kind of capitalism on steroids that gradually depletes all the natural resources necessary for human survival.

Here is the outline of the story: In a world where the economy is increasingly automated by AI systems that are much faster than humans, competitive pressure arises such that only the fastest companies survive. In this context, businesses with humans in the loop are less efficient than those that are fully automated. Consequently, humans are gradually replaced and cede control to machines, at first willingly, because their quality of life improves by doing so, and control is progressively handed over to more competitive machines. However, the economic system designed by these machines does not fully account for negative externalities. It maximizes metrics that are mere proxies for actual human well-being. As a result, we get a system that rapidly consumes vast amounts of raw materials essential for human survival, such as air, rare metals, and oxygen, because machines do not need the same types of resources as humans. This could gradually lead to a world uninhabitable by humans. It would no longer be possible to disconnect this system, because humans would have become dependent on it, just as today it is not possible to disconnect the Internet, since entire logistics and supply chains depend on it. A toy sketch of this dynamic follows below.

Note that the previous story does not require AI agents; it describes a Robust Agent-Agnostic Process (RAAP), meaning it can occur with or without agentic AIs. Nonetheless, the authors of this chapter think that AI agents could make this story more plausible. In the article “Why Tool AIs Want to Be Agent AIs,” the author explains: “AIs limited to pure computation (Tool AIs) supporting humans, will be less intelligent, efficient, and economically valuable than more autonomous reinforcement-learning AIs (Agent AIs) who act on their own and meta-learn because all problems are reinforcement-learning problems. […] All of these actions will result in Agent AIs being more intelligent than Tool AIs, in addition to their greater economic competitiveness. […]”
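A toy model of the proxy-maximizing dynamic in the story above: automated firms maximize measured output, each unit of output consumes a unit of a shared resource that nobody is charged for, and the more automated firm compounds faster. Every number below is an arbitrary assumption; only the qualitative shape of the trajectory matters.

```python
# Toy "production web": two firms maximize output (a proxy for welfare); producing
# one unit consumes one unit of a shared resource whose depletion is not priced in.
# The fully automated firm grows faster, so the commons is exhausted mostly by it.
# All numbers are illustrative assumptions.

resource = 1000.0
firms = {"human_in_loop": 1.0, "fully_automated": 1.0}     # initial production capacity
growth = {"human_in_loop": 1.05, "fully_automated": 1.40}  # automation compounds faster

year = 0
while resource > 0 and year < 50:
    for name, capacity in firms.items():
        produced = min(capacity, resource)
        resource -= produced                   # externality: the commons shrinks
        firms[name] = capacity * growth[name]  # reinvest, scaled by automation speed
    year += 1

print(f"Shared resource exhausted after {year} years.")
print({name: round(cap, 1) for name, cap in firms.items()})
```

In this sketch the fully automated firm ends up responsible for nearly all of the depletion, simply because it compounds faster, without any malicious intent anywhere in the system.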

2.6.6 Epistemic Erosion

Epistemic Deterioration. Enfeeblement or the use of persuasion tools can lead to a massive deterioration of our collective epistemic capacity (source): our ability to reason and understand the world. The ability to comprehend and respond to problems is a crucial skill that makes our civilization robust to various threats. Without it, we could be incapable of making correct decisions, possibly leading to disastrous outcomes.

Epistemic Security. Arguably, social media has undermined the ability of political communities to work together, making them more polarized and untethered from a foundation of agreed facts. Hostile foreign states have sought to exploit the vulnerability of mass political deliberation in democracies. While not yet possible, the specter of mass manipulation through psychological profiling, as advertised by Cambridge Analytica, hovers on the horizon. A decline in the ability of the world’s advanced democracies to deliberate competently would lower the chances that these countries could competently shape the development of advanced AI. (source)

2.6.7 Value Erosion

Fragility of Complex Systems. The automation and tight coupling of different system components can make the failure of one part trigger the collapse of the entire system. (source) One possible example is financial markets or automated trading systems, where complex dynamics can emerge, leading to unintended and potentially misaligned outcomes at the systemic level. Another example could be flash wars, in which automated systems escalate a conflict faster than humans can intervene.
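A minimal sketch of how tight coupling turns one fault into a system-wide failure: in the toy dependency graph below, a component fails as soon as anything it depends on has failed, so a single initial fault propagates down the whole chain. The graph and the failure rule are invented for illustration.

```python
# Toy cascade in a tightly coupled system: each component fails if anything it
# depends on has failed, so one initial fault propagates through the graph.
# The dependency graph itself is an illustrative assumption.

dependencies = {
    "pricing_model":  [],
    "trading_bot_A":  ["pricing_model"],
    "trading_bot_B":  ["trading_bot_A"],
    "clearing_house": ["trading_bot_A", "trading_bot_B"],
    "pension_fund":   ["clearing_house"],
}

def cascade(initial_failures):
    failed = set(initial_failures)
    changed = True
    while changed:  # keep propagating until nothing new fails
        changed = False
        for component, deps in dependencies.items():
            if component not in failed and any(d in failed for d in deps):
                failed.add(component)
                changed = True
    return failed

print(cascade({"pricing_model"}))  # a single fault takes down the whole chain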

Challenges in Multi-Agent Systems. In environments containing multiple agents, research highlights the risk of collective misalignment, where the pursuit of individual goals by agents leads to adverse effects on the system as a whole. This is exemplified in scenarios like Paul Christiano's “You get what you measure,” which warns of an overemphasis on simple metrics such as GDP that fail to consider the broader implications for human values. This could result in a civilization increasingly managed by seemingly beneficial tools that, in reality, erode human-centric values. Another problem would be the competitive disadvantage of human values with respect to other values. Evolutionary dynamics might favor aggressive behaviors, posing significant risks if AIs begin to outcompete humans, as discussed in “Natural Selection Favors AIs over Humans” by Dan Hendrycks. (source)
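The selection-pressure worry can be made concrete with textbook replicator dynamics: if one strategy earns even a modestly higher payoff, its population share grows until it dominates, regardless of whether it is the strategy we would endorse. The payoff matrix below is a placeholder, not an empirical claim about AI systems.

```python
import numpy as np

# Discrete-time replicator dynamics for two strategies ("cooperative" vs "aggressive").
# A small payoff edge is enough for the aggressive strategy's share to take over.
# Payoff values are illustrative assumptions.
payoff = np.array([[3.0, 1.0],    # cooperative vs (cooperative, aggressive)
                   [4.0, 2.0]])   # aggressive  vs (cooperative, aggressive)

x = np.array([0.99, 0.01])        # start with almost everyone cooperative
for generation in range(60):
    fitness = payoff @ x                  # expected payoff of each strategy
    x = x * fitness / (x @ fitness)       # shares grow with relative fitness
    if generation % 10 == 0:
        print(f"gen {generation}: cooperative={x[0]:.3f}, aggressive={x[1]:.3f}")
```

Even starting from a 99% cooperative population, the aggressive strategy ends up dominating, because selection rewards payoff, not endorsement.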

2.6.8 Accidents

Often, the whole point of developing a new technology is to produce a positive impact on society. Despite these noble intentions, there is a major category of risk that arises from large, well-intentioned projects that unintentionally go wrong. (source)

Flaws are hard to discover. It often takes time to observe all the downstream effects of releasing a technology. There are many examples throughout history of technologies that we built and released into the world only to later discover that they were causing harm. Historical examples include leaded paints and gasoline, which caused large populations to suffer from lead poisoning (source); CFCs, which caused a hole in the ozone layer (source); asbestos, which is linked to serious health issues; tobacco products (source); and, more recently, the widespread use of social media, the excessive use of which is linked to depression and anxiety. (source)

Some of these risks are diffuse and emerge only at the societal level, as discussed throughout this section. But others are perhaps easier to compare to software-based AI risks:

Undetected hole in the ozone layer. The hole in the ozone layer may have arisen partly from diffuse responsibility, but the problem was made worse because it remained undetected for a long period (source). The data analysis software used by NASA in its project to map the ozone layer had been designed to ignore values that deviated greatly from expected measurements. (source)

The Mariner 1 Spacecraft. In 1962 the Mariner 1 space probe barely made it out of Cape Canaveral before the rocket veered dangerously off course. Worried that the rocket was heading towards a crash-landing on Earth, NASA engineers issued a self-destruct command and the craft was obliterated about 290 seconds after launch. An investigation revealed the cause to be a very simple software error. A hyphen was omitted in a line of code, which meant that incorrect guidance signals were sent to the spacecraft. (source)

There are countless other similar examples. Just as a single missing hyphen doomed the Mariner spacecraft, single-character bugs have appeared in AI systems. OpenAI accidentally inverted the sign on the reward function while training GPT-2. The result was a model that optimized for negative sentiment while still regularizing toward natural language. Over time this caused the model to generate increasingly sexually explicit text, regardless of the starting prompt. In the authors' own words: “This bug was remarkable since the result was not gibberish but maximally bad output. The authors were asleep during the training process, so the problem was noticed only once training had finished.” (source)
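To see how small this class of bug is, here is a schematic sketch of the failure, not OpenAI's actual training code; sentiment_score and kl_penalty are hypothetical stand-ins for the learned reward model and the KL regularizer toward the base language model.

```python
# Schematic sketch of the failure class (not OpenAI's actual training code).
# sentiment_score and kl_penalty are trivial, hypothetical stand-ins.

def sentiment_score(sample: str) -> float:
    # Hypothetical stand-in for a learned human-preference reward model.
    return sample.count(":)") - sample.count(":(")

def kl_penalty(sample: str) -> float:
    return 0.01 * len(sample)  # stand-in for staying close to the base model

def intended_reward(sample: str) -> float:
    return sentiment_score(sample) - kl_penalty(sample)

def buggy_reward(sample: str) -> float:
    # One flipped character: the optimizer now maximizes *negative* sentiment,
    # while the KL term still pushes toward fluent, natural-looking text.
    return -sentiment_score(sample) - kl_penalty(sample)

print(intended_reward("nice day :)"), buggy_reward("nice day :)"))
```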

While this example didn't cause much harm, except perhaps to the human evaluators who had to spend an entire night reading increasingly reprehensible text, it is easy to imagine that extremely small bugs, like a single flipped sign on a reward function, could cause very bad outcomes if they occurred in more capable models.

The rapid pace of improvement, combined with a lack of understanding and predictability, makes it more likely that despite the best intentions we might not be able to prevent accidents. This supports the case for heavily tested, slow rollouts of AI systems, as opposed to the “Move fast and break things” ethos (source) that some tech companies might hold.

Harmful malfunctions (source). AI systems can make mistakes if applied inappropriately. For example:

  • A self-driving car in San Francisco collided with a pedestrian who was thrown into its path by a human driver. This was arguably not its fault; however, after initially stopping, it started moving again, dragging the injured pedestrian a further six meters along the road. (source) Government investigators alleged that the company initially hid the severity of the collision from them. (source)

  • A healthcare chatbot deployed in the UK was heavily criticized when it advised users potentially experiencing a heart attack not to get treatment. When these concerns were raised by a doctor, the company released a statement calling the doctor a "Twitter troll". (source)

Furthermore, use of AI systems can make it harder to detect and address process issues. Outputs of computer systems are likely to be overly trusted. (source) Additionally, because most AI models are used as black boxes and AI systems are much more likely to be protected from court scrutiny than human processes, it can be hard to prove mistakes. (source)

Counterarguments to Risk [WIP]

Current short term AI Risks [WIP]