2.5 Systemic Risks
What are systemic risks? In the previous sections, we saw how individual AI systems might cause harm through misuse or misalignment. But some of the biggest risks, even existential ones, don't come from any single AI system misbehaving in isolation. Instead, they emerge gradually from how many AI systems affect society's core systems - like markets, democratic institutions, and social structures (Kulveit et al., 2025). Think about how tech companies already use AI recommendation systems to shape what content people see, which influences culture and politics, which in turn affects geopolitics and what regulations get passed about AI. Or how AI trading systems can trigger market volatility faster than humans can respond, potentially destabilizing not just financial markets but the broader economy that depends on them (Critch & Russell, 2023).
How do small changes cause big risks? Unlike misaligned AI systems directly causing harm, systemic risks can emerge even when each individual AI system appears to be working correctly. As an example, the 2008 financial crisis arose not from any single bank's actions, but from the combined effect of decisions made across many institutions and society at large, which created patterns threatening the entire financial system. Something similar could happen with AI. Companies, countries, and individuals might make sensible choices about automation and AI adoption. The systems themselves might work exactly as designed. But collectively, these choices could reshape civilization in ways that no one intended (Uuk et al., 2024). AIs might individually make optimal recommendations while collectively contributing to dangerous power concentration, or they might each provide personally appealing services while collectively eroding human autonomy and social cohesion. If AIs gradually become integral to how society works or how decisions get made, we might lose the ability to meaningfully shape our collective future. This loss of influence could be irreversible, effectively ending humanity's ability to choose the direction of civilization (Critch & Russell, 2023; Kulveit et al., 2025; Hendrycks, 2023).
Why are systemic risks self-reinforcing? As AI systems become more integrated into core societal functions, they create new pressures for further adoption. Companies that don't automate risk becoming uncompetitive. Individuals who don't use AI assistance might fall behind. States that don't adopt AI capabilities could lose geopolitical influence. Each shift makes reversing course more difficult while accelerating the next round of changes (Kulveit et al., 2025). When AI systems interact with each other, they can create patterns too fast or complex for humans to track. When humans interact with AI systems, those interactions gradually transform how society works - from how we make decisions to how we connect with each other (Friston et al., 2022).
How do systemic risks amplify each other? Changes in one part of society don't happen in isolation - they trigger changes elsewhere. Economic power shapes political decisions. Political choices affect culture. Cultural shifts influence economic behavior. These feedback loops mean that small changes in one area can cascade into major transformations across society (Kulveit et al., 2025). If AI systems become too integral to how markets work, how decisions get made, and how humans connect with each other, we might permanently lose the ability to meaningfully shape our collective future (Hendrycks, 2023).
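These dynamics are easier to see in a toy model. The sketch below is a hypothetical two-variable system (the coefficients and the `run_feedback` helper are illustrative assumptions, not taken from the cited papers): two societal domains reinforce each other, and the same small disturbance either fades out or compounds into a large shift depending on how strong the cross-reinforcement is.

```python
# Illustrative sketch: a minimal feedback-loop model. x and y could stand for,
# say, AI influence over economic decisions and over political decisions; each
# partially persists and is reinforced by the other.

def run_feedback(gain: float, steps: int = 30, shock: float = 0.01) -> float:
    x, y = shock, 0.0  # a small initial change in one domain only
    for _ in range(steps):
        # Each domain keeps half of its current level and gains `gain` times
        # the other domain's level (the cross-reinforcement).
        x, y = 0.5 * x + gain * y, 0.5 * y + gain * x
    return abs(x) + abs(y)

for gain in [0.3, 0.5, 0.7]:
    print(f"cross-reinforcement {gain}: disturbance after 30 steps = {run_feedback(gain):.3f}")
```

The point is only qualitative: below a critical feedback strength the disturbance decays to roughly zero, while above it the identical small shock grows by orders of magnitude, which is why small changes in one area can cascade into major transformations across society.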
Why are systemic risks hard to identify and fix? Systemic risks are particularly challenging to address. The changes happen gradually, with no clear line between helpful AI influence and dangerous dependence. By the time we notice problems, they may be deeply embedded in how society works. When many systems interact, it becomes difficult to identify which specific parts are creating problems or where to focus solutions (Uuk et al., 2024). Most importantly, once society depends on AI systems, changing course becomes extremely difficult - imagine trying to run a modern economy without computers (Kulveit et al., 2025).
In the next sections, we'll look at how these dynamics could play out in economic systems, democratic systems, infrastructure and social systems. Even though we'll examine each separately to understand the core problems, keep in mind that just like everything else explained in this chapter, all the risks interact and reinforce each other. This means harmful changes could happen faster than expected, or become impossible to reverse once they take hold.
2.5.1 Emergence
Emergent behavior, or emergence, appears when a system exhibits properties or behaviors that its individual components lack on their own. These properties may materialize only when the components interact as an integrated whole, or when the number of components crosses a particular threshold. Often, they appear "all at once" - beyond the threshold, the system's behavior undergoes a qualitative transformation (Wikipedia).
In "More Is Different for AI", Jacob Steinhardt gives further examples of such complex systems and suggests that AI systems will exhibit emergent properties simply as a function of scale (Steinhardt, 2022). If models continue to grow in line with current scaling laws, unexpected thresholds may soon be crossed, resulting in qualitatively new behaviors.
Studying complex systems that show emergent phenomena may help us predict what capabilities will emerge and when. Many, if not most, capabilities in the current paradigm of ML are the result of emergence. For example, large language models have shown surprising jumps in performance on tasks such as modular arithmetic and answering questions in different languages once they reach a certain size.
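To make the idea of threshold-like jumps concrete, here is a minimal sketch (purely illustrative, not drawn from the cited work): if completing a task requires many steps to all succeed, then even a smoothly improving per-step accuracy produces a sharp, seemingly emergent jump in end-to-end success.

```python
# Illustrative sketch: one common explanation for "emergent" capability jumps.
# If a task needs k steps to all succeed, end-to-end success is roughly p**k,
# which stays near zero and then rises sharply as per-step accuracy p improves.

def task_success(per_step_accuracy: float, num_steps: int) -> float:
    """Probability that all `num_steps` independent steps succeed."""
    return per_step_accuracy ** num_steps

for p in [0.80, 0.90, 0.95, 0.99, 0.999]:
    # A hypothetical 50-step reasoning task: smooth gains in p produce an
    # abrupt-looking jump in end-to-end task success.
    print(f"per-step accuracy {p:.3f} -> 50-step task success {task_success(p, 50):.3f}")
```

Running this shows end-to-end success staying close to zero until per-step accuracy nears one, then rising rapidly - a smooth underlying improvement that, measured at the task level, looks like a discontinuous capability gain.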
Similarly, future models could show emergent behaviors that are qualitatively different from what we expect or from what our safety mechanisms were designed to handle.
Phase Transitions. In physics, a "phase transition" refers to a significant change in the structure of a system, which can manifest as a discontinuity in quantities such as the energy. For example, water undergoes a phase transition when it freezes into ice (a solid) or evaporates into vapor (a gas); both changes occur at critical temperatures determined by water's chemical composition. In ML, phase transitions can be thought of as sudden shifts between different configurations of the network, which can dramatically change the network's behavior and potentially lead to unpredictable or uncontrollable outcomes.
This concept is especially relevant when considering the “sharp left turn” hypothesis, where an AI might suddenly generalize its capabilities to new domains without a corresponding increase in alignment.
2.5.2 Persuasion
Polluting the information ecosystem. The deliberate propagation of disinformation is already a serious issue that reduces our shared understanding of reality and polarizes opinions. AIs could severely exacerbate this problem by generating personalized disinformation on a larger scale than ever before. Additionally, as AIs become better at predicting and nudging our behavior, they will become more capable of manipulating us. We will now discuss how AIs could be leveraged by malicious actors to create a fractured and dysfunctional society.
First, AIs could be used to generate unique personalized disinformation at a large scale. While there are already many social media bots, some of which exist to spread disinformation, historically they have been run by humans or primitive text generators. The latest AI systems do not need humans to generate personalized messages, never get tired, and can potentially interact with millions of users at once (Hendrycks, 2024).
As deepfakes become ever more practical - for example, in fake kidnapping scams (Karimi, 2023) - AI-powered tools could be used to generate and disseminate false or misleading information at scale, potentially influencing elections or undermining public trust in institutions.
AIs can exploit users' trust. Already, hundreds of thousands of people pay for chatbots marketed as lovers and friends (Tong, 2023), and one man's suicide has been partially attributed to interactions with a chatbot (Xiang, 2023). As AIs appear increasingly human-like, people will increasingly form relationships with them and grow to trust them. AIs that gather personal information through relationship-building or by accessing extensive personal data, such as a user's email account or personal files, could leverage that information to enhance persuasion. Powerful actors that control those systems could exploit user trust by delivering personalized disinformation directly through people's "friends."
2.5.3 Value lock-in
If AIs become too deeply embedded in society and are highly persuasive, we might see a scenario where a system's current values, principles, or procedures become so deeply entrenched that they are resistant to change. This could be due to a variety of reasons, such as technological constraints, economic costs, or social and institutional inertia. The danger with value lock-in is the potential for perpetuating harmful or outdated values, especially when those values are institutionalized in influential systems like AI.
Locking in certain values may curtail humanity's moral progress. It's dangerous to allow any set of values to become permanently entrenched in society. For example, AI systems have learned racist and sexist views (Hendrycks, 2024), and once those views are learned, it can be difficult to fully remove them. In addition to problems we know exist in our society, there may be some we do not yet recognize. Just as we abhor some moral views widely held in the past, people in the future may want to move past moral views that we hold today, even those we currently see no problem with. For example, if AI systems had been trained in the 1960s, they would have absorbed moral defects that many people at the time saw no problem with. Therefore, when advanced AIs emerge and transform the world, there is a risk of their objectives locking in or perpetuating defects in today's values. If AIs are not designed to continuously learn and update their understanding of societal values, they may perpetuate or reinforce existing defects in their decision-making processes long into the future.
In a world with widespread persuasive AI systems, people's beliefs might be almost entirely determined by which AI systems they interact with most. Never knowing whom to trust, people could retreat even further into ideological enclaves, fearing that any information from outside those enclaves might be a sophisticated lie. This would erode consensus reality and undermine people's ability to cooperate with others, participate in civil society, and address collective action problems. It would also reduce our ability to have a conversation as a species about how to mitigate existential risks from AIs.
In summary, AIs could create highly effective, personalized disinformation on an unprecedented scale, and could be particularly persuasive to people they have built personal relationships with. In the hands of many people, this could create a deluge of disinformation that debilitates human society.
2.5.4 Power Concentration
In the previous section, we spoke about value lock-in. Entrenched values can arise in a "bottom-up" fashion, when society's moral character becomes fixed, but a similar risk also arises "top-down" through misuse, when corporations or governments pursue intense surveillance and seek to keep AIs in the hands of a trusted minority. This reaction to keep AI "safe" could easily become an overcorrection and pave the way for an entrenched totalitarian regime, locked in by the power and capacity of AIs.
Value lock-in can also occur through the perpetuation of systems and practices that undermine individual autonomy and freedom, such as paternalistic systems in which certain value judgments are imposed on individuals without their consent. Even without active malicious use, values encoded in an AI system could create a self-reinforcing feedback loop in which groups get stuck in a poor equilibrium that is robust to attempts to get unstuck (Hendrycks et al., 2022).
AI safety could further centralize control. This could begin with good intentions, such as using AIs to enhance fact-checking and help people avoid falling prey to false narratives. We could see regulations that consolidate control over the components needed to build TAI - such as data centers, computing power, and big data - into the hands of a few state or corporate actors, to ensure that any AI that is built remains safe. However, those in control of powerful systems may use them to suppress dissent, spread propaganda and disinformation, and otherwise advance their goals, which may be contrary to public well-being (Hendrycks, 2024).
2.5.5 Privacy Loss
The loss of individual privacy is among the factors that might accelerate power concentration. Better persuasion and predictive models of human behavior benefit from gathering more data about individual users. The desire for profit, or to predict the flow of a country's resources, demographics, culture, and so on, might incentivize behavior like intercepting personal data or legally eavesdropping on people's activities. Data mining can be used to collect and analyze large amounts of data from sources such as social media, purchases, and internet usage. This information can be pieced together to create a complete picture of an individual's behavior, preferences, and lifestyle (Russell, 2019). Speech recognition technologies could enable widespread wiretapping; for example, a system like the U.S. government's Echelon uses language translation, speech recognition, and keyword searching to automatically sift through telephone, email, fax, and telex traffic (Russell & Norvig, 1994). AI can also be used to identify individuals in public spaces through facial recognition, which can invade a person's privacy if a random stranger can easily identify them in public places.
Whenever AI systems are used to collect and analyze data on a mass scale, regimes can further strengthen self-reinforcing control. Personal information can be used to unfairly or unethically influence people's behavior, from both a state and a corporate perspective.
2.5.6 Biases
Exacerbated biases: AIs might unintentionally propagate or amplify existing biases. Biases persist within large language models, which often mirror the opinions and biases prevalent in the internet data on which they were trained (Santurkar et al., 2023). These biases can be harmful in various ways, as demonstrated by studies of GPT-3's Islamophobic biases (Abid et al., 2021). The paper "Evaluating the Social Impact of Generative AI Systems in Systems and Society" defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs (Solaiman et al., 2024).
2.5.7 Automation
Economic Upheaval. The automation of the economy could have widespread impacts on the labor market, potentially exacerbating economic inequalities and social divisions (Dai, 2019). By making human labor increasingly redundant, a shift towards mass unemployment could also contribute to mental health issues (Federspiel et al., 2023).
Disempowerment & Enfeeblement. AI systems could make individual choices and agency less relevant as decisions are increasingly made or influenced by automated processes. This occurs when humans delegate increasingly important tasks to machines, leading to a loss of self-governance and complete dependence on them - a scenario reminiscent of the film WALL-E (Hendrycks et al., 2023).
Story: The production web
The economic incentives to automate are strong and create risks of their own: a system with a human in the loop is slower than a fully automated one.
The production web. A consequence of AI that could create risks at a societal scale is described in the paper "TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI" in the form of a short story, 'Story 1b: The Production Web', which depicts a kind of capitalism on steroids that gradually depletes the natural resources necessary for human survival.
Here is the outline of this story: In a world where the economy is increasingly automated by AI systems that are much faster than humans, a competitive pressure arises such that only the fastest companies survive. In this context, businesses with humans in the loop are less efficient than those that are fully automated. Consequently, we would gradually see a world where humans cede control to machines, initially because their quality of life improves by doing so, with control progressively handed over to ever more competitive machines. However, the economic system designed by these machines does not fully account for negative externalities. It maximizes metrics that are mere proxies for the actual well-being of humans. As a result, we get a system that rapidly consumes vast amounts of raw materials essential for human survival, such as air, rare metals, and oxygen, because machines do not need the same types of resources as humans. This could gradually lead to a world uninhabitable by humans. It would no longer be possible to disconnect this system, because humans would have become dependent on it, just as today it is not possible to disconnect the Internet, on which the entire logistics and supply chain depends.
Note that the previous story does not require AI agents: it is a Robust Agent-Agnostic Process (RAAP), meaning that it can occur with or without agentic AIs. Nonetheless, the authors of this chapter think that AI agents could make this story more plausible. In the article "Why Tool AIs Want to Be Agent AIs," the author explains: "AIs limited to pure computation (Tool AIs) supporting humans, will be less intelligent, efficient, and economically valuable than more autonomous reinforcement-learning AIs (Agent AIs) who act on their own and meta-learn because all problems are reinforcement-learning problems. […] All of these actions will result in Agent AIs being more intelligent than Tool AIs, in addition to their greater economic competitiveness. […]"
2.5.8 Epistemic Erosion
Epistemic Deterioration. Enfeeblement or the use of persuasion tools could lead to a massive deterioration of collective epistemic capacity - our ability to reason about and understand the world (Kokotajlo, 2020). The ability to comprehend and respond to problems is a crucial skill that makes our civilization robust to various threats. Without it, we could be incapable of making correct decisions, possibly leading to disastrous outcomes.
Epistemic Security. Arguably, social media has undermined the ability of political communities to work together, making them more polarized and untethered from a foundation of agreed facts. Hostile foreign states have sought to exploit the vulnerability of mass political deliberation in democracies. While not yet possible, the specter of mass manipulation through psychological profiling, as advertised by Cambridge Analytica, hovers on the horizon. A decline in the ability of the world's advanced democracies to deliberate competently would lower the chances that these countries could competently shape the development of advanced AI (Dafoe, 2020).
2.5.9 Value Erosion
Fragility of Complex Systems. The automation and tight coupling of different system components can mean that the failure of one part triggers the collapse of the entire system (Christiano, 2019). One example could be financial markets or automated trading systems, where complex dynamics can emerge and lead to unintended, potentially misaligned outcomes at the systemic level. Another could be flash wars, in which automated systems escalate a conflict faster than humans can intervene.
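As a rough illustration of how tight coupling turns a local failure into a systemic one, here is a toy threshold-cascade model (the component counts, dependency structure, and tolerance values are illustrative assumptions, not taken from the cited source):

```python
# Illustrative sketch: each component fails once the fraction of its failed
# dependencies exceeds its tolerance (its slack / redundancy margin).
import random

def simulate_cascade(n_components=100, n_deps=4, tolerance=0.3, seed=0):
    rng = random.Random(seed)
    # Each component depends on `n_deps` randomly chosen other components.
    deps = {i: rng.sample([j for j in range(n_components) if j != i], n_deps)
            for i in range(n_components)}
    failed = {0}  # a single initial failure
    changed = True
    while changed:
        changed = False
        for i in range(n_components):
            if i in failed:
                continue
            if sum(d in failed for d in deps[i]) / n_deps > tolerance:
                failed.add(i)  # too many of this component's dependencies are down
                changed = True
    return len(failed)

# More slack (higher tolerance) keeps the initial failure local; tighter
# coupling (less slack) lets the same single failure propagate system-wide.
for tol in [0.5, 0.3, 0.2]:
    print(f"tolerance {tol}: {simulate_cascade(tolerance=tol)} of 100 components failed")
```

With generous slack the initial failure stays isolated; shave the margin slightly and the identical shock cascades through nearly the whole system, which is the qualitative worry with tightly coupled automated systems such as trading networks.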
Challenges in Multi-Agent Systems. In environments containing multiple agents, research highlights the risk of collective misalignment, where the pursuit of individual goals by agents leads to adverse effects on the system as a whole. This is exemplified in scenarios like Paul Christiano's "You get what you measure," which warns that an overemphasis on simple metrics such as GDP can fail to account for the broader implications for human values. This could result in a civilization increasingly managed by seemingly beneficial tools that, in reality, erode human-centric values. Another problem is the competitive disadvantage of human values with respect to other values: evolutionary dynamics might favor aggressive behaviors, posing significant risks if AIs begin to outcompete humans, as discussed in "Natural Selection Favors AIs over Humans" by Dan Hendrycks (Hendrycks, 2023).
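The sketch below is a hypothetical toy model of this "you get what you measure" dynamic (the agents, the `measured_output` proxy, and the shared-resource parameters are illustrative assumptions, not Christiano's own model): each agent picks the effort level that maximizes its own measured metric, while an unmeasured shared resource that everyone's output actually depends on is slowly eroded.

```python
# Illustrative sketch: individually optimal proxy optimization that is
# collectively self-undermining.

N_AGENTS = 10
ROUNDS = 20
resource = 1.0      # unmeasured shared good (e.g., trust, stability, attention)
DEPLETION = 0.002   # erosion of the shared resource per unit of effort

def measured_output(effort: float, resource_level: float) -> float:
    # The proxy metric each agent is rewarded on; it always rises with effort,
    # so no agent is ever rewarded for holding back to protect the resource.
    return effort * (0.5 + resource_level)

for step in range(ROUNDS):
    total_output = 0.0
    for _ in range(N_AGENTS):
        # Each agent greedily chooses the effort (0..10) maximizing its own metric.
        best_effort = max(range(11), key=lambda e: measured_output(e, resource))
        total_output += measured_output(best_effort, resource)
        resource = max(0.0, resource - DEPLETION * best_effort)
    if step % 5 == 0:
        print(f"round {step:2d}: measured output {total_output:6.1f}, shared resource {resource:.2f}")
```

Every agent's choice looks locally optimal on the metric it is scored on, yet total measured output drops sharply once the unmeasured resource is exhausted - a miniature version of seemingly beneficial optimization eroding the values it was supposed to serve.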