
3.2 Definitions

AI alignment is hard to define but generally refers to the process and goal of ensuring that an AI system's behaviors and decision-making processes are in harmony with the intentions, values, and goals of its human operators or designers. The main concern is whether the AI is pursuing the objectives we want it to pursue. Alignment, in this narrow sense, does not encompass systemic risks or misuse.

AI safety, on the other hand, encompasses a broader set of concerns about how AI systems might pose risks or be dangerous and what can be done to prevent or mitigate those risks. Safety is concerned with ensuring that AI systems do not inadvertently or deliberately cause harm or danger to humans or the environment.

AI ethics is the field that examines the moral principles and societal implications of AI systems, focusing on how these technologies should align with and respect fundamental human values. It addresses the ethical considerations of AI rights, the potential societal upheavals resulting from AI advancements, and the moral frameworks necessary to navigate these changes. The core of AI ethics lies in ensuring that AI developments are aligned with human dignity, fairness, and societal well-being, through a deep understanding of their broader societal impact. This chapter does not focus primarily on AI ethics.

AI control is about the mechanisms that can be put in place to manage and restrain an AI system if it acts contrary to human interests. Control mechanisms might include physical or virtual constraints, monitoring systems, or even a kill switch to shut down the AI if necessary (source).
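As a purely illustrative sketch, and not a real control protocol, the monitoring-plus-kill-switch idea can be expressed as a wrapper around an AI's action loop: every proposed action passes through a monitor, and a disallowed action halts the system instead of being executed. All names here (`run_with_oversight`, `is_safe`, `KillSwitchTriggered`) are hypothetical.

```python
class KillSwitchTriggered(Exception):
    """Raised when the monitor decides the system must be shut down."""


def run_with_oversight(propose_action, is_safe, max_steps=100):
    """Run an AI's action loop under a simple monitoring regime (illustrative only).

    propose_action: callable returning the AI's next proposed action, or None when done.
    is_safe: callable judging whether a proposed action is acceptable.
    """
    executed = []
    for _ in range(max_steps):
        action = propose_action()
        if action is None:  # the system has nothing more to do
            break
        if not is_safe(action):
            # The "kill switch": halt instead of executing a disallowed action.
            raise KillSwitchTriggered(f"blocked unsafe action: {action!r}")
        executed.append(action)  # in a real system, the action would be executed here
    return executed
```

For example, wrapping a queue of actions `["read", "summarize"]` with a monitor that permits everything simply executes them in order, while a monitor that rejects `"delete_files"` raises `KillSwitchTriggered` before that action runs. The sketch only illustrates the conceptual distinction from the text: control intervenes on behavior after the fact, rather than shaping the system's goals by design.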

In essence, alignment seeks to prevent preference divergence by design, while control deals with the consequences of potential divergence after the fact (source). The distinction is important because some approaches to the AI safety problem focus on building aligned AI that inherently does the right thing, while others focus on finding ways to control AI that might have other goals. For example, a research agenda from the research organization Redwood Research argues that even if alignment is necessary for superintelligence-level AIs, control through some form of monitoring may be a workable strategy for AIs whose capabilities remain roughly in the human range (source).

Ideally, an AGI would be aligned and controllable, meaning it would have the right goals and be subject to human oversight and intervention if something goes wrong.

In the remainder of this chapter, we will reuse the classification of risks we used in the last chapter: misuse, alignment, and systemic risks.