3.2 Definitions

AI alignment is hard to define precisely, but it generally refers to the process and goal of ensuring that an AI system's behaviors and decision-making processes are in harmony with the intentions, values, and goals of its human operators or designers. The main concern is whether the AI is pursuing the objectives we want it to pursue. AI alignment does not, on its own, encompass systemic risks or misuse.

AI safety, on the other hand, encompasses a broader set of concerns about the ways AI systems might pose dangers and what can be done to prevent or mitigate those risks. Safety is concerned with ensuring that AI systems do not cause harm, whether inadvertently or deliberately, to humans or the environment.

AI ethics is the field that studies and formulates the moral principles guiding the development and use of AI systems to ensure they respect fundamental human values. It addresses issues such as bias, transparency, accountability, privacy, and societal impact, with the goal of developing trustworthy, fair, and beneficial AI through multidisciplinary collaboration and appropriate governance frameworks. AI ethics is largely outside the scope of this chapter.

AI control is about the mechanisms that can be put in place to manage and restrain an AI system if it acts contrary to human interests. Control may involve physical or virtual constraints, monitoring systems, or even a kill switch to shut down the AI if necessary (source).

In essence, alignment seeks to prevent preference divergence by design, while control deals with the consequences of potential divergence after the fact (source). The distinction is important because some approaches to the AI safety problem focus on building aligned AI that inherently does the right thing, while others focus on finding ways to control AI that might have other goals. For example, a research agenda from the research organization Redwood Research argues that even if alignment is necessary for superintelligent AIs, control through some form of monitoring may be a workable strategy for AIs at roughly human level (source).

Ideally, an AGI would be aligned and controllable, meaning it would have the right goals and be subject to human oversight and intervention if something goes wrong.

In the remainder of this chapter, we will reuse the classification of risks introduced in the previous chapter: misuse, alignment, and systemic risks.