What role does data play in AI risks? Data fundamentally shapes what AI systems can do and how they behave. For frontier foundation models, training data influences both capabilities and alignment: what systems can do and how they do it. Low-quality or harmful training data could lead to misaligned or dangerous models ("garbage in, garbage out"), while carefully curated datasets can support safer and more reliable behavior (Longpre et al., 2024; Marcucci et al., 2023).
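To make "curation" concrete, the sketch below shows one hypothetical filtering step applied to a toy corpus before training. The quality_score and toxicity_score functions are illustrative stand-ins, not real classifiers; production pipelines for frontier models involve far more (deduplication, provenance checks, domain mixing, human review).

```python
def quality_score(text: str) -> float:
    """Hypothetical heuristic: longer, less repetitive documents score higher."""
    words = text.split()
    if not words:
        return 0.0
    unique_ratio = len(set(words)) / len(words)
    return min(len(words) / 20, 1.0) * unique_ratio

def toxicity_score(text: str) -> float:
    """Hypothetical stand-in for a learned toxicity classifier."""
    blocklist = {"slur1", "slur2"}  # placeholder terms
    return 1.0 if set(text.lower().split()) & blocklist else 0.0

def curate(corpus: list[str],
           min_quality: float = 0.3,
           max_toxicity: float = 0.5) -> list[str]:
    """Keep only documents passing both quality and toxicity thresholds."""
    return [doc for doc in corpus
            if quality_score(doc) >= min_quality
            and toxicity_score(doc) <= max_toxicity]

# Example: filter a toy corpus before it ever reaches training.
corpus = [
    "a well written passage about chemistry with varied vocabulary and clear structure",
    "spam spam spam spam spam",
    "a document containing slur1 and other harmful content",
]
print(len(curate(corpus)))  # -> 1: only the first document survives curation
```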
How well does data meet our governance target criteria? Data as a governance target presents a mixed picture when evaluated against our key criteria. Let's look at each:
What are the key data governance concerns? Several aspects of data require careful governance to promote safe AI development:
How does data governance fit into overall AI governance? Even with strong governance frameworks, alternative data sources or synthetic data generation could potentially circumvent restrictions. Additionally, many concerning capabilities might emerge from seemingly innocuous training data through unexpected interactions or emergent behaviors. While data governance remains important and worthy of deeper exploration, other governance targets may offer more direct leverage over frontier AI development in the near term. This is why in the main text we focused primarily on compute governance, whose physical and concentrated nature provides more concrete control points.