LG Feb 8, 2025

You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation

Simon Pepin Lehalleur, Jesse Hoogland, Matthew Farrugia-Roberts, Susan Wei, Alexander Gietelink Oldenziel, George Wang, Liam Carroll, Daniel Murfet

arXiv:2502.05475v1 22.613 citations h-index: 8

Originality: Incremental advance

AI Analysis

This position paper addresses how to obtain safety assurances for widely deployed, generally intelligent systems.

The authors argue that understanding how data shapes the internal structure and generalisation of AI models is central to AI alignment, because two models with equivalent performance on the training set can compute their outputs in different ways and therefore generalise differently. They call for statistical foundations that relate structure in the data distribution to structure in trained models.

In this position paper, we argue that understanding the relation between structure in the data distribution and structure in trained models is central to AI alignment. First, we discuss how two neural networks can have equivalent performance on the training set but compute their outputs in essentially different ways and thus generalise differently. For this reason, standard testing and evaluation are insufficient for obtaining assurances of safety for widely deployed generally intelligent systems. We argue that to progress beyond evaluation to a robust mathematical science of AI alignment, we need to develop statistical foundations for an understanding of the relation between structure in the data distribution, internal structure in models, and how these structures underlie generalisation.
