Signal-to-Noise Ratio and Sample Size Govern Representational Alignment in Neural Networks

arXiv:2605.2697361.4

AI Analysis

This work offers practitioners insight into how data quality and quantity shape neural network representations, and shows that representational alignment can be decoupled from generalization.

The paper investigates how the signal-to-noise ratio (SNR) and the training sample size affect representational alignment in neural networks. It finds that alignment varies monotonically with SNR but non-monotonically with sample size, reaching its minimum near the interpolation threshold, and that stronger alignment does not necessarily imply better generalization.

Neural networks are known to develop latent representations that are aligned, namely structurally similar across networks trained with different architectures, training protocols, or training datasets. We study this phenomenon in a controlled setting, where we train an ensemble of networks on regression and classification tasks using training sets perturbed by independent realizations of a noise process. We show that the signal-to-noise ratio (SNR) and the training sample size influence the alignment in qualitatively similar ways in networks trained on real-world datasets and in an extremely simple linear network with a single hidden layer, for which the alignment can be estimated analytically. Across linear and nonlinear networks, regression and classification tasks, and both synthetic and real-world data, we consistently observe that alignment varies monotonically with SNR but non-monotonically with training sample size. In particular, the alignment is minimized near the interpolation threshold, and stronger alignment does not necessarily correspond to lower generalization error. These findings reveal a non-trivial dependence of alignment on data quality and quantity that is decoupled from generalization performance.
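The abstract does not specify the alignment metric, the noise model, or the SNR definition, so the following is only a minimal sketch of the kind of controlled experiment it describes, under stated assumptions. It assumes additive Gaussian label noise on a linear regression target, defines SNR hypothetically as signal variance divided by noise variance, trains two single-hidden-layer linear networks on the same inputs with independent noise realizations, and compares their hidden representations with linear centered kernel alignment (CKA), a swapped-in metric that is not necessarily the one used in the paper. All function names (`linear_cka`, `make_noisy_targets`, `train_linear_net`, `alignment_across_noise`) and hyperparameters are hypothetical.

```python
import numpy as np


def linear_cka(a: np.ndarray, b: np.ndarray) -> float:
    """Linear CKA between two representations of shape (n_samples, n_features).

    Assumed stand-in alignment metric: ||A^T B||_F^2 / (||A^T A||_F * ||B^T B||_F)
    on column-centered A and B. It lies in [0, 1] and is invariant to
    orthogonal transforms and isotropic scaling of either representation.
    """
    a = a - a.mean(axis=0, keepdims=True)
    b = b - b.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(a.T @ b, "fro") ** 2
    norm_a = np.linalg.norm(a.T @ a, "fro")
    norm_b = np.linalg.norm(b.T @ b, "fro")
    return float(cross / (norm_a * norm_b))


def make_noisy_targets(x, w_star, snr, rng):
    """Targets y = x @ w_star + noise (hypothetical noise model).

    The noise standard deviation is chosen so that
    var(signal) / var(noise) equals the requested snr.
    """
    signal = x @ w_star
    noise_std = np.sqrt(signal.var() / snr)
    return signal + noise_std * rng.standard_normal(signal.shape)


def train_linear_net(x, y, hidden, rng, lr=0.01, steps=2000):
    """Train f(x) = x @ w1 @ w2 by full-batch gradient descent on (1/2n)||f(x) - y||^2."""
    n, d = x.shape
    w1 = 0.1 * rng.standard_normal((d, hidden))  # input -> hidden weights
    w2 = 0.1 * rng.standard_normal((hidden, 1))  # hidden -> output weights
    for _ in range(steps):
        h = x @ w1                       # hidden representation
        err = h @ w2 - y                 # residuals, shape (n, 1)
        grad_w2 = h.T @ err / n
        grad_w1 = x.T @ (err @ w2.T) / n
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2


def alignment_across_noise(n_train, snr, d=20, hidden=10, n_probe=200, seed=0):
    """CKA between hidden layers of two nets trained on independently noised targets."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal((d, 1)) / np.sqrt(d)  # ground-truth linear teacher
    x_train = rng.standard_normal((n_train, d))        # shared training inputs
    x_probe = rng.standard_normal((n_probe, d))        # held-out inputs for comparison
    reps = []
    for _ in range(2):
        # Same inputs, independent noise realization (and independent init) per network.
        y = make_noisy_targets(x_train, w_star, snr, rng)
        w1, _ = train_linear_net(x_train, y, hidden, rng)
        reps.append(x_probe @ w1)
    return linear_cka(reps[0], reps[1])
```

A sweep such as `for snr in (0.1, 1.0, 10.0): print(snr, alignment_across_noise(n_train=50, snr=snr))`, or the same loop over `n_train` around the input dimension `d`, shows how one might probe the SNR and sample-size dependence the abstract describes; this toy setup is not guaranteed to reproduce the paper's results. CKA is used here only because its invariance to rotations of the hidden layer suits linear networks, whose hidden units are identifiable only up to an invertible transform.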
