Signal-to-Noise Ratio and Sample Size Govern Representational Alignment in Neural Networks
This work shows practitioners how data quality and quantity shape neural network representations, and that representational alignment is decoupled from generalization.
The paper investigates how signal-to-noise ratio (SNR) and training sample size affect representational alignment in neural networks. It finds that alignment varies monotonically with SNR but non-monotonically with sample size, reaching a minimum near the interpolation threshold, and that stronger alignment does not necessarily improve generalization.
Neural networks are known to develop latent representations that are aligned, that is, structurally similar across networks trained with different architectures, training protocols, or training datasets. We study this phenomenon in a controlled setting, where we train an ensemble of networks on regression and classification tasks using training sets perturbed by independent realizations of a noise process. We show that the signal-to-noise ratio (SNR) and the training sample size influence the alignment in qualitatively similar ways in networks trained on real-world datasets and in an extremely simple linear network with a single hidden layer, for which the alignment can be estimated analytically. Across linear and nonlinear networks, regression and classification tasks, and both synthetic and real-world data, we consistently observe that alignment varies monotonically with SNR but non-monotonically with training sample size. In particular, the alignment is minimized near the interpolation threshold, and stronger alignment does not necessarily correspond to a lower generalization error. These findings reveal a non-trivial dependence of alignment on data quality and quantity, decoupled from generalization performance.
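The controlled setting described above can be illustrated with a minimal sketch. It is not the authors' implementation, and every specific in it is an assumption: the abstract does not name the alignment measure, so linear CKA is used as a stand-in; the noise is assumed to be additive Gaussian noise on the labels; each ensemble member is a minimum-norm least-squares fit rather than a trained network; and the names `linear_cka`, `fit_member`, and `alignment` are hypothetical.

```python
import numpy as np


def linear_cka(A, B):
    """Linear CKA between two (n_samples x n_features) matrices.

    Assumption: CKA stands in for the alignment measure, which the
    abstract does not specify.
    """
    A = A - A.mean(axis=0, keepdims=True)
    B = B - B.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(A.T @ B, "fro") ** 2
    return cross / (np.linalg.norm(A.T @ A, "fro") * np.linalg.norm(B.T @ B, "fro"))


def fit_member(X, Y_clean, snr, rng):
    """Fit one ensemble member on an independently perturbed copy of the labels.

    Assumption: the noise process is additive Gaussian label noise with
    var(signal) / var(noise) = snr.
    """
    noise_std = np.sqrt(Y_clean.var() / snr)
    Y_noisy = Y_clean + rng.normal(0.0, noise_std, size=Y_clean.shape)
    # Minimum-norm least-squares solution; it interpolates when n_train <= d.
    return np.linalg.pinv(X) @ Y_noisy


def alignment(n_train, snr, d=30, k=5, n_probe=200, seed=0):
    """Alignment of two members that share inputs but see independent noise."""
    rng = np.random.default_rng(seed)
    W_star = rng.normal(size=(d, k)) / np.sqrt(d)  # hypothetical linear teacher
    X = rng.normal(size=(n_train, d))
    X_probe = rng.normal(size=(n_probe, d))
    Y_clean = X @ W_star
    W1 = fit_member(X, Y_clean, snr, rng)
    W2 = fit_member(X, Y_clean, snr, rng)
    # Compare the two members' outputs on a shared probe set.
    return linear_cka(X_probe @ W1, X_probe @ W2)


# Sweep sample size across the interpolation threshold (n_train = d = 30 here)
# at a low and a high SNR.
for n_train in (10, 30, 90):
    for snr in (0.1, 10.0):
        print(f"n_train={n_train:3d}  SNR={snr:5.1f}  CKA={alignment(n_train, snr):.3f}")
```

Sweeping `n_train` and `snr` over finer grids, and averaging over several seeds, is one way to probe the trends the abstract reports; a toy model like this is not guaranteed to reproduce them.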