LG DIS-NN ML | May 19, 2025

Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks

Francesco D'Amico, Dario Bocchi, Matteo Negri

arXiv:2505.13230v2 | 1 citation | h-index: 2

Originality: Incremental advance

AI Analysis

This work explains scaling laws through implicit bias, which hints at the foundations of interpretability in machine learning. The advance is incremental, but it offers a richer picture of training dynamics.

The paper tackles the problem of understanding scaling laws in deep learning by analyzing the full training dynamics rather than only the end of training. It identifies two novel dynamical scaling laws that govern how performance evolves as a function of norm-based complexity measures, shows that they hold across several architectures and datasets, and supports them analytically with a perceptron model.

Scaling laws in deep learning, empirical power-law relationships linking model performance to resource growth, have emerged as simple yet striking regularities across architectures, datasets, and tasks. These laws are particularly impactful in guiding the design of state-of-the-art models, since they quantify the benefits of increasing data or model size, and they hint at the foundations of interpretability in machine learning. However, most studies focus on asymptotic behavior at the end of training. In this work, we describe a richer picture by analyzing the entire training dynamics: we identify two novel dynamical scaling laws that govern how performance evolves as a function of different norm-based complexity measures. Combined, our new laws recover the well-known scaling for test error at convergence. Our findings are consistent across CNNs, ResNets, and Vision Transformers trained on MNIST, CIFAR-10, and CIFAR-100. Furthermore, we provide analytical support using a single-layer perceptron trained with logistic loss, where we derive the new dynamical scaling laws and explain them through the implicit bias induced by gradient-based training.
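
As a rough illustration of the perceptron setting mentioned in the abstract, the sketch below trains a single-layer perceptron with logistic loss by full-batch gradient descent and records the weight norm alongside the test error. The teacher-student data model, the function name train_perceptron, and all hyperparameters are assumptions chosen for illustration. This is not the authors' code, and it makes no attempt to reproduce their exponents.

```python
import numpy as np


def train_perceptron(n_samples=200, dim=50, steps=2000, lr=0.5, seed=0):
    """Full-batch gradient descent on logistic loss for a single-layer perceptron.

    Assumed setup (hypothetical, not taken from the paper): labels are the sign
    of a random unit-norm teacher vector applied to Gaussian inputs, so the
    training set is linearly separable.
    Returns a list of (step, weight_norm, test_error) tuples.
    """
    rng = np.random.default_rng(seed)
    teacher = rng.standard_normal(dim)
    teacher /= np.linalg.norm(teacher)

    x_train = rng.standard_normal((n_samples, dim))
    y_train = np.sign(x_train @ teacher)
    x_test = rng.standard_normal((5000, dim))
    y_test = np.sign(x_test @ teacher)

    w = np.zeros(dim)
    history = []
    for step in range(1, steps + 1):
        margins = y_train * (x_train @ w)
        # Numerically stable form of 1 / (1 + exp(margin)).
        sigma = 0.5 * (1.0 - np.tanh(0.5 * margins))
        # Gradient of the mean of log(1 + exp(-margin)) with respect to w.
        grad = -(x_train * (y_train * sigma)[:, None]).mean(axis=0)
        w -= lr * grad
        # Record the first step and every 100th step.
        if step == 1 or step % 100 == 0:
            test_error = np.mean(np.sign(x_test @ w) != y_test)
            history.append((step, float(np.linalg.norm(w)), float(test_error)))
    return history


if __name__ == "__main__":
    for step, norm, err in train_perceptron():
        print(f"step={step:5d}  ||w||={norm:7.3f}  test_error={err:.3f}")
```

On separable data the weight norm keeps growing under gradient descent, so the norm can serve as a complexity measure that tracks training progress. Plotting test error against the weight norm on log-log axes is one way to look for power-law behavior in a run like this.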
