Understanding Simplicity Bias towards Compositional Mappings via Learning Dynamics
This work addresses the challenge of compositional generalization in machine learning and provides theoretical insights into learning dynamics. Its contribution is incremental, however, because it builds on existing concepts of simplicity and bias.
The paper studies when neural networks learn compositional mappings. It shows that these mappings are the simplest bijections in terms of coding length, which helps explain why models with such mappings generalize well, and that simplicity bias is usually an intrinsic property of neural network training via gradient descent.
Obtaining compositional mappings is important for a model to generalize well compositionally. To better understand when and how to encourage a model to learn such mappings, we study their uniqueness from several perspectives. Specifically, we first show that compositional mappings are the simplest bijections through the lens of coding length (i.e., an upper bound on their Kolmogorov complexity). This property explains why models with such mappings can generalize well. We further show that simplicity bias is usually an intrinsic property of neural network training via gradient descent. That partially explains why some models spontaneously generalize well when they are trained appropriately.
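
To make the coding-length claim concrete, the following is a minimal sketch under stated assumptions, not the paper's own construction. It assumes objects with `n_attrs` attributes, each taking `n_values` values, mapped bijectively to messages of the same shape, and it uses a simple counting-based description length as a stand-in for the coding length. All function names are hypothetical. A mapping is treated as compositional when each message position depends on exactly one attribute.

```python
from itertools import permutations, product
from math import factorial, log2


def is_compositional(mapping, n_attrs):
    """Return True if every message position depends on exactly one attribute,
    with distinct positions reading distinct attributes."""
    for attr_for_pos in permutations(range(n_attrs)):
        consistent = True
        for pos, attr in enumerate(attr_for_pos):
            table = {}  # attribute value -> symbol observed at this position
            for obj, msg in mapping.items():
                if table.setdefault(obj[attr], msg[pos]) != msg[pos]:
                    consistent = False
                    break
            if not consistent:
                break
        if consistent:
            return True
    return False


def count_compositional(n_attrs, n_values):
    """Brute-force count of compositional bijections (small sizes only)."""
    objects = list(product(range(n_values), repeat=n_attrs))
    return sum(
        is_compositional(dict(zip(objects, messages)), n_attrs)
        for messages in permutations(objects)
    )


def compositional_bits(n_attrs, n_values):
    # Assumed proxy: bits to choose the attribute-to-position assignment
    # plus one value-permutation table per attribute.
    return log2(factorial(n_attrs)) + n_attrs * log2(factorial(n_values))


def holistic_bits(n_attrs, n_values):
    # Assumed proxy: bits to choose an arbitrary bijection over all objects.
    return log2(factorial(n_values ** n_attrs))


if __name__ == "__main__":
    print(count_compositional(2, 2))
    print(compositional_bits(2, 2), holistic_bits(2, 2))
```

With two binary attributes, 8 of the 24 possible bijections are compositional, so indexing one of them takes 3 bits under this proxy, fewer than the bits needed to index an arbitrary bijection. The gap widens quickly as `n_attrs` and `n_values` grow, which is the intuition behind calling compositional mappings the simplest bijections.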