Characterising the Inductive Biases of Neural Networks on Boolean Data
This work provides a foundational, analytically tractable framework for characterising inductive biases in neural networks. Its contribution is incremental, but it offers detailed insight into generalisation mechanisms.
The paper tackles the problem of understanding neural networks' inductive biases by linking them to training dynamics and generalisation, using Boolean functions as a tractable case study. Under a Monte Carlo learning algorithm, it demonstrates predictable training dynamics and the emergence of interpretable features.
Deep neural networks are renowned for their ability to generalise well across diverse tasks, even when heavily overparameterised. Existing work offers only partial explanations (for example, the NTK-based task-model alignment explanation neglects feature learning). Here, we provide an end-to-end, analytically tractable case study that links a network's inductive prior, its training dynamics including feature learning, and its eventual generalisation. Specifically, we exploit the one-to-one correspondence between depth-2 discrete fully connected networks and disjunctive normal form (DNF) formulas by training on Boolean functions. Under a Monte Carlo learning algorithm, our model exhibits predictable training dynamics and the emergence of interpretable features. This framework allows us to trace, in detail, how inductive bias and feature formation drive generalisation.
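
To make the network-to-DNF correspondence concrete, the following is a minimal sketch and not the paper's implementation. It assumes one plausible encoding that the text above does not specify: each hidden unit holds weights in {-1, 0, +1} and acts as a conjunction of literals, and the output unit is a disjunction over hidden units. The training loop is a simple accept-if-not-worse random search, used here only as a stand-in for "a Monte Carlo learning algorithm". The names dnf_forward, train_error and monte_carlo_fit are hypothetical.

```python
import random


def dnf_forward(weights, x):
    """Evaluate a depth-2 discrete network read as a DNF formula.

    weights: list of hidden units; each unit is a list with one entry per
             input, taken from {-1, 0, +1} (assumed encoding):
             +1 -> literal x_i, -1 -> literal NOT x_i, 0 -> x_i not used.
    x:       tuple of 0/1 input bits.
    A unit whose weights are all 0 is the empty conjunction (always true).
    """
    def conjunction(unit):
        return all(
            w == 0 or (w == 1 and xi == 1) or (w == -1 and xi == 0)
            for w, xi in zip(unit, x)
        )

    # Output layer: OR over the hidden conjunctions.
    return int(any(conjunction(unit) for unit in weights))


def train_error(weights, data):
    """Number of training examples the current formula misclassifies."""
    return sum(dnf_forward(weights, x) != y for x, y in data)


def monte_carlo_fit(data, n_inputs, n_hidden, steps, seed=0):
    """Assumed Monte Carlo scheme: propose a single-weight change and
    keep it if the training error does not increase."""
    rng = random.Random(seed)
    weights = [
        [rng.choice((-1, 0, 1)) for _ in range(n_inputs)]
        for _ in range(n_hidden)
    ]
    err = train_error(weights, data)
    for _ in range(steps):
        if err == 0:
            break
        j = rng.randrange(n_hidden)
        i = rng.randrange(n_inputs)
        old = weights[j][i]
        weights[j][i] = rng.choice([v for v in (-1, 0, 1) if v != old])
        new_err = train_error(weights, data)
        if new_err <= err:
            err = new_err          # accept the proposal
        else:
            weights[j][i] = old    # reject and restore
    return weights, err


# XOR written by hand as (x0 AND NOT x1) OR (NOT x0 AND x1).
xor_weights = [[1, -1], [-1, 1]]
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([dnf_forward(xor_weights, x) for x in inputs])  # [0, 1, 1, 0]

# Fit the same truth table from a random start.
xor_data = [(x, dnf_forward(xor_weights, x)) for x in inputs]
learned, final_err = monte_carlo_fit(xor_data, n_inputs=2, n_hidden=2, steps=500)
print(learned, final_err)
```

Under this assumed encoding, every learned hidden unit can be read directly as a conjunction of literals, which is the sense in which features in such a model are interpretable. The acceptance rule and proposal distribution used in the paper may differ from this sketch.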