Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning
This work provides a foundational theory for neurosymbolic reasoning, addressing how continuous learning can be integrated with discrete algebraic structures. The contribution is incremental: it builds on existing mathematical concepts to explain neural network behavior.
The paper asks how neural networks can discover symbolic structures through gradient-based training. It develops a theoretical framework that models training as a Wasserstein gradient flow, shows that under geometric constraints such as group invariance the network transitions to compositional representations with fewer degrees of freedom, and establishes data scaling laws for symbolic tasks.
We develop a theoretical framework that explains how discrete symbolic structures can emerge naturally from continuous neural network training dynamics. By lifting neural parameters to a measure space and modeling training as a Wasserstein gradient flow, we show that under geometric constraints, such as group invariance, the parameter measure $\mu_t$ undergoes two concurrent phenomena: (1) a decoupling of the gradient flow into independent optimization trajectories over a set of potential functions, and (2) a progressive contraction of its degrees of freedom. These potentials encode algebraic constraints relevant to the task and act as ring homomorphisms under a commutative semi-ring structure on the measure space. As training progresses, the network transitions from high-dimensional exploration to compositional representations that comply with algebraic operations and have fewer degrees of freedom. We further establish data scaling laws for realizing symbolic tasks, linking representational capacity to the group invariance that facilitates symbolic solutions. This framework lays a principled foundation for understanding and designing neurosymbolic systems that integrate continuous learning with discrete algebraic reasoning.
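Lifting parameters to a measure space is commonly simulated with a particle approximation: represent $\mu_t$ by $N$ equally weighted atoms and move each atom along $-\nabla_\theta \frac{\delta R}{\delta \mu}(\theta; \mu_t)$, the gradient of the first variation of the risk. The sketch below illustrates only that mechanism, under assumptions of our own: a two-layer tanh network in mean-field scaling, squared loss, a synthetic single-neuron target, forward-Euler time stepping, and the hypothetical helper names `first_variation_grad` and `wasserstein_flow`. It is not the paper's construction and does not model the group-invariance constraint or the semi-ring structure.

```python
import numpy as np


def first_variation_grad(theta, X, y):
    """Return grad_theta of dR/dmu at every particle, plus the current risk.

    Hypothetical setup: theta has shape (N, d + 1); column 0 is the output
    weight a_i and columns 1: are the input weights w_i. The mean-field
    network is f(x) = (1/N) * sum_i a_i * tanh(w_i . x) and the risk is
    R(mu) = 0.5 * mean_x (f(x) - y)^2.
    """
    a, W = theta[:, 0], theta[:, 1:]
    act = np.tanh(X @ W.T)                 # (n, N) hidden activations
    resid = act @ a / len(a) - y           # (n,) residual f(x) - y
    # dR/dmu(theta) = mean_x[resid(x) * a * tanh(w . x)]; differentiate in a and w.
    grad_a = act.T @ resid / len(y)        # (N,)
    grad_W = (resid[:, None] * (1.0 - act**2) * a).T @ X / len(y)  # (N, d)
    return np.column_stack([grad_a, grad_W]), 0.5 * np.mean(resid**2)


def wasserstein_flow(theta, X, y, step=0.1, n_steps=500):
    """Forward-Euler particle discretization of the Wasserstein gradient flow."""
    losses = []
    for _ in range(n_steps):
        grad, loss = first_variation_grad(theta, X, y)
        losses.append(loss)
        theta = theta - step * grad        # each atom follows -grad of dR/dmu
    return theta, losses


rng = np.random.default_rng(0)
n, d, N = 200, 3, 100
X = rng.standard_normal((n, d))
y = np.tanh(X @ np.array([1.0, -1.0, 0.5]))   # synthetic single-neuron target
theta0 = rng.standard_normal((N, d + 1))      # empirical measure with N atoms
theta_T, losses = wasserstein_flow(theta0, X, y)
print(f"risk: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because the gradient of the empirical risk with respect to a single atom equals $\frac{1}{N}\nabla_\theta \frac{\delta R}{\delta \mu}$, this update is ordinary gradient descent with the learning rate scaled by $N$. That scaling is what makes the particle system a discretization of the Wasserstein gradient flow as $N \to \infty$ and the step size shrinks to zero.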