Parameter Symmetry Potentially Unifies Deep Learning Theory
This research aims to explain the mechanisms behind neural networks and language models and is addressed to the machine learning community. It is incremental in that it builds on existing theories.
This research tackles the fragmentation of deep learning theory by proposing parameter symmetry as a unifying mechanism, which could lead to a unified understanding of hierarchical learning behavior in AI models. The result is a position paper arguing that parameter symmetries play a crucial role in unifying existing theories.
The learning dynamics of modern large AI systems are hierarchical, often characterized by abrupt, qualitative shifts akin to the phase transitions observed in physical systems. While these phenomena hold promise for uncovering the mechanisms behind neural networks and language models, existing theories remain fragmented, each addressing only specific cases. In this position paper, we advocate for parameter symmetries as a research direction with a crucial role in unifying these fragmented theories. This position rests on a central hypothesis: parameter symmetry breaking and restoration are the unifying mechanisms underlying the hierarchical learning behavior of AI models. We synthesize prior observations and theories to argue that this research direction could lead to a unified understanding of three distinct hierarchies in neural networks: learning dynamics, model complexity, and representation formation. By connecting these hierarchies, our position paper elevates symmetry, a cornerstone of theoretical physics, to a potential fundamental principle of modern AI.
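For readers unfamiliar with the term, a parameter symmetry is a transformation of a model's parameters that leaves the function it computes, and therefore its loss, unchanged. The following is a minimal illustrative sketch in plain Python, not taken from the paper: the two-layer tanh network, the helper names `forward`, `permute_hidden`, and `flip_sign`, and the random weights are all assumptions chosen for illustration. It checks two well-known symmetries of such a network: relabeling the hidden units, and negating both the incoming and outgoing weights of a single unit.

```python
import math
import random


def forward(W1, w2, x):
    """Two-layer network f(x) = sum_j w2[j] * tanh(W1[j] . x)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(a * h for a, h in zip(w2, hidden))


def permute_hidden(W1, w2, perm):
    """Relabel hidden units: rows of W1 and entries of w2 move together."""
    return [list(W1[p]) for p in perm], [w2[p] for p in perm]


def flip_sign(W1, w2, j):
    """Negate unit j's incoming and outgoing weights.

    tanh is an odd function, so the two sign changes cancel in the output.
    """
    W1f = [[-w for w in row] if i == j else list(row) for i, row in enumerate(W1)]
    w2f = [-a if i == j else a for i, a in enumerate(w2)]
    return W1f, w2f


# Hypothetical random parameters and input: 3 inputs, 4 hidden units.
rng = random.Random(0)
W1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]
w2 = [rng.gauss(0, 1) for _ in range(4)]
x = [rng.gauss(0, 1) for _ in range(3)]

base = forward(W1, w2, x)
permuted = forward(*permute_hidden(W1, w2, [2, 0, 3, 1]), x)
flipped = forward(*flip_sign(W1, w2, 1), x)

# The parameters differ, but the outputs agree up to floating-point rounding.
print(math.isclose(base, permuted, rel_tol=1e-9, abs_tol=1e-12))
print(math.isclose(base, flipped, rel_tol=1e-9, abs_tol=1e-12))
```

Both checks hold for any input `x`, because each transformation only reorders or cancels terms inside the same sum. Distinct points in parameter space can therefore represent the same function, which is the kind of redundancy the hypothesis above refers to.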