U(1) Symmetry-breaking Observed in Generic CNN Bottleneck Layers
This work addresses a foundational problem in understanding CNN mechanisms for researchers in machine learning and theoretical physics, though it appears incremental in its application to existing models.
The paper tackles the problem of linking CNN information propagation to physical symmetries, reporting that modeling bottleneck layers with a U(1) symmetry-breaking bias improves classification accuracy across tasks, including ImageNet, without requiring training or tuning in initial tests.
We report on a novel model linking deep convolutional neural networks (CNN) to biological vision and fundamental particle physics. Information propagation in a CNN is modeled via an analogy to an optical system, where information is concentrated near a bottleneck where the 2D spatial resolution collapses about a focal point $1\times 1=1$. A 3D space $(x,y,t)$ is defined by $(x,y)$ coordinates in the image plane and CNN layer $t$, where a principal ray $(0,0,t)$ runs in the direction of information propagation through both the optical axis and the image center pixel located at $(x,y)=(0,0)$, about which the sharpest possible spatial focus is limited to a circle of confusion in the image plane. Our novel insight is to model the principal optical ray $(0,0,t)$ as geometrically equivalent to the medial vector in the positive orthant $I(x,y) \in R^{N+}$ of a $N$-channel activation space, e.g. along the greyscale (or luminance) vector $(t,t,t)$ in $RGB$ colour space. Information is thus concentrated into an energy potential $E(x,y,t)=\|I(x,y,t)\|^2$, which, particularly for bottleneck layers $t$ of generic CNNs, is highly concentrated and symmetric about the spatial origin $(0,0,t)$ and exhibits the well-known "Sombrero" potential of the boson particle. This symmetry is broken in classification, where bottleneck layers of generic pre-trained CNN models exhibit a consistent class-specific bias towards an angle $θ\in U(1)$ defined simultaneously in the image plane and in activation feature space. Initial observations validate our hypothesis from generic pre-trained CNN activation maps and a bare-bones memory-based classification scheme, with no training or tuning. Training from scratch using combined one-hot $+ U(1)$ loss improves classification for all tasks tested including ImageNet.