A Lightweight and Gradient-Stable Neural Layer
This work addresses resource constraints and training stability in neural-network deployment, but it appears incremental because it builds on existing layer designs with specific modifications.
The authors address resource inefficiency and gradient instability in neural networks by proposing a lightweight neural-layer architecture called the Han-layer. It reduces the parameter count from O(d^2) to O(d) and ensures gradient stability through orthogonal Jacobians, while maintaining or improving generalization performance in experiments.
To enhance the resource efficiency and deployability of neural networks, we propose a neural-layer architecture based on Householder weighting and absolute-value activation, called the Householder-absolute neural layer, or simply Han-layer. Compared to a fully connected layer with $d$ input neurons and $d$ outputs, a Han-layer reduces the number of parameters and the corresponding computational complexity from $O(d^2)$ to $O(d)$. The Han-layer structure guarantees that the Jacobian of the layer function is always orthogonal, thus ensuring gradient stability (i.e., freedom from vanishing or exploding gradients) for any Han-layer sub-network. Extensive numerical experiments show that one can strategically use Han-layers to replace fully connected (FC) layers, reducing the number of model parameters while maintaining or even improving generalization performance. We also showcase the capabilities of the Han-layer architecture on a few small stylized models and discuss its current limitations.
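To make the $O(d)$ cost and the orthogonal-Jacobian claim concrete, the following is a minimal, dependency-free sketch. It assumes the layer computes $y = |Hx + b|$ with the Householder matrix $H = I - 2uu^\top/(u^\top u)$. The exact form of the layer, the presence of the bias $b$, and the names `han_layer`, `han_jacobian`, and `max_orthogonality_error` are illustrative assumptions, not definitions taken from the text above.

```python
import math
import random


def han_layer(x, u, b):
    """Assumed Han-layer forward pass: y = |H x + b|, H = I - 2 u u^T / (u^T u).

    H is never formed explicitly: H x = x - (2 u^T x / u^T u) u,
    so the cost is O(d) in time and the parameters are only u and b.
    """
    scale = 2.0 * sum(ui * xi for ui, xi in zip(u, x)) / sum(ui * ui for ui in u)
    return [abs(xi - scale * ui + bi) for xi, ui, bi in zip(x, u, b)]


def han_jacobian(x, u, b):
    """Dense d x d Jacobian J = diag(sign(H x + b)) H, built only for checking.

    The Jacobian is undefined where a pre-activation is exactly zero;
    this sketch arbitrarily uses sign +1 there.
    """
    d = len(x)
    uu = sum(ui * ui for ui in u)
    scale = 2.0 * sum(ui * xi for ui, xi in zip(u, x)) / uu
    z = [xi - scale * ui + bi for xi, ui, bi in zip(x, u, b)]
    sign = [1.0 if zi >= 0.0 else -1.0 for zi in z]
    return [
        [sign[i] * ((1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / uu)
         for j in range(d)]
        for i in range(d)
    ]


def max_orthogonality_error(J):
    """Largest absolute entry of J^T J - I (zero for an orthogonal J)."""
    d = len(J)
    return max(
        abs(sum(J[k][i] * J[k][j] for k in range(d)) - (1.0 if i == j else 0.0))
        for i in range(d)
        for j in range(d)
    )


random.seed(0)
d = 6
x = [random.gauss(0.0, 1.0) for _ in range(d)]
u = [random.gauss(0.0, 1.0) for _ in range(d)]  # Householder vector (d parameters)
b = [random.gauss(0.0, 1.0) for _ in range(d)]  # bias (d parameters, assumed)

y = han_layer(x, u, b)
err = max_orthogonality_error(han_jacobian(x, u, b))
print("output:", y)
print("max |J^T J - I| entry:", err)
```

Under these assumptions the layer has $2d$ parameters ($u$ and $b$), compared with $d^2 + d$ for a fully connected layer of the same width. The Jacobian is the product of a diagonal $\pm 1$ matrix and a Householder reflection, both orthogonal, so the reported error should be at the level of floating-point round-off.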