LG | AI | QUANT-PH | Mar 10, 2022

projUNN: efficient method for training deep networks with unitary matrices

arXiv:2203.05483v3 | 39 citations | h-index: 137
Originality: Incremental advance
AI Analysis

The paper addresses the training bottleneck faced by researchers and practitioners who use unitary matrices in deep learning. The advance is incremental because it builds on existing unitary network approaches.

The paper tackles the problem of expensive training when using unitary matrices in deep networks by proposing an efficient method based on rank-k updates, achieving a training runtime of O(kN^2) and matching or exceeding prior benchmarks in recurrent neural networks.

In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-$k$ updates -- or their rank-$k$ approximation -- that maintains performance at a nearly optimal training runtime. We introduce two variants of this method, named Direct (projUNN-D) and Tangent (projUNN-T) projected Unitary Neural Networks, that can parameterize full $N$-dimensional unitary or orthogonal matrices with a training runtime scaling as $O(kN^2)$. Our method either projects low-rank gradients onto the closest unitary matrix (projUNN-T) or transports unitary matrices in the direction of the low-rank gradient (projUNN-D). Even in the fastest setting ($k=1$), projUNN is able to train a model's unitary parameters to reach comparable performances against baseline implementations. In recurrent neural network settings, projUNN closely matches or exceeds benchmarked results from prior unitary neural networks. Finally, we preliminarily explore projUNN in training orthogonal convolutional neural networks, which are currently unable to outperform state of the art models but can potentially enhance stability and robustness at large depth.
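The two update styles named in the abstract, projecting onto the closest unitary matrix and transporting a unitary matrix along the gradient direction, can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the authors' implementation: the function names are hypothetical, the gradient convention (a Euclidean gradient `G` of the loss with respect to `U`) is assumed, and the dense SVD and eigendecomposition used here cost O(N^3), so the sketch does not reproduce the O(kN^2) runtime, which depends on exploiting the low-rank structure of the update.

```python
import numpy as np

def low_rank_approx(G, k):
    """Best rank-k approximation of G (truncated SVD)."""
    W, s, Vh = np.linalg.svd(G, full_matrices=False)
    return (W[:, :k] * s[:k]) @ Vh[:k, :]

def closest_unitary(A):
    """Unitary polar factor of A: the closest unitary in Frobenius norm."""
    W, _, Vh = np.linalg.svd(A)
    return W @ Vh

def project_step(U, G, lr, k):
    """Take a step along the rank-k gradient, then project back onto the unitaries."""
    return closest_unitary(U - lr * low_rank_approx(G, k))

def transport_step(U, G, lr, k):
    """Move U along the manifold via the matrix exponential of a skew-Hermitian direction."""
    Gk = low_rank_approx(G, k)
    # Skew-Hermitian direction built from the rank-k gradient (assumed convention).
    A = Gk @ U.conj().T - U @ Gk.conj().T
    # H = iA is Hermitian, so exp(-lr * A) = V diag(exp(i * lr * w)) V^H.
    w, V = np.linalg.eigh(1j * A)
    expm = (V * np.exp(1j * lr * w)) @ V.conj().T
    return expm @ U

# Hypothetical demo data: a random unitary U and a random gradient G.
rng = np.random.default_rng(0)
N, k, lr = 8, 1, 0.1
U, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

U_proj = project_step(U, G, lr, k)
U_trans = transport_step(U, G, lr, k)
```

Both steps return a matrix that remains unitary up to floating-point error, which is the property a constrained update is meant to preserve. Setting `k = 1` corresponds to the fastest setting mentioned in the abstract.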

Code Implementations: 1 repo
