LGOct 4, 2021

Efficient Identification of Butterfly Sparse Matrix Factorizations

arXiv:2110.01230v99 citations
Originality Incremental advance
AI Analysis

This provides a theoretical foundation for uniquely identifying butterfly matrix factorizations, which is incremental but addresses a specific bottleneck in fast transforms and neural network compression.

The paper tackles the problem of uniquely factorizing matrices with butterfly structure into sparse factors, proving that any N×N butterfly matrix admits an essentially unique factorization into J factors (where N=2^J) and can be recovered via a hierarchical method. This enables efficient computation with O(N²) cost for factorization and O(N log N) for matrix-vector multiplication, applicable to transforms like Hadamard or Fourier and potentially compressing deep neural networks.

Fast transforms correspond to factorizations of the form $\mathbf{Z} = \mathbf{X}^{(1)} \ldots \mathbf{X}^{(J)}$, where each factor $ \mathbf{X}^{(\ell)}$ is sparse and possibly structured. This paper investigates essential uniqueness of such factorizations, i.e., uniqueness up to unavoidable scaling ambiguities. Our main contribution is to prove that any $N \times N$ matrix having the so-called butterfly structure admits an essentially unique factorization into $J$ butterfly factors (where $N = 2^{J}$), and that the factors can be recovered by a hierarchical factorization method, which consists in recursively factorizing the considered matrix into two factors. This hierarchical identifiability property relies on a simple identifiability condition in the two-layer and fixed-support setting. This approach contrasts with existing ones that fit the product of butterfly factors to a given matrix via gradient descent. The proposed method can be applied in particular to retrieve the factorization of the Hadamard or the discrete Fourier transform matrices of size $N=2^J$. Computing such factorizations costs $\mathcal{O}(N^{2})$, which is of the order of dense matrix-vector multiplication, while the obtained factorizations enable fast $\mathcal{O}(N \log N)$ matrix-vector multiplications and have the potential to be applied to compress deep neural networks.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes