Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach
This work addresses the challenge of automating the search for efficient matrix multiplication algorithms, which is incremental as it builds on existing tensor decomposition methods using neural networks.
The paper tackled the problem of discovering fast matrix multiplication algorithms by designing a neural network, StrassenNet, which successfully recovered Strassen's optimal algorithm for 2x2 matrices and suggested that 23 might be the smallest effective rank for 3x3 matrices, with models achieving significantly lower validation error at rank 23 compared to lower ranks.
Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix--multiplication tensor. We design a neural architecture, \textsc{StrassenNet}, which reproduces the Strassen algorithm for $2\times 2$ multiplication. Across many independent runs the network always converges to a rank-$7$ tensor, thus numerically recovering Strassen's optimal algorithm. We then train the same architecture on $3\times 3$ multiplication with rank $r\in\{19,\dots,23\}$. Our experiments reveal a clear numerical threshold: models with $r=23$ attain significantly lower validation error than those with $r\le 22$, suggesting that $r=23$ could actually be the smallest effective rank of the matrix multiplication tensor $3\times 3$. We also sketch an extension of the method to border-rank decompositions via an $\varepsilon$--parametrisation and report preliminary results consistent with the known bounds for the border rank of the $3\times 3$ matrix--multiplication tensor.