Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods
The paper provides a framework for simplifying the analysis of convolutions and improving efficiency in deep learning, though it appears incremental because it builds on existing tensor network concepts.
The paper tackles the complexity of analyzing convolutions by representing them as tensor networks evaluated with einsum, which simplifies the derivation of autodiff operations and curvature approximations. The resulting implementation accelerates a recently proposed KFAC variant by up to 4.5x while removing the standard implementation's memory overhead, and it enables new hardware-efficient tensor dropout for approximate backpropagation.
Despite their simple intuition, convolutions are more tedious to analyze than dense layers, which complicates the transfer of theoretical and algorithmic ideas to convolutions. We simplify convolutions by viewing them as tensor networks (TNs) that allow reasoning about the underlying tensor multiplications by drawing diagrams, manipulating them to perform function transformations like differentiation, and efficiently evaluating them with einsum. To demonstrate their simplicity and expressiveness, we derive diagrams of various autodiff operations and popular curvature approximations with full hyper-parameter support, batching, channel groups, and generalization to any convolution dimension. Further, we provide convolution-specific transformations based on the connectivity pattern that allow diagrams to be simplified before evaluation. Finally, we probe performance. Our TN implementation accelerates a recently proposed KFAC variant by up to 4.5x while removing the standard implementation's memory overhead, and enables new hardware-efficient tensor dropout for approximate backpropagation.
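To make the einsum view concrete, here is a minimal NumPy sketch of a 1D convolution (cross-correlation, as used in deep learning) written as a single tensor contraction. It is an illustration under stated assumptions, not the paper's implementation: it assumes no padding and no channel groups, and the helper names `index_pattern`, `conv1d_einsum`, `conv1d_weight_vjp`, and `conv1d_loop` are hypothetical. The connectivity between input positions, output positions, and kernel taps is stored in a binary tensor `P[i, o, k]`, and the weight gradient follows from the same contraction by swapping the kernel for the upstream gradient.

```python
import numpy as np


def index_pattern(input_size, kernel_size, stride=1, dilation=1):
    """Binary tensor P[i, o, k]: True iff input i feeds output o via kernel tap k.

    Assumption: no padding, so i = o * stride + k * dilation.
    """
    output_size = (input_size - dilation * (kernel_size - 1) - 1) // stride + 1
    i = np.arange(input_size)[:, None, None]
    o = np.arange(output_size)[None, :, None]
    k = np.arange(kernel_size)[None, None, :]
    return i == o * stride + k * dilation


def conv1d_einsum(x, w, stride=1, dilation=1):
    """Forward pass. x: (N, C_in, I), w: (C_out, C_in, K) -> y: (N, C_out, O)."""
    p = index_pattern(x.shape[-1], w.shape[-1], stride, dilation).astype(x.dtype)
    return np.einsum("nci,iok,dck->ndo", x, p, w)


def conv1d_weight_vjp(x, g, kernel_size, stride=1, dilation=1):
    """Weight gradient: contract input, pattern, and upstream gradient g: (N, C_out, O)."""
    p = index_pattern(x.shape[-1], kernel_size, stride, dilation).astype(x.dtype)
    return np.einsum("nci,iok,ndo->dck", x, p, g)


def conv1d_loop(x, w, stride=1, dilation=1):
    """Naive reference implementation used only to check the einsum version."""
    n, _, input_size = x.shape
    c_out, _, kernel_size = w.shape
    output_size = (input_size - dilation * (kernel_size - 1) - 1) // stride + 1
    y = np.zeros((n, c_out, output_size), dtype=x.dtype)
    for o in range(output_size):
        for k in range(kernel_size):
            # (N, C_in) @ (C_in, C_out) -> (N, C_out)
            y[:, :, o] += x[:, :, o * stride + k * dilation] @ w[:, :, k].T
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 3, 10))
    w = rng.standard_normal((4, 3, 3))
    y = conv1d_einsum(x, w, stride=2)
    print(y.shape, np.allclose(y, conv1d_loop(x, w, stride=2)))
```

The dense pattern tensor is convenient for reasoning but wasteful to store; it is used here only because it keeps the contraction explicit.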