CV AI LG | Oct 7, 2022

Understanding the Covariance Structure of Convolutional Filters

Asher Trockman, Devin Willmott, J. Zico Kolter

arXiv:2210.03651v1 | 13.218 citations | h-index: 71

Originality: Incremental advance

AI Analysis

This work addresses weight initialization in convolutional networks. The advance is incremental but practical, with specific gains in model performance.

The paper tackles suboptimal random initialization of convolutional neural networks by proposing a learning-free multivariate initialization scheme motivated by the empirical covariance structure of learned filters. Models using the new initialization outperform those using traditional univariate initializations, and in some cases the improvement holds even when the depthwise filters are not trained at all.

Neural network weights are typically initialized at random from univariate distributions, controlling just the variance of individual weights even in highly-structured operations like convolutions. Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions whose learned filters have notable structure; this presents an opportunity to study their empirical covariances. In this work, we first observe that such learned filters have highly-structured covariance matrices, and moreover, we find that covariances calculated from small networks may be used to effectively initialize a variety of larger networks of different depths, widths, patch sizes, and kernel sizes, indicating a degree of model-independence to the covariance structure. Motivated by these findings, we then propose a learning-free multivariate initialization scheme for convolutional filters using a simple, closed-form construction of their covariance. Models using our initialization outperform those using traditional univariate initializations, and typically meet or exceed the performance of those initialized from the covariances of learned filters; in some cases, this improvement can be achieved without training the depthwise convolutional filters at all.
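The idea of initializing from the covariances of learned filters can be illustrated with a minimal NumPy sketch. It is not the paper's closed-form construction: it only estimates an empirical covariance from a stack of k x k filters and draws new filters from a zero-mean multivariate Gaussian with that covariance. The function names, the zero-mean assumption, and the random stand-in for "learned" filters are all hypothetical choices made for illustration.

```python
import numpy as np


def empirical_filter_covariance(filters):
    """Estimate the covariance of flattened filters.

    filters: array of shape (n, k, k), one k x k filter per entry.
    Returns a (k*k, k*k) covariance matrix over filter positions.
    """
    n, k, _ = filters.shape
    flat = filters.reshape(n, k * k)
    # rowvar=False: rows are observations (filters), columns are variables (positions).
    return np.cov(flat, rowvar=False)


def sample_filters(cov, num_filters, rng):
    """Draw filters from a zero-mean Gaussian with the given covariance.

    The zero mean is an assumption of this sketch.
    """
    k = int(round(np.sqrt(cov.shape[0])))
    mean = np.zeros(k * k)
    samples = rng.multivariate_normal(mean, cov, size=num_filters)
    return samples.reshape(num_filters, k, k)


rng = np.random.default_rng(0)

# Stand-in for learned depthwise filters; in practice these would be read
# from a trained network's depthwise convolution weights.
learned = rng.standard_normal((512, 9, 9))

cov = empirical_filter_covariance(learned)
init = sample_filters(cov, num_filters=256, rng=rng)
```

Here `init` holds 256 filters of size 9 x 9 that could be copied into a depthwise convolution layer of matching kernel size. Transferring a covariance across kernel sizes, as the abstract describes, would need an additional resizing step that this sketch does not attempt.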
