CV LG May 6, 2021

Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?

arXiv:2105.02498v213.539 citations Has Code

Originality Incremental advance

AI Analysis

The paper addresses a problem relevant to computer vision researchers and practitioners who use CNNs: it improves classification performance by refining how gradients are computed in covariance pooling. The advance is incremental, since it builds on existing methods rather than introducing a new approach.

The paper examines why the approximate matrix square root (Newton-Schulz iteration) outperforms the accurate SVD-based one in global covariance pooling for CNNs. It attributes the gap to data precision and gradient smoothness, then proposes a hybrid training protocol and a new meta-layer that uses SVD in the forward pass and Padé approximants in the backward pass. The meta-layer achieves state-of-the-art results on large-scale and fine-grained datasets.
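A rough illustration of the gradient smoothness issue: in the standard backward pass of an eigendecomposition (the form SVD takes for a symmetric covariance matrix), the gradient contains coefficients 1/(λ_i − λ_j) over pairs of eigenvalues, and these grow without bound as two eigenvalues approach each other. The NumPy sketch below builds that coefficient matrix for two made-up spectra. The function name `eig_gradient_coefficients` and the example eigenvalues are illustrative assumptions, not taken from the paper or its code.

```python
import numpy as np


def eig_gradient_coefficients(eigenvalues):
    """Return K with K[i, j] = 1 / (lambda_i - lambda_j) for i != j, 0 on the diagonal.

    Hypothetical helper for illustration only.
    """
    lam = np.asarray(eigenvalues, dtype=np.float64)
    diff = lam[:, None] - lam[None, :]          # pairwise eigenvalue gaps
    K = np.zeros_like(diff)
    off_diag = ~np.eye(lam.size, dtype=bool)    # skip the i == j entries
    K[off_diag] = 1.0 / diff[off_diag]
    return K


# Assumed example spectra: one with clear gaps, one with two nearly equal eigenvalues.
well_separated = eig_gradient_coefficients([3.0, 2.0, 1.0])
nearly_equal = eig_gradient_coefficients([3.0, 1.0 + 1e-6, 1.0])

print("largest coefficient, well separated:", np.abs(well_separated).max())
print("largest coefficient, nearly equal:  ", np.abs(nearly_equal).max())
```

The second spectrum produces a coefficient roughly a million times larger than the first, which is the kind of spike that a smoothed backward pass is meant to avoid.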

Global covariance pooling (GCP) aims at exploiting the second-order statistics of the convolutional feature. Its effectiveness has been demonstrated in boosting the classification performance of Convolutional Neural Networks (CNNs). Singular Value Decomposition (SVD) is used in GCP to compute the matrix square root. However, the approximate matrix square root calculated using Newton-Schulz iteration \cite{li2018towards} outperforms the accurate one computed via SVD \cite{li2017second}. We empirically analyze the reason behind the performance gap from the perspectives of data precision and gradient smoothness. Various remedies for computing smooth SVD gradients are investigated. Based on our observation and analyses, a hybrid training protocol is proposed for SVD-based GCP meta-layers such that competitive performances can be achieved against Newton-Schulz iteration. Moreover, we propose a new GCP meta-layer that uses SVD in the forward pass, and Padé Approximants in the backward propagation to compute the gradients. The proposed meta-layer has been integrated into different CNN models and achieves state-of-the-art performances on both large-scale and fine-grained datasets.
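The two forward computations compared in the abstract can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the feature shape (8 channels, 256 spatial positions), and the iteration counts are assumptions, and the Padé-based backward pass is not shown. The accurate route uses an eigendecomposition, which coincides with SVD for a symmetric positive semi-definite covariance matrix; the approximate route uses the coupled Newton-Schulz iteration with trace normalization.

```python
import numpy as np


def covariance(features):
    """Sample covariance of a (channels, positions) feature matrix."""
    centered = features - features.mean(axis=1, keepdims=True)
    return centered @ centered.T / features.shape[1]


def sqrt_eig(cov):
    """Accurate square root via eigendecomposition of a symmetric PSD matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    eigenvalues = np.clip(eigenvalues, 0.0, None)   # guard against tiny negatives
    return (eigenvectors * np.sqrt(eigenvalues)) @ eigenvectors.T


def sqrt_newton_schulz(cov, num_iters=5):
    """Approximate square root via the coupled Newton-Schulz iteration."""
    identity = np.eye(cov.shape[0])
    trace = np.trace(cov)
    y = cov / trace              # pre-normalize so the iteration converges
    z = identity.copy()
    for _ in range(num_iters):
        t = 0.5 * (3.0 * identity - z @ y)
        y = y @ t                # y tends to sqrt(cov / trace)
        z = t @ z                # z tends to the inverse of that square root
    return np.sqrt(trace) * y    # undo the normalization


# Assumed toy input: random features standing in for a convolutional feature map.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 256))
cov = covariance(features)
exact = sqrt_eig(cov)


def relative_error(approx):
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)


err_5 = relative_error(sqrt_newton_schulz(cov, num_iters=5))
err_20 = relative_error(sqrt_newton_schulz(cov, num_iters=20))
print("relative error after 5 iterations: ", err_5)
print("relative error after 20 iterations:", err_20)
```

With few iterations the Newton-Schulz result is only approximate in the forward pass, so the performance advantage reported in the abstract concerns training behavior rather than forward accuracy.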
