Optimization on Submanifolds of Convolution Kernels in CNNs
This work provides a theoretical foundation for kernel normalization methods in CNNs. The contribution is incremental, but it addresses a gap in understanding for deep learning researchers and practitioners.
The paper addresses the lack of theoretical understanding of kernel normalization methods in CNNs by developing a geometric framework for analyzing how these methods affect optimization geometry. It also proposes a new SGD algorithm with an almost sure convergence guarantee and reports state-of-the-art performance on major image classification benchmarks.
Kernel normalization methods have been employed to improve the robustness of optimization methods to reparametrization of convolution kernels and to covariate shift, and to accelerate training of Convolutional Neural Networks (CNNs). However, our understanding of the theoretical properties of these methods has lagged behind their success in applications. We develop a geometric framework to elucidate the underlying mechanisms of a diverse range of kernel normalization methods. Our framework enables us to identify and describe the geometry of the space of normalized kernels. We analyze how state-of-the-art kernel normalization methods affect the geometry of the search spaces of stochastic gradient descent (SGD) algorithms in CNNs. Following our theoretical results, we propose an SGD algorithm with a guarantee of almost sure convergence to a solution at a single minimum of the classification loss of CNNs. Experimental results show that the proposed method achieves state-of-the-art performance on major image classification benchmarks with CNNs.
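To make the idea of SGD on a space of normalized kernels concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes the simplest case, in which a flattened kernel is constrained to unit norm and therefore lies on a sphere. Each step projects the gradient onto the tangent space at the current point and then renormalizes. The function names, the toy loss, and the constant step size are all illustrative assumptions; in particular, a constant step size does not reproduce the almost sure convergence guarantee described above.

```python
import math
import random


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def project_to_tangent(w, g):
    """Remove the component of g along w (w is assumed to have unit norm)."""
    dot = sum(wi * gi for wi, gi in zip(w, g))
    return [gi - dot * wi for wi, gi in zip(w, g)]


def retract(w):
    """Map a nonzero vector back onto the unit sphere by renormalizing."""
    n = norm(w)
    return [x / n for x in w]


def sphere_sgd_step(w, grad, lr):
    """One SGD step constrained to the unit sphere (hypothetical helper)."""
    tangent_grad = project_to_tangent(w, grad)
    stepped = [wi - lr * gi for wi, gi in zip(w, tangent_grad)]
    return retract(stepped)


# Toy problem (an assumption for illustration): minimize f(w) = -<w, target>
# over unit vectors. The minimizer is target / ||target||.
random.seed(0)
target = [3.0, 4.0, 0.0]
w = retract([1.0, 0.0, 1.0])
for _ in range(200):
    # Small Gaussian noise stands in for mini-batch gradient noise.
    noise = [random.gauss(0.0, 0.01) for _ in target]
    grad = [-t + n for t, n in zip(target, noise)]
    w = sphere_sgd_step(w, grad, lr=0.1)
```

After the loop, `w` stays on the unit sphere and points close to `target / norm(target)`. Other normalization schemes would correspond to other constraint sets, and would need their own projection and retraction.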