Approximation and Learning with Deep Convolutional Models: a Kernel Perspective
This work offers machine learning researchers theoretical insight into the inductive bias of convolutional networks, though the contribution is incremental in that it builds on existing kernel methods.
The paper asks why deep convolutional networks perform well on high-dimensional data and analyzes them through the lens of kernel methods. It shows that hierarchical kernels with two or three convolution and pooling layers achieve good empirical performance on standard vision datasets, and it derives generalization bounds with improved sample complexity when the target function has spatial regularities.
The empirical success of deep convolutional networks on tasks involving high-dimensional data such as images or audio suggests that they can efficiently approximate certain functions that are well-suited for such tasks. In this paper, we study this through the lens of kernel methods, by considering simple hierarchical kernels with two or three convolution and pooling layers, inspired by convolutional kernel networks. These achieve good empirical performance on standard vision datasets, while providing a precise description of their functional space that yields new insights on their inductive bias. We show that the RKHS consists of additive models of interaction terms between patches, and that its norm encourages spatial similarities between these terms through pooling layers. We then provide generalization bounds which illustrate how pooling and patches yield improved sample complexity guarantees when the target function presents such regularities.
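To make the role of patches and pooling concrete, the following is a minimal sketch of a one-layer kernel of this general kind on 1D signals. It is an illustration under stated assumptions, not the paper's exact construction: the circular patch extraction, the patch kernel exp(<p, q> - 1) on normalized patches, the Gaussian pooling filter, and all function names (`extract_patches`, `patch_kernel`, `pooling_matrix`, `conv_kernel`) are choices made here for the example. The kernel sums patch similarities k(x_v, y_v') weighted by how often positions v and v' fall in the same pooling window, which gives an additive structure over patch terms.

```python
import numpy as np

def extract_patches(x, patch_size):
    """Return all circular patches of a 1D signal as rows of a matrix."""
    n = x.shape[0]
    idx = (np.arange(n)[:, None] + np.arange(patch_size)[None, :]) % n
    return x[idx]

def patch_kernel(P, Q):
    """Kernel exp(<p, q> - 1) between L2-normalized patches (rows of P and Q).

    For unit-norm patches this equals the Gaussian kernel exp(-||p - q||^2 / 2).
    """
    P = P / np.maximum(np.linalg.norm(P, axis=1, keepdims=True), 1e-12)
    Q = Q / np.maximum(np.linalg.norm(Q, axis=1, keepdims=True), 1e-12)
    return np.exp(P @ Q.T - 1.0)

def pooling_matrix(n, width):
    """Circular Gaussian pooling weights H[u, v], one normalized row per position u."""
    d = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    d = np.minimum(d, n - d)  # circular distance between positions
    H = np.exp(-0.5 * (d / width) ** 2)
    return H / H.sum(axis=1, keepdims=True)

def conv_kernel(x, y, patch_size=3, pool_width=1.0):
    """One-layer kernel: sum_u sum_{v, v'} H[u, v] H[u, v'] k(x_v, y_v').

    Assumes x and y are 1D arrays of the same length.
    """
    n = x.shape[0]
    G = patch_kernel(extract_patches(x, patch_size), extract_patches(y, patch_size))
    H = pooling_matrix(n, pool_width)
    # Summing over u first turns the triple sum into an elementwise product.
    return float(np.sum((H.T @ H) * G))
```

With a very wide pooling filter the weights become nearly uniform, so in this sketch the kernel no longer distinguishes a signal from its circular shift. A narrow filter retains position information. This is a toy analogue of the invariance that pooling encourages, not a reproduction of the paper's bounds.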