Deep Convolutional Networks are Hierarchical Kernel Machines
The paper provides a theoretical foundation for understanding DCNs as kernel machines. The contribution is incremental, but it clarifies the properties of DCNs for machine learning researchers.
The paper shows that deep convolutional networks (DCNs) with rectifying nonlinearities can be exactly equivalent to hierarchical kernel machines. It also conjectures that hierarchies of trained modules compute selective and invariant representations while minimizing memory requirements.
In i-theory a typical layer of a hierarchical architecture consists of HW modules pooling the dot products of the inputs to the layer with the transformations of a few templates under a group. Such layers include as special cases the convolutional layers of Deep Convolutional Networks (DCNs) as well as the non-convolutional layers (when the group contains only the identity). Rectifying nonlinearities -- which are used by present-day DCNs -- are one of the several nonlinearities admitted by i-theory for the HW module. We discuss here the equivalence between group averages of linear combinations of rectifying nonlinearities and an associated kernel. This property implies that present-day DCNs can be exactly equivalent to a hierarchy of kernel machines with pooling and non-pooling layers. Finally, we describe a conjecture for theoretically understanding hierarchies of such modules. A main consequence of the conjecture is that hierarchies of trained HW modules minimize memory requirements while computing a selective and invariant representation.
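The HW module described in the abstract can be illustrated with a small sketch. The code below is not taken from the paper: it assumes a 1-D input, cyclic translations as the group, a ReLU applied at a few bias offsets as the rectifying nonlinearity, and averaging as the pooling operation. The names `hw_module`, `biases`, and `shifts` are hypothetical and chosen for illustration only.

```python
import numpy as np

def hw_module(x, template, biases, shifts=None):
    """Illustrative HW module (assumed form, not the paper's code).

    Computes the dot products of x with every transformed template,
    rectifies them at each bias, and pools by averaging over the group.
    """
    n = x.shape[0]
    if shifts is None:
        shifts = range(n)  # assumption: the full cyclic translation group
    # Dot products <x, g.template> for each group element g (a cyclic shift)
    dots = np.array([x @ np.roll(template, s) for s in shifts])
    # One pooled output per bias: mean over the group of ReLU(dot + bias)
    return np.array([np.maximum(dots + b, 0.0).mean() for b in biases])

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
template = rng.standard_normal(16)
biases = np.linspace(-1.0, 1.0, 5)

# Pooling layer: the signature is unchanged when the input is shifted
sig = hw_module(x, template, biases)
sig_shifted = hw_module(np.roll(x, 3), template, biases)
print(np.allclose(sig, sig_shifted))

# Non-pooling layer: the group contains only the identity (shift 0)
sig_identity = hw_module(x, template, biases, shifts=[0])
```

Shifting the input only permutes the set of dot products taken over the full cyclic group, so the pooled output is invariant under these assumptions. Restricting the group to the identity reduces the module to a plain rectified dot product, which corresponds to the non-convolutional case mentioned in the abstract.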