Deep Networks with Adaptive Nyström Approximation
The method offers a flexible, parameter-efficient approach for small-data scenarios in machine learning, though its contribution is incremental: it integrates existing kernel approximation techniques into neural networks.
The paper tackles the problem of combining kernel methods with deep learning by replacing top dense layers in convolutional networks with Nyström-approximated kernel functions, achieving performance comparable to standard architectures on SVHN and CIFAR100 while reducing parameters for small training sets (5-20 samples per class).
Recent work has focused on combining kernel methods and deep learning to exploit the best of the two approaches. Here, we introduce a new neural network architecture in which we replace the top dense layers of standard convolutional architectures with an approximation of a kernel function based on the Nyström approximation. Our approach is simple and highly flexible: it is compatible with any kernel function and allows multiple kernels to be exploited. We show that our architecture matches the performance of standard architectures on datasets such as SVHN and CIFAR100. One benefit of the method lies in its limited number of learnable parameters, which makes it particularly suited to small training sets, e.g. from 5 to 20 samples per class.
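To make the idea of a Nyström-approximated kernel layer concrete, the following is a minimal NumPy sketch of the standard Nyström feature map, phi(x) = k(x, L) K_LL^{-1/2}, where L is a small set of landmark points. It is an illustration under stated assumptions, not the paper's implementation: the RBF kernel, the value of gamma, the random stand-in for convolutional features, and the names rbf_kernel and nystrom_features are all hypothetical choices made here. In a network of the kind described above, the input would presumably be the output of the convolutional layers, and a linear classifier would follow the feature map.

```python
import numpy as np


def rbf_kernel(a, b, gamma=0.05):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def nystrom_features(x, landmarks, kernel=rbf_kernel, eps=1e-10):
    """Map x to Nystrom features phi(x) = k(x, L) @ K_LL^{-1/2}."""
    k_ll = kernel(landmarks, landmarks)
    # Symmetric eigendecomposition gives a stable inverse square root.
    evals, evecs = np.linalg.eigh(k_ll)
    evals = np.maximum(evals, eps)  # guard against tiny or negative eigenvalues
    inv_sqrt = (evecs / np.sqrt(evals)) @ evecs.T
    return kernel(x, landmarks) @ inv_sqrt


# Assumption: random vectors stand in for the output of convolutional layers.
rng = np.random.default_rng(0)
conv_features = rng.normal(size=(200, 16))

# Assumption: landmarks are a random subset of the training features.
landmarks = conv_features[rng.choice(200, size=20, replace=False)]

phi = nystrom_features(conv_features, landmarks)  # shape (200, 20)

# Inner products of the features approximate the full kernel matrix.
approx = phi @ phi.T
exact = rbf_kernel(conv_features, conv_features)
print("mean absolute approximation error:", np.abs(exact - approx).mean())
```

The feature dimension equals the number of landmarks, so whatever follows this map in the network has an input size set by the landmark count rather than by the width of a dense layer, and the kernel function can be swapped without changing the rest of the code. On the landmarks themselves the approximation reproduces the kernel matrix exactly, up to numerical error.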