Towards Learning Convolutions from Scratch
This addresses the challenge of automating architecture design in machine learning, though it is incremental as it builds on existing methods like LASSO.
The paper tackles the problem of learning convolution-like structures from scratch to reduce expert bias in computer vision architectures, proposing β-LASSO which achieves state-of-the-art accuracies for fully-connected networks on CIFAR-10 (85.19%), CIFAR-100 (59.56%), and SVHN (94.07%).
Convolution is one of the most essential components of architectures used in computer vision. As machine learning moves towards reducing the expert bias and learning it from data, a natural next step seems to be learning convolution-like structures from scratch. This, however, has proven elusive. For example, current state-of-the-art architecture search algorithms use convolution as one of the existing modules rather than learning it from data. In an attempt to understand the inductive bias that gives rise to convolutions, we investigate minimum description length as a guiding principle and show that in some settings, it can indeed be indicative of the performance of architectures. To find architectures with small description length, we propose $β$-LASSO, a simple variant of LASSO algorithm that, when applied on fully-connected networks for image classification tasks, learns architectures with local connections and achieves state-of-the-art accuracies for training fully-connected nets on CIFAR-10 (85.19%), CIFAR-100 (59.56%) and SVHN (94.07%) bridging the gap between fully-connected and convolutional nets.