Deep Clustered Convolutional Kernels
The work addresses the costly trial-and-error process that domain experts face in architecture design, though the contribution is incremental because it builds on existing training methods.
The paper tackles the problem of manually specified neural network architectures. It proposes a training algorithm that automatically optimizes the architecture through iterative split/merge clustering of convolutional kernels, and it reports improved performance on three vision tasks compared to similarly sized hand-crafted designs.
Deep neural networks have recently achieved state-of-the-art performance thanks to new training algorithms for rapid parameter estimation and new regularization methods to reduce overfitting. However, in practice the network architecture, which often accounts for a large portion of the final system performance, has to be set manually by domain experts, generally through a costly trial-and-error procedure. We view this as a limitation and propose a novel training algorithm that automatically optimizes network architecture by progressively increasing model complexity and then eliminating model redundancy by selectively removing parameters at training time. For convolutional neural networks, our method relies on iterative split/merge clustering of convolutional kernels interleaved with stochastic gradient descent. We present a training algorithm and experimental results on three different vision tasks, showing improved performance compared to similarly sized hand-crafted architectures.
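The split/merge idea can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: the abstract does not specify the split rule, the merge criterion, or the schedule, so everything below is an assumption made for illustration. The function names `split_kernels` and `merge_kernels`, the noise-based split, the Euclidean distance test, and the `noise_scale` and `threshold` parameters are all hypothetical. The stochastic gradient descent steps that would run between a split and a merge are omitted.

```python
import numpy as np


def split_kernels(kernels, noise_scale=0.01, rng=None):
    """Grow a layer (assumed rule): append a slightly perturbed copy of every kernel.

    kernels has shape (num_kernels, channels, height, width).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    noise = noise_scale * rng.standard_normal(kernels.shape)
    return np.concatenate([kernels, kernels + noise], axis=0)


def merge_kernels(kernels, threshold=0.5):
    """Shrink a layer (assumed rule): greedily group near-duplicate kernels.

    A kernel joins the first group whose founding member lies within
    `threshold` (Euclidean distance on flattened weights); otherwise it
    starts a new group. Each group is replaced by the mean of its members.
    """
    flat = kernels.reshape(kernels.shape[0], -1)
    groups = []  # each entry is a list of kernel indices
    for i, k in enumerate(flat):
        for g in groups:
            if np.linalg.norm(k - flat[g[0]]) < threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    merged = np.stack([flat[g].mean(axis=0) for g in groups])
    return merged.reshape((len(groups),) + kernels.shape[1:])


# Toy usage: 4 random 3x3 kernels over 3 input channels.
rng = np.random.default_rng(42)
kernels = rng.standard_normal((4, 3, 3, 3))

grown = split_kernels(kernels, rng=rng)   # complexity increases: 8 kernels
# ... SGD updates on `grown` would happen here in a real training loop ...
pruned = merge_kernels(grown)             # redundancy removed

print(grown.shape, pruned.shape)
```

Because no training happens between the two calls in this toy run, each perturbed copy stays close to its parent and the merge collapses the layer back to its original size. In an actual training loop, copies that diverge during gradient descent would survive the merge, which is how the layer width could adapt to the task.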