Layer-wise training of deep networks using kernel similarity
This work addresses training efficiency and feature representation in deep learning, but it is incremental, since it builds on existing layer-wise and kernel-based approaches.
The paper tackles the problem of training deep networks by proposing a layer-wise optimization method for supervised classification, which achieves performance competitive with backpropagation on real image datasets.
Deep learning has shown promising results in many machine learning applications. The hierarchical feature representation built by deep networks enables compact and precise encoding of the data. A kernel analysis of trained deep networks has shown that deeper layers yield simpler and more accurate representations of the data. In this paper, we propose an approach for layer-wise training of a deep network for the supervised classification task. The transformation matrix of each layer is obtained by solving an optimization problem aimed at a better representation, with each subsequent layer building its representation on top of the features produced by the previous layer. We compared the performance of our approach with that of a DNN of the same architecture trained using back-propagation. Experimental results on real image datasets demonstrate the efficacy of our approach. We also performed a kernel analysis of the layer representations to validate the claim of better feature encoding.
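The abstract does not state the per-layer objective, so the following is only a minimal sketch under explicit assumptions, not the authors' method. It assumes each layer computes `tanh(H @ W)` (no bias), and that its transformation matrix `W` is chosen by gradient ascent on the kernel-target alignment between the linear kernel of the layer output and the ideal label kernel `Y @ Y.T`. Earlier layers stay frozen while the next one is trained. For the kernel analysis it uses a simplified variant: project the one-hot labels onto the leading `d` eigenvectors of an RBF kernel on a layer's features and measure the residual error (lower error at small `d` suggests a simpler, more label-aligned representation). All function names, the median bandwidth heuristic, the step size, and the synthetic three-class data are hypothetical choices for illustration.

```python
import numpy as np


def kernel_alignment(H, T):
    """Alignment <K, T>_F / (||K||_F ||T||_F) of the linear kernel K = H H^T with T."""
    K = H @ H.T
    return np.sum(K * T) / (np.linalg.norm(K) * np.linalg.norm(T))


def alignment_grad(X, W, T):
    """Gradient of the alignment of H = tanh(X W) with respect to W."""
    H = np.tanh(X @ W)
    K = H @ H.T
    nK, nT, kt = np.linalg.norm(K), np.linalg.norm(T), np.sum(K * T)
    G = T / (nK * nT) - kt * K / (nK ** 3 * nT)  # dA/dK; symmetric since K, T are
    dH = 2.0 * G @ H                             # chain rule through K = H H^T
    return X.T @ (dH * (1.0 - H ** 2))           # chain rule through tanh


def train_layer(X, T, width, steps=300, lr=0.1, seed=0):
    """Assumed layer objective: maximize kernel-target alignment by gradient ascent."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / np.sqrt(X.shape[1]), size=(X.shape[1], width))
    for _ in range(steps):
        W += lr * alignment_grad(X, W, T)
    return W


def train_layerwise(X, Y, widths, **kw):
    """Greedy layer-wise training: each layer is fit on the frozen previous output."""
    T = Y @ Y.T                       # ideal kernel: 1 for same class, 0 otherwise
    weights, H = [], X
    for width in widths:
        W = train_layer(H, T, width, **kw)
        weights.append(W)
        H = np.tanh(H @ W)            # next layer builds on these features
    return weights


def kernel_analysis_error(H, Y, d, gamma=None):
    """Residual error of Y after projection onto the top-d eigenvectors of an RBF kernel."""
    sq = np.sum(H ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * H @ H.T, 0.0)
    if gamma is None:
        gamma = 1.0 / np.median(D[D > 0])  # median heuristic (an assumption)
    _, U = np.linalg.eigh(np.exp(-gamma * D))  # eigenvalues in ascending order
    Ud = U[:, ::-1][:, :d]                     # leading d eigenvectors
    return np.mean((Y - Ud @ (Ud.T @ Y)) ** 2)


# Hypothetical toy data: three shifted Gaussian blobs with one-hot labels.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
X = rng.normal(size=(60, 5)) + 2.0 * np.eye(3, 5)[labels]
Y = np.eye(3)[labels]
weights = train_layerwise(X, Y, widths=[8, 8])

H = X
for layer, W in enumerate(weights, start=1):
    H = np.tanh(H @ W)
    print(layer, kernel_alignment(H, Y @ Y.T), kernel_analysis_error(H, Y, d=3))
```

Because each layer is optimized against a fixed target kernel rather than through end-to-end gradients, every subproblem is small and independent of the layers above it; the trade-off is that earlier layers cannot be corrected by later ones.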