Fusing Deep Convolutional Networks for Large Scale Visual Concept Classification
This work addresses efficiency and accuracy issues facing computer vision practitioners who work with big data, though it appears incremental because it builds on existing CNN architectures.
The paper tackles the challenge of efficiently achieving state-of-the-art visual concept classification on large-scale datasets. It proposes fusion mechanisms for convolutional neural networks that reach top benchmark performance at reduced computational cost and without extensive data augmentation.
Deep learning architectures show great promise in various computer vision domains, including image classification, object detection, event detection, and action recognition. In this study, we investigate various aspects of convolutional neural networks (CNNs) from the big data perspective. We analyze recent studies and different network architectures in terms of both running time and accuracy. We present extensive empirical information along with best practices for big data practitioners. Using these best practices, we propose efficient fusion mechanisms for both single and multiple network models. We present state-of-the-art results on benchmark datasets while keeping computational costs comparatively low. Another contribution of our paper is that these state-of-the-art results can be reached without extensive data augmentation.
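The abstract does not specify how the fusion mechanisms work, so the following is only a minimal sketch of one common option: score-level (late) fusion, in which per-class probabilities from several networks are combined by a weighted average. The function names (`softmax`, `fuse_scores`, `predict`), the weights, and the example logits are all hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scores(prob_lists, weights=None):
    """Weighted average of per-model class probabilities (late fusion).

    Assumption: every model outputs probabilities over the same classes.
    Uniform weights are used when none are given.
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_classes = len(prob_lists[0])
    fused = [0.0] * n_classes
    for probs, w in zip(prob_lists, weights):
        for c in range(n_classes):
            fused[c] += w * probs[c]
    return fused

def predict(prob_lists, weights=None):
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(prob_lists, weights)
    return max(range(len(fused)), key=fused.__getitem__)

# Hypothetical logits from two networks for one image over three classes.
net_a = softmax([2.0, 1.0, 0.1])
net_b = softmax([0.5, 2.5, 0.3])
fused = fuse_scores([net_a, net_b])
label = predict([net_a, net_b])
```

In this toy case the two networks disagree on the top class, and the averaged scores decide between them. The same averaging could in principle be applied to multiple crops passed through a single network, but whether the paper does so is not stated in the abstract.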