ViS-HuD: Using Visual Saliency to Improve Human Detection with Convolutional Neural Networks
This work addresses human detection for computer vision applications, but it is incremental as it combines existing saliency and detection methods.
The paper tackles human detection in still images by proposing ViS-HuD, which uses visual saliency maps to enhance the input to a CNN, achieving 91.4% accuracy on the Penn Fudan Dataset and a 53% average miss-rate on the TUD-Brussels benchmark.
This paper presents a technique to improve human detection in still images using deep learning. Our method, ViS-HuD, computes a visual saliency map from the image. The input image is then multiplied by the map, and the product is fed to a Convolutional Neural Network (CNN) that detects humans in the image. The visual saliency map is generated using ML-Net, and human detection is carried out using DetectNet. ML-Net is pre-trained on SALICON for visual saliency detection, while DetectNet is pre-trained on the ImageNet database for image classification. The CNNs of ViS-HuD were trained on two challenging databases: Penn Fudan and the TUD-Brussels benchmark. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on the Penn Fudan Dataset, with 91.4% human detection accuracy, and an average miss-rate of 53% on the TUD-Brussels benchmark.
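The multiplication step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `apply_saliency`, the min-max rescaling of the map to [0, 1], and the H x W x C image layout are assumptions made for illustration, since the abstract only states that the image is multiplied by the saliency map before being passed to the detector. In ViS-HuD the map would come from ML-Net and the product would go to DetectNet; here a hand-written toy map stands in for both.

```python
import numpy as np


def apply_saliency(image: np.ndarray, saliency: np.ndarray) -> np.ndarray:
    """Weight each pixel of an H x W x C image by an H x W saliency map.

    Hypothetical helper: the rescaling to [0, 1] is an assumption,
    not something the abstract specifies.
    """
    if image.shape[:2] != saliency.shape:
        raise ValueError("saliency map must match the image height and width")

    sal = saliency.astype(np.float32)
    span = sal.max() - sal.min()
    # Rescale to [0, 1]; a constant map carries no information,
    # so in that case the image is left unchanged.
    weights = (sal - sal.min()) / span if span > 0 else np.ones_like(sal)

    # Broadcast the single-channel weights across the colour channels.
    weighted = image.astype(np.float32) * weights[..., np.newaxis]
    return weighted.astype(image.dtype)


# Toy 2 x 2 RGB image with uniform intensity and a hand-written saliency map.
image = np.full((2, 2, 3), 200, dtype=np.uint8)
saliency = np.array([[0.0, 0.5], [1.0, 0.25]])
weighted = apply_saliency(image, saliency)
```

Pixels with low saliency are darkened and pixels with high saliency keep their intensity, so the detector receives an input in which salient regions dominate. Whether the map is rescaled, thresholded, or used as-is is a design choice the abstract leaves open.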