CV · Jan 13, 2020

Towards Interpretable and Robust Hand Detection via Pixel-wise Prediction

Dan Liu, Libo Zhang, Tiejian Luo, Lili Tao, Yanjun Wu

arXiv:2001.04163v15.818 citations

Originality: Incremental advance

AI Analysis

This work addresses interpretability and robustness in hand detection for applications such as human-computer interaction. The advance is incremental, since it builds on existing CNN methods.

The paper tackles the lack of interpretability in CNN-based hand detection by proposing a model that combines pixel-level detection with explainable feature fusion. It reports competitive accuracy on the VIVA and Oxford datasets and, through auxiliary supervision, shortens training by more than 10 hours.

The lack of interpretability of existing CNN-based hand detection methods makes it difficult to understand the rationale behind their predictions. In this paper, we propose a novel neural network model, which introduces interpretability into hand detection for the first time. The main improvements include: (1) We detect hands at the pixel level, which shows which pixels form the basis of each decision and improves the transparency of the model. (2) The explainable Highlight Feature Fusion block highlights distinctive features among multiple layers and learns discriminative ones to gain robust performance. (3) We introduce a transparent representation, the rotation map, to learn rotation features instead of complex and non-transparent rotation and derotation layers. (4) Auxiliary supervision accelerates the training process, which saves more than 10 hours in our experiments. Experimental results on the VIVA and Oxford hand detection and tracking datasets show that our method achieves accuracy competitive with state-of-the-art methods at higher speed.
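To make the idea of pixel-wise prediction with a rotation map more concrete, here is a minimal sketch of how such outputs might be decoded into a single detection. It is not the authors' implementation: the function name `decode_pixelwise`, the 0.5 threshold, the radian angle convention, and the single-hand assumption are all hypothetical choices made for illustration.

```python
import math

def decode_pixelwise(score_map, rotation_map, threshold=0.5):
    """Decode a per-pixel hand score map and rotation map into one detection.

    Hypothetical sketch: assumes a single hand per image, scores in [0, 1],
    and per-pixel rotation angles in radians.
    """
    # Pixels whose score clears the threshold are the evidence for the detection.
    hand_pixels = [
        (x, y)
        for y, row in enumerate(score_map)
        for x, score in enumerate(row)
        if score >= threshold
    ]
    if not hand_pixels:
        return None

    xs = [x for x, _ in hand_pixels]
    ys = [y for _, y in hand_pixels]
    box = (min(xs), min(ys), max(xs), max(ys))  # axis-aligned extent, inclusive

    # Circular mean of the per-pixel angles, so values near +pi and -pi
    # do not cancel out.
    sin_sum = sum(math.sin(rotation_map[y][x]) for x, y in hand_pixels)
    cos_sum = sum(math.cos(rotation_map[y][x]) for x, y in hand_pixels)
    angle = math.atan2(sin_sum, cos_sum)

    return {"pixels": hand_pixels, "box": box, "angle": angle}

# Toy 6x8 maps: a 3x3 block of confident hand pixels, all rotated by 0.3 rad.
score = [[0.0] * 8 for _ in range(6)]
for y in range(2, 5):
    for x in range(3, 6):
        score[y][x] = 0.9
rotation = [[0.3] * 8 for _ in range(6)]

result = decode_pixelwise(score, rotation)
```

Because the returned `pixels` list names exactly which locations contributed, a detection can be traced back to its supporting pixels, which is the sense of transparency the abstract describes.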
