Multi-Scale Dual-Branch Fully Convolutional Network for Hand Parsing
This work addresses hand parsing, a challenging computer vision task with applications in human-computer interaction, but it appears incremental: it builds on existing FCN methods with specific architectural and loss modifications.
The paper tackles hand parsing by proposing a Multi-Scale Dual-Branch Fully Convolutional Network (MSDB-FCN) with a novel loss function, achieving state-of-the-art performance on the RHD-PARSING dataset.
Recently, fully convolutional neural networks (FCNs) have achieved strong performance in image parsing, including scene parsing and object parsing. Compared with generic object parsing, hand parsing is more challenging because hands are small, structurally complex, heavily self-occluded, and ambiguous in texture. In this paper, we propose a novel parsing framework, the Multi-Scale Dual-Branch Fully Convolutional Network (MSDB-FCN), for hand parsing. Our network employs a Dual-Branch architecture to extract features of the hand area, focusing attention on the hand itself. These features are used to generate multi-scale features with a pyramid pooling strategy. To better encode the multi-scale features, we design a Deconvolution and Bilinear Interpolation Block (DB-Block) that upsamples and merges features of different scales. To address data imbalance, a common problem in hand parsing and many other computer vision tasks, we propose a generalization of Focal Loss to multi-class classification, namely Multi-Class Balanced Focal Loss. Extensive experiments on the RHD-PARSING dataset demonstrate that MSDB-FCN achieves state-of-the-art performance for hand parsing.
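The abstract does not give the formula for Multi-Class Balanced Focal Loss, so the following is only a minimal sketch of one plausible reading: the standard focal loss applied to softmax probabilities, with a per-class weight alpha. The function names, the inverse-frequency weighting, and the default gamma = 2 are assumptions made for illustration, not the formulation used in the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of per-class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def balanced_focal_loss(logits, target, alpha, gamma=2.0):
    """Loss for one pixel: -alpha[target] * (1 - p_t)**gamma * log(p_t).

    p_t is the softmax probability of the true class. The (1 - p_t)**gamma
    factor down-weights pixels that are already classified confidently;
    alpha[target] re-weights classes to counter imbalance.
    """
    p_t = softmax(logits)[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)

def inverse_frequency_weights(class_counts):
    """Hypothetical class weights: inverse pixel frequency, scaled to mean 1.

    Assumes every class has a positive pixel count.
    """
    total = sum(class_counts)
    raw = [total / c for c in class_counts]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]

# Example: background dominates a two-class problem (90% vs. 10% of pixels).
alpha = inverse_frequency_weights([90, 10])
loss = balanced_focal_loss([0.0, 0.0], target=1, alpha=alpha)
```

With gamma = 0 and all weights equal to 1, this sketch reduces to ordinary softmax cross-entropy, which is a convenient sanity check for any focal-loss variant.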