Multi-Glimpse LSTM with Color-Depth Feature Fusion for Human Detection
This work addresses human detection for applications such as robotics and surveillance, but it is incremental: it adapts existing LSTM techniques to a specific domain.
The paper tackles human detection using RGB-D data by proposing a Multi-Glimpse LSTM network with a feature fusion strategy, achieving superior performance on two public datasets.
With the development of depth cameras such as Kinect and Intel RealSense, RGB-D based human detection has received continuous research attention because of its use in a variety of applications. In this paper, we propose a new Multi-Glimpse LSTM (MG-LSTM) network, in which multi-scale contextual information is sequentially integrated to improve human detection performance. Furthermore, we propose a feature fusion strategy based on our MG-LSTM network to better incorporate RGB and depth information. To the best of our knowledge, this is the first attempt to use an LSTM structure for RGB-D based human detection. Our method achieves superior performance on two publicly available datasets.
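To make the idea of "sequentially integrating multi-scale context" concrete, here is a minimal toy sketch in NumPy, not the authors' implementation. The abstract does not specify the glimpse scales, the feature extractor, or the fusion strategy, so everything below is an assumption: the helper names `crop_glimpse`, `fused_features`, and `multi_glimpse_score` are hypothetical, the scales `(1.0, 1.5, 2.0)` are illustrative, raw nearest-neighbour crops stand in for learned (likely CNN) features, and simple early concatenation of RGB and depth stands in for the paper's own fusion strategy. Weights are random and untrained, so the score is meaningless except as a shape check.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def crop_glimpse(image, box, scale, out=8):
    """Hypothetical glimpse: crop a region `scale` times the size of box (cx, cy, w, h),
    centred on the box, clamped to the image, resampled to out x out by nearest neighbour."""
    H, W = image.shape[:2]
    cx, cy, w, h = box
    half_w, half_h = scale * w / 2.0, scale * h / 2.0
    xs = np.clip(np.linspace(cx - half_w, cx + half_w, out).round().astype(int), 0, W - 1)
    ys = np.clip(np.linspace(cy - half_h, cy + half_h, out).round().astype(int), 0, H - 1)
    return image[np.ix_(ys, xs)]

def fused_features(rgb, depth, box, scale):
    """Assumed fusion: concatenate normalised, flattened RGB and depth glimpses
    (a stand-in for the paper's unspecified fusion strategy)."""
    g_rgb = crop_glimpse(rgb, box, scale).astype(float).ravel() / 255.0
    g_d = crop_glimpse(depth, box, scale).astype(float).ravel()
    g_d = g_d / (g_d.max() + 1e-6)
    return np.concatenate([g_rgb, g_d])

class LSTMCell:
    """Standard LSTM cell with one stacked weight matrix for the four gates."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)          # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)           # update the cell state
        h = o * np.tanh(c)                   # expose the new hidden state
        return h, c

def multi_glimpse_score(rgb, depth, box, cell, w_out, scales=(1.0, 1.5, 2.0)):
    """Feed glimpses from the tight box outward to wider context, one per LSTM step,
    then map the final hidden state to a human/non-human probability."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    for s in scales:
        h, c = cell.step(fused_features(rgb, depth, box, s), h, c)
    return float(sigmoid(w_out @ h))

# Usage on synthetic data with random (untrained) weights.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)
depth = rng.uniform(0.5, 4.0, size=(120, 160))         # metres, synthetic
cell = LSTMCell(input_dim=8 * 8 * 3 + 8 * 8, hidden_dim=32, rng=rng)
w_out = rng.normal(0.0, 0.1, size=32)
score = multi_glimpse_score(rgb, depth, (80, 60, 20, 50), cell, w_out)
print(f"untrained score: {score:.3f}")
```

The design choice this sketch illustrates is that the recurrent state carries evidence from the tighter glimpse into the wider ones, so the final decision for a candidate box depends on its surrounding context rather than on a single crop.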