RO CV Jan 18, 2017

Action Recognition: From Static Datasets to Moving Robots

Fahimeh Rezazadegan, Sareh Shirazi, Ben Upcroft, Michael Milford

arXiv:1701.04925v1 44 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of deploying action recognition systems on autonomous robots in real-world environments, though the advance is incremental because it builds on existing ConvNet frameworks.

The paper tackles the problem of adapting deep learning models for human action recognition from static datasets to moving robots. It proposes a method that generates action region proposals to reduce dependence on background cues and camera motion, then classifies shape and motion features with a ConvNet framework. The authors report state-of-the-art or better performance on benchmarks and a higher success rate in an abnormal behavior detection scenario.

Deep learning models have achieved state-of-the-art performance in recognizing human activities, but often rely on background cues present in typical computer vision datasets, which predominantly use a stationary camera. If these models are to be employed by autonomous robots in real-world environments, they must be adapted to perform independently of background cues and camera motion effects. To address these challenges, we propose a new method that first generates generic action region proposals with good potential to locate one human action in unconstrained videos regardless of camera motion, and then uses these proposals to extract and classify effective shape and motion features with a ConvNet framework. In a range of experiments, we demonstrate that by actively proposing action regions during both training and testing, state-of-the-art or better performance is achieved on benchmarks. We show that our approach outperforms the state of the art on two new datasets: one emphasizes irrelevant background, the other highlights camera motion. We also validate our action recognition method in an abnormal behavior detection scenario aimed at improving workplace safety. The results verify a higher success rate for our method, owing to its ability to recognize human actions regardless of environment and camera motion.
