RO · CV · LG · Dec 6, 2023

From Detection to Action Recognition: An Edge-Based Pipeline for Robot Human Perception

arXiv:2312.03477v1 · 2 citations · h-index: 51 · ICCR
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient human perception in mobile robots for applications such as healthcare and assisted living. Its contribution is incremental, however, because it combines existing methods into a pipeline.

The authors address the problem of enabling mobile service robots to understand human actions. They propose an end-to-end pipeline that integrates human detection, tracking, and action recognition and runs in near real-time on edge devices with RGB cameras, and they demonstrate its efficacy on a new dataset of daily household activities.

Mobile service robots are proving to be increasingly effective in a range of applications, such as healthcare, monitoring Activities of Daily Living (ADL), and facilitating Ambient Assisted Living (AAL). These robots rely heavily on Human Action Recognition (HAR) to interpret human actions and intentions. However, for HAR to function effectively on service robots, it requires prior knowledge of human presence (human detection) and identification of individuals to monitor (human tracking). In this work, we propose an end-to-end pipeline that encompasses the entire process, starting from human detection and tracking and leading to action recognition. The pipeline is designed to operate in near real-time while ensuring all stages of processing are performed on the edge, reducing the need for centralised computation. To identify the most suitable models for our mobile robot, we conducted a series of experiments comparing state-of-the-art solutions based on both their detection performance and efficiency. To evaluate the effectiveness of our proposed pipeline, we propose a dataset comprising daily household activities. By presenting our findings and analysing the results, we demonstrate the efficacy of our approach in enabling mobile robots to understand and respond to human behaviour in real-world scenarios, relying mainly on the data from their RGB cameras.
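
The three-stage flow described in the abstract (detection, then tracking, then action recognition) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `PerceptionPipeline`, the callable interfaces, the per-track frame buffer, and the `clip_len` parameter are all assumptions made for illustration, and the stub stages stand in for whatever models the paper actually selects.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Track:
    track_id: int  # identity assigned by the tracker
    box: Box       # current bounding box of that person


class PerceptionPipeline:
    """Hypothetical chain: detection -> tracking -> action recognition."""

    def __init__(
        self,
        detect: Callable[[Any], List[Box]],
        track: Callable[[List[Box]], List[Track]],
        recognise: Callable[[List[Tuple[Any, Box]]], str],
        clip_len: int = 16,  # assumed number of frames per action clip
    ) -> None:
        self.detect = detect
        self.track = track
        self.recognise = recognise
        self.clip_len = clip_len
        # One bounded buffer of (frame, box) pairs per tracked person.
        self.buffers: Dict[int, deque] = defaultdict(
            lambda: deque(maxlen=clip_len)
        )

    def step(self, frame: Any) -> Dict[int, str]:
        """Process one RGB frame; return an action label per ready track."""
        boxes = self.detect(frame)   # stage 1: where are the people?
        tracks = self.track(boxes)   # stage 2: who is who across frames?
        actions: Dict[int, str] = {}
        live_ids = set()
        for t in tracks:
            live_ids.add(t.track_id)
            buf = self.buffers[t.track_id]
            buf.append((frame, t.box))
            if len(buf) == self.clip_len:  # stage 3: enough frames for a clip
                actions[t.track_id] = self.recognise(list(buf))
        # Forget people who are no longer tracked.
        for tid in list(self.buffers):
            if tid not in live_ids:
                del self.buffers[tid]
        return actions


# Stub stages (placeholders, not real models).
def stub_detect(frame: Any) -> List[Box]:
    return [(0.0, 0.0, 10.0, 20.0)]


def stub_track(boxes: List[Box]) -> List[Track]:
    return [Track(track_id=i, box=b) for i, b in enumerate(boxes)]


def stub_recognise(clip: List[Tuple[Any, Box]]) -> str:
    return "standing"
```

With `clip_len=3`, calling `step` on four consecutive frames yields no label for the first two frames and a label for track 0 from the third frame onward. Keeping a separate bounded buffer per track is one possible way to ensure the recogniser only sees frames that belong to a single person; the paper may organise this differently.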

