CVMar 29, 2025Code
Action Recognition in Real-World Ambient Assisted Living EnvironmentVincent Gbouna Zakka, Zhuangzhuang Dai, Luis J. Manso
The growing ageing population and their preference to maintain independence by living in their own homes require proactive strategies to ensure safety and support. Ambient Assisted Living (AAL) technologies have emerged to facilitate ageing in place by offering continuous monitoring and assistance within the home. Within AAL technologies, action recognition plays a crucial role in interpreting human activities and detecting incidents like falls, mobility decline, or unusual behaviours that may signal worsening health conditions. However, action recognition in practical AAL applications presents challenges, including occlusions, noisy data, and the need for real-time performance. While advancements have been made in accuracy, robustness to noise, and computation efficiency, achieving a balance among them all remains a challenge. To address this challenge, this paper introduces the Robust and Efficient Temporal Convolution network (RE-TCN), which comprises three main elements: Adaptive Temporal Weighting (ATW), Depthwise Separable Convolutions (DSC), and data augmentation techniques. These elements aim to enhance the model's accuracy, robustness against noise and occlusion, and computational efficiency within real-world AAL contexts. RE-TCN outperforms existing models in terms of accuracy, noise and occlusion robustness, and has been validated on four benchmark datasets: NTU RGB+D 60, Northwestern-UCLA, SHREC'17, and DHG-14/28. The code is publicly available at: https://github.com/Gbouna/RE-TCN
CVJun 30, 2025Code
GazeTarget360: Towards Gaze Target Estimation in 360-Degree for Robot PerceptionZhuangzhuang Dai, Vincent Gbouna Zakka, Luis J. Manso et al.
Enabling robots to understand human gaze target is a crucial step to allow capabilities in downstream tasks, for example, attention estimation and movement anticipation in real-world human-robot interactions. Prior works have addressed the in-frame target localization problem with data-driven approaches by carefully removing out-of-frame samples. Vision-based gaze estimation methods, such as OpenFace, do not effectively absorb background information in images and cannot predict gaze target in situations where subjects look away from the camera. In this work, we propose a system to address the problem of 360-degree gaze target estimation from an image in generalized visual scenes. The system, named GazeTarget360, integrates conditional inference engines of an eye-contact detector, a pre-trained vision encoder, and a multi-scale-fusion decoder. Cross validation results show that GazeTarget360 can produce accurate and reliable gaze target predictions in unseen scenarios. This makes a first-of-its-kind system to predict gaze targets from realistic camera footage which is highly efficient and deployable. Our source code is made publicly available at: https://github.com/zdai257/DisengageNet.
CVApr 3, 2025Code
Multi-Head Adaptive Graph Convolution Network for Sparse Point Cloud-Based Human Activity RecognitionVincent Gbouna Zakka, Luis J. Manso, Zhuangzhuang Dai
Human activity recognition is increasingly vital for supporting independent living, particularly for the elderly and those in need of assistance. Domestic service robots with monitoring capabilities can enhance safety and provide essential support. Although image-based methods have advanced considerably in the past decade, their adoption remains limited by concerns over privacy and sensitivity to low-light or dark conditions. As an alternative, millimetre-wave (mmWave) radar can produce point cloud data which is privacy-preserving. However, processing the sparse and noisy point clouds remains a long-standing challenge. While graph-based methods and attention mechanisms show promise, they predominantly rely on "fixed" kernels; kernels that are applied uniformly across all neighbourhoods, highlighting the need for adaptive approaches that can dynamically adjust their kernels to the specific geometry of each local neighbourhood in point cloud data. To overcome this limitation, we introduce an adaptive approach within the graph convolutional framework. Instead of a single shared weight function, our Multi-Head Adaptive Kernel (MAK) module generates multiple dynamic kernels, each capturing different aspects of the local feature space. By progressively refining local features while maintaining global spatial context, our method enables convolution kernels to adapt to varying local features. Experimental results on benchmark datasets confirm the effectiveness of our approach, achieving state-of-the-art performance in human activity recognition. Our source code is made publicly available at: https://github.com/Gbouna/MAK-GCN
CVAug 14, 2025
Enhanced Sparse Point Cloud Data Processing for Privacy-aware Human Action RecognitionMaimunatu Tunau, Vincent Gbouna Zakka, Zhuangzhuang Dai
Human Action Recognition (HAR) plays a crucial role in healthcare, fitness tracking, and ambient assisted living technologies. While traditional vision based HAR systems are effective, they pose privacy concerns. mmWave radar sensors offer a privacy preserving alternative but present challenges due to the sparse and noisy nature of their point cloud data. In the literature, three primary data processing methods: Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the Hungarian Algorithm, and Kalman Filtering have been widely used to improve the quality and continuity of radar data. However, a comprehensive evaluation of these methods, both individually and in combination, remains lacking. This paper addresses that gap by conducting a detailed performance analysis of the three methods using the MiliPoint dataset. We evaluate each method individually, all possible pairwise combinations, and the combination of all three, assessing both recognition accuracy and computational cost. Furthermore, we propose targeted enhancements to the individual methods aimed at improving accuracy. Our results provide crucial insights into the strengths and trade-offs of each method and their integrations, guiding future work on mmWave based HAR systems
CVMay 13, 2025
A computer vision-based model for occupancy detection using low-resolution thermal imagesXue Cui, Vincent Gbouna Zakka, Minhyun Lee
Occupancy plays an essential role in influencing the energy consumption and operation of heating, ventilation, and air conditioning (HVAC) systems. Traditional HVAC typically operate on fixed schedules without considering occupancy. Advanced occupant-centric control (OCC) adopted occupancy status in regulating HVAC operations. RGB images combined with computer vision (CV) techniques are widely used for occupancy detection, however, the detailed facial and body features they capture raise significant privacy concerns. Low-resolution thermal images offer a non-invasive solution that mitigates privacy issues. The study developed an occupancy detection model utilizing low-resolution thermal images and CV techniques, where transfer learning was applied to fine-tune the You Only Look Once version 5 (YOLOv5) model. The developed model ultimately achieved satisfactory performance, with precision, recall, mAP50, and mAP50 values approaching 1.000. The contributions of this model lie not only in mitigating privacy concerns but also in reducing computing resource demands.