34.5CVApr 30
MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave VideoXijia Wei, Yuan Fang, Kevin Chetty et al.
Millimetre-wave (mmWave) radar offers a more privacy-preserving alternative to RGB-based human pose estimation. However, existing methods typically rely on pre-extracted intermediate representations such as sparse point clouds or spectrogram images, where the rich spatiotemporal information naturally present in radar video streams is discarded for model learning, while such signal processing adds system complexity. In addition, existing solutions are mainly conducted in an end-to-end supervised manner without leveraging unlabelled raw video streams to learn generalized representations. In this study, we present MAEPose, a masked autoencoding-based human pose estimation approach that operates directly on mmWave spectrogram videos. MAEPose learns spatiotemporal motion-aware generalized representations from unlabelled radar video, and leverages its heatmap decoder for multi-frame pose estimation predictions. We evaluate it across three datasets based on leave-one-person-out cross-validation with rigorous statistical testing. MAEPose consistently outperforms state-of-the-art baselines by up to 22.1% in MPJPE p<0.05, and maintains robust accuracy under zero-shot bystander interference with only a 6.5% error increase. Ablation studies confirm that both the pre-training and the heatmap decoder contribute substantially, while modality analysis indicates that leveraging Range-Doppler video as input achieves better pose estimation performance than Range-Azimuth or their fusion, with lower computational cost.
LGJan 11, 2025
Boundary-enhanced time series data imputation with long-term dependency diffusion modelsChunjing Xiao, Xue Jiang, Xianghe Du et al.
Data imputation is crucial for addressing challenges posed by missing values in multivariate time series data across various fields, such as healthcare, traffic, and economics, and has garnered significant attention. Among various methods, diffusion model-based approaches show notable performance improvements. However, existing methods often cause disharmonious boundaries between missing and known regions and overlook long-range dependencies in missing data estimation, leading to suboptimal results. To address these issues, we propose a Diffusion-based time Series Data Imputation (DSDI) framework. We develop a weight-reducing injection strategy that incorporates the predicted values of missing points with reducing weights into the reverse diffusion process to mitigate boundary inconsistencies. Further, we introduce a multi-scale S4-based U-Net, which combines hierarchical information from different levels via multi-resolution integration to capture long-term dependencies. Experimental results demonstrate that our model outperforms existing imputation methods.
CVJan 11, 2022
MDPose: Human Skeletal Motion Reconstruction Using WiFi Micro-Doppler SignaturesChong Tang, Wenda Li, Shelly Vishwakarma et al.
Motion tracking systems based on optical sensors typically often suffer from issues, such as poor lighting conditions, occlusion, limited coverage, and may raise privacy concerns. More recently, radio frequency (RF)-based approaches using commercial WiFi devices have emerged which offer low-cost ubiquitous sensing whilst preserving privacy. However, the output of an RF sensing system, such as Range-Doppler spectrograms, cannot represent human motion intuitively and usually requires further processing. In this study, MDPose, a novel framework for human skeletal motion reconstruction based on WiFi micro-Doppler signatures, is proposed. It provides an effective solution to track human activities by reconstructing a skeleton model with 17 key points, which can assist with the interpretation of conventional RF sensing outputs in a more understandable way. Specifically, MDPose has various incremental stages to gradually address a series of challenges: First, a denoising algorithm is implemented to remove any unwanted noise that may affect the feature extraction and enhance weak Doppler signatures. Secondly, the convolutional neural network (CNN)-recurrent neural network (RNN) architecture is applied to learn temporal-spatial dependency from clean micro-Doppler signatures and restore key points' velocity information. Finally, a pose optimising mechanism is employed to estimate the initial state of the skeleton and to limit the increase of error. We have conducted comprehensive tests in a variety of environments using numerous subjects with a single receiver radar system to demonstrate the performance of MDPose, and report 29.4mm mean absolute error over all key points positions, which outperforms state-of-the-art RF-based pose estimation systems.
SPJul 9, 2021
FMNet: Latent Feature-wise Mapping Network for Cleaning up Noisy Micro-Doppler SpectrogramChong Tang, Wenda Li, Shelly Vishwakarma et al.
Micro-Doppler signatures contain considerable information about target dynamics. However, the radar sensing systems are easily affected by noisy surroundings, resulting in uninterpretable motion patterns on the micro-Doppler spectrogram. Meanwhile, radar returns often suffer from multipath, clutter and interference. These issues lead to difficulty in, for example motion feature extraction, activity classification using micro Doppler signatures ($μ$-DS), etc. In this paper, we propose a latent feature-wise mapping strategy, called Feature Mapping Network (FMNet), to transform measured spectrograms so that they more closely resemble the output from a simulation under the same conditions. Based on measured spectrogram and the matched simulated data, our framework contains three parts: an Encoder which is used to extract latent representations/features, a Decoder outputs reconstructed spectrogram according to the latent features, and a Discriminator minimizes the distance of latent features of measured and simulated data. We demonstrate the FMNet with six activities data and two experimental scenarios, and final results show strong enhanced patterns and can keep actual motion information to the greatest extent. On the other hand, we also propose a novel idea which trains a classifier with only simulated data and predicts new measured samples after cleaning them up with the FMNet. From final classification results, we can see significant improvements.
CVNov 13, 2018
Home Activity Monitoring using Low Resolution Infrared SensorLili Tao, Timothy Volonakis, Bo Tan et al.
Action monitoring in a home environment provides important information for health monitoring and may serve as input into a smart home environment. Visual analysis using cameras can recognise actions in a complex scene, such as someones living room. However, although there the huge potential benefits and importance, specifically for health, cameras are not widely accepted because of privacy concerns. This paper recognises human activities using a sensor that retains privacy. The sensor is not only different by being thermal, but it is also of low resolution: 8x8 pixels. The combination of the thermal imaging, and the low spatial resolution ensures the privacy of individuals. We present an approach to recognise daily activities using this sensor based on a discrete cosine transform. We evaluate the proposed method on a state-of-the-art dataset and experimentally confirm that our approach outperforms the baseline method. We also introduce a new dataset, and evaluate the method on it. Here we show that the sensor is considered better at detecting the occurrence of falls and Activities of Daily Living. Our method achieves an overall accuracy of 87.50% across 7 activities with a fall detection sensitivity of 100% and specificity of 99.21%.