CV · Dec 26, 2022
RFPose-OT: RF-Based 3D Human Pose Estimation via Optimal Transport Theory
Cong Yu, Dongheng Zhang, Zhi Wu et al.
This paper introduces a novel framework, RFPose-OT, to enable 3D human pose estimation from Radio Frequency (RF) signals. Unlike existing methods that predict human poses from RF signals directly on the signal level, we consider the structural difference between RF signals and human poses, propose to transform the RF signals to the pose domain on the feature level based on Optimal Transport (OT) theory, and generate human poses from the transformed features. To evaluate RFPose-OT, we build a radio system and a multi-view camera system to acquire the RF signal data and the ground-truth human poses. The experimental results in basic indoor, occluded indoor, and outdoor environments all demonstrate that RFPose-OT can predict 3D human poses with higher precision than the state-of-the-art methods.
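As a rough illustration of what feature-level alignment with optimal transport can look like, the sketch below computes an entropy-regularized transport plan between two small feature sets using Sinkhorn iterations. It is a generic textbook formulation, not the RFPose-OT training objective, which the abstract does not specify. The names `sinkhorn_plan`, `rf_feats`, and `pose_feats`, the feature shapes, and the values of `eps` and `n_iters` are all assumptions made for this example.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, eps=0.5, n_iters=200):
    """Entropy-regularized OT plan between histograms a (rows) and b (columns)."""
    K = np.exp(-cost / eps)            # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # rescale to match column marginals
        u = a / (K @ v)                # rescale to match row marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
rf_feats = rng.normal(size=(5, 8))     # hypothetical RF-domain features
pose_feats = rng.normal(size=(6, 8))   # hypothetical pose-domain features

# Squared Euclidean cost between every RF feature and every pose feature,
# normalized to [0, 1] so the kernel stays numerically well behaved.
cost = ((rf_feats[:, None, :] - pose_feats[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()

a = np.full(5, 1 / 5)                  # uniform mass on RF features
b = np.full(6, 1 / 6)                  # uniform mass on pose features
plan = sinkhorn_plan(cost, a, b)
ot_distance = float((plan * cost).sum())
```

In a learning setting, a quantity like `ot_distance` could serve as a loss that pulls the RF feature distribution toward the pose feature distribution; whether RFPose-OT uses this exact form is not stated in the abstract.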
CV · May 3, 2024
IFNet: Deep Imaging and Focusing for Handheld SAR with Millimeter-wave Signals
Yadong Li, Dongheng Zhang, Ruixu Geng et al.
Recent advancements have showcased the potential of handheld millimeter-wave (mmWave) imaging, which applies synthetic aperture radar (SAR) principles in portable settings. However, existing studies addressing handheld motion errors either rely on costly tracking devices or employ simplified imaging models, leading to impractical deployment or limited performance. In this paper, we present IFNet, a novel deep unfolding network that combines the strengths of signal processing models and deep neural networks to achieve robust imaging and focusing for handheld mmWave systems. We first formulate the handheld imaging model by integrating multiple priors about mmWave images and handheld phase errors. Furthermore, we transform the optimization processes into an iterative network structure for improved and efficient imaging performance. Extensive experiments demonstrate that IFNet effectively compensates for handheld phase errors and recovers high-fidelity images from severely distorted signals. In comparison with existing methods, IFNet can achieve at least 11.89 dB improvement in average peak signal-to-noise ratio (PSNR) and 64.91% improvement in average structural similarity index measure (SSIM) on a real-world dataset.
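To put the reported PSNR gain in context, the sketch below implements the standard PSNR definition and converts a dB gain into a mean-squared-error ratio. The function name `psnr` and the assumption that images are scaled to a peak value of 1.0 are choices made for this example and do not come from the paper.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(estimate, float)) ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return float(10 * np.log10(peak ** 2 / mse))

# For a fixed peak value, a PSNR gain of g dB means the MSE shrinks by 10**(g/10).
mse_ratio = 10 ** (11.89 / 10)
```

Under this definition, an 11.89 dB gain corresponds to a mean squared error roughly 15 times lower.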
HC · Apr 24, 2025
Lessons from Deploying Learning-based CSI Localization on a Large-Scale ISAC Platform
Tianyu Zhang, Dongheng Zhang, Ruixu Geng et al.
In recent years, Channel State Information (CSI), recognized for its fine-grained spatial characteristics, has attracted increasing attention in WiFi-based indoor localization. However, despite its potential, CSI-based approaches have yet to achieve the same level of deployment scale and commercialization as those based on Received Signal Strength Indicator (RSSI). A key limitation lies in the fact that most existing CSI-based systems are developed and evaluated in controlled, small-scale environments, limiting their generalizability. To bridge this gap, we explore the deployment of a large-scale CSI-based localization system involving over 400 Access Points (APs) in a real-world building under the Integrated Sensing and Communication (ISAC) paradigm. We highlight two critical yet often overlooked factors: the underutilization of unlabeled data and the inherent heterogeneity of CSI measurements. To address these challenges, we propose a novel CSI-based learning framework for WiFi localization, tailored for large-scale ISAC deployments on the server side. Specifically, we employ a novel graph-based structure to model heterogeneous CSI data and reduce redundancy. We further design a pretext pretraining task that incorporates spatial and temporal priors to effectively leverage large-scale unlabeled CSI data. Complementarily, we introduce a confidence-aware fine-tuning strategy to enhance the robustness of localization results. In a leave-one-smartphone-out experiment spanning five floors and 25,600 m², we achieve a median localization error of 2.17 meters and a floor accuracy of 99.49%. This performance corresponds to an 18.7% reduction in mean absolute error (MAE) compared to the best-performing baseline.
CV · Jan 25, 2022
RFMask: A Simple Baseline for Human Silhouette Segmentation with Radio Signals
Zhi Wu, Dongheng Zhang, Chunyang Xie et al.
Human silhouette segmentation, which was originally defined in computer vision, has achieved promising results for understanding human activities. However, physical limitations cause existing systems based on optical cameras to suffer severe performance degradation under low illumination, smoke, and/or opaque obstruction. To overcome these limitations, in this paper we propose to utilize radio signals, which can traverse obstacles and are unaffected by lighting conditions, to achieve silhouette segmentation. The proposed RFMask framework is composed of three modules. First, the signal processing module transforms RF signals captured by millimeter wave radar on two planes into the spatial domain and suppresses interference. Then, the human detection module locates human reflections on RF frames and extracts features from the surrounding signals. Finally, the features extracted from RF frames are aggregated by an attention-based mask generation module. To verify our proposed framework, we collect a dataset containing 804,760 radio frames and 402,380 camera frames with human activities under various scenes. Experimental results show that the proposed framework can achieve impressive human silhouette segmentation even under challenging scenarios (such as low light and occlusion) where traditional optical-camera-based methods fail. To the best of our knowledge, this is the first investigation into segmenting human silhouettes based on millimeter wave signals. We hope that our work can serve as a baseline and inspire further research that performs vision tasks with radio signals. The dataset and code will be made public.
CV · Dec 16, 2021
Radio-Assisted Human Detection
Chengrun Qiu, Dongheng Zhang, Yang Hu et al.
In this paper, we propose a radio-assisted human detection framework by incorporating radio information into the state-of-the-art detection methods, including anchor-based one-stage detectors and two-stage detectors. We extract the radio localization and identifier information from the radio signals to assist human detection, which greatly alleviates the problem of false positives and false negatives. For both detectors, we use confidence score revision based on the radio localization to improve the detection performance. For two-stage detection methods, we propose to utilize the region proposals generated from radio localization rather than relying on a region proposal network (RPN). Moreover, with the radio identifier information, a non-max suppression method with the radio localization constraint is also proposed to further suppress false detections and reduce missed detections. Experiments on the simulative Microsoft COCO dataset and Caltech pedestrian datasets show that the mean average precision (mAP) and the miss rate of the state-of-the-art detection methods can be improved with the aid of radio information. Finally, we conduct experiments in real-world scenarios to demonstrate the feasibility of our proposed method in practice.
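The idea of revising detector confidence with radio localization can be illustrated with a toy rule: raise the score of a box that contains a radio-localized position and lower the score of a box that contains none. The abstract does not give the paper's actual revision formula, so the additive rule below, the function name `revise_scores`, and the `boost` and `penalty` values are purely hypothetical. Radio positions are assumed to be already projected into image coordinates.

```python
import numpy as np

def revise_scores(boxes, scores, radio_points, boost=0.2, penalty=0.2):
    """Toy confidence revision (hypothetical rule, not the paper's formula).

    boxes:        (N, 4) array of [x1, y1, x2, y2] in image coordinates
    scores:       (N,) detector confidences in [0, 1]
    radio_points: (M, 2) radio-localized positions in image coordinates
    """
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)
    pts = np.asarray(radio_points, dtype=float).reshape(-1, 2)

    px, py = pts[:, 0][None, :], pts[:, 1][None, :]
    inside = (
        (px >= boxes[:, 0:1]) & (px <= boxes[:, 2:3])
        & (py >= boxes[:, 1:2]) & (py <= boxes[:, 3:4])
    )                                   # (N, M): is point m inside box n?
    supported = inside.any(axis=1)      # box contains at least one radio point

    revised = np.where(supported, scores + boost, scores - penalty)
    return np.clip(revised, 0.0, 1.0)

revised = revise_scores(
    boxes=[[0, 0, 10, 10], [20, 20, 30, 30]],
    scores=[0.5, 0.5],
    radio_points=[[5, 5]],
)
```

Here the first box contains the radio point and is boosted, while the second has no radio support and is penalized, which is the general mechanism by which radio cues can reduce both false positives and false negatives.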
MM · Dec 7, 2021
RFGAN: RF-Based Human Synthesis
Cong Yu, Zhi Wu, Dongheng Zhang et al.
This paper demonstrates human synthesis based on Radio Frequency (RF) signals, leveraging the fact that RF signals can record human movements through the signal reflections off the human body. Unlike existing RF sensing works that can only perceive humans roughly, this paper aims to generate fine-grained optical human images by introducing a novel cross-modal RFGAN model. Specifically, we first build a radio system equipped with horizontal and vertical antenna arrays to transceive RF signals. Since the reflected RF signals are processed as obscure signal projection heatmaps on the horizontal and vertical planes, we design an RF-Extractor with RNN in RFGAN for RF heatmap encoding and combining to obtain the human activity information. Then we inject the information extracted by the RF-Extractor and RNN as the condition into the GAN using the proposed RF-based adaptive normalizations. Finally, we train the whole model in an end-to-end manner. To evaluate our proposed model, we create two cross-modal datasets (RF-Walk & RF-Activity) that contain thousands of optical human activity frames and corresponding RF signals. Experimental results show that RFGAN can generate target human activity frames using RF signals. To the best of our knowledge, this is the first work to generate optical images based on RF signals.
CV · Nov 20, 2021
Unsupervised Domain Adaptation for RF-based Gesture Recognition
Bin-Bin Zhang, Dongheng Zhang, Yadong Li et al.
Human gesture recognition with Radio Frequency (RF) signals has attained acclaim due to the omnipresence, privacy protection, and broad coverage of RF signals. These gesture recognition systems rely on neural networks trained with a large amount of labeled data. However, a recognition model trained with data collected under certain conditions suffers significant performance degradation when applied in practical deployment, which limits the application of gesture recognition systems. In this paper, we propose an unsupervised domain adaptation framework for RF-based gesture recognition that aims to enhance the performance of the recognition model in new conditions by making effective use of unlabeled data from those conditions. We first propose pseudo-labeling and consistency regularization to utilize unlabeled data for model training and eliminate the feature discrepancies between domains. Then we propose a confidence constraint loss to enhance the effectiveness of pseudo-labeling, and design two corresponding data augmentation methods based on the characteristics of RF signals to strengthen the consistency regularization, which makes the framework more effective and robust. Furthermore, we propose a cross-match loss to integrate the pseudo-labeling and consistency regularization, which makes the whole framework simple yet effective. Extensive experiments demonstrate that the proposed framework achieves 4.35% and 2.25% accuracy improvements compared with the state-of-the-art methods on a public WiFi dataset and a millimeter wave (mmWave) radar dataset, respectively.
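The combination of pseudo-labeling and consistency regularization can be sketched in a generic, FixMatch-style form: predictions on a weakly augmented unlabeled sample provide a pseudo-label, which is kept only if it is confident enough, and the prediction on a strongly augmented view of the same sample is trained to match it. This is not the paper's confidence constraint loss or cross-match loss, whose definitions are not given in the abstract. The function name, the threshold of 0.9, and the toy probabilities are assumptions for illustration.

```python
import numpy as np

def pseudo_label_consistency_loss(probs_weak, probs_strong, threshold=0.9):
    """Generic FixMatch-style unlabeled loss (illustrative, not the paper's loss).

    probs_weak:   (N, C) class probabilities on weakly augmented samples
    probs_strong: (N, C) class probabilities on strongly augmented samples
    Returns the masked mean cross-entropy and the confidence mask.
    """
    probs_weak = np.asarray(probs_weak, dtype=float)
    probs_strong = np.asarray(probs_strong, dtype=float)

    pseudo = probs_weak.argmax(axis=1)              # hard pseudo-labels
    mask = probs_weak.max(axis=1) >= threshold      # keep confident samples only

    # Cross-entropy of the strong-view prediction against the pseudo-label.
    picked = probs_strong[np.arange(len(pseudo)), pseudo]
    ce = -np.log(picked + 1e-12)
    return float((ce * mask).mean()), mask

loss, mask = pseudo_label_consistency_loss(
    probs_weak=[[0.95, 0.05], [0.60, 0.40]],
    probs_strong=[[0.50, 0.50], [0.50, 0.50]],
)
```

In this toy case only the first sample passes the confidence threshold, so only it contributes to the loss; the second is ignored until the model becomes more certain about it.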
CV · Nov 11, 2021
Towards Domain-Independent and Real-Time Gesture Recognition Using mmWave Signal
Yadong Li, Dongheng Zhang, Jinbo Chen et al.
Human gesture recognition using millimeter-wave (mmWave) signals enables attractive applications including smart home and in-car interfaces. While existing works achieve promising performance under controlled settings, practical applications are still limited by the need for intensive data collection, extra training effort when adapting to new domains, and poor performance in real-time recognition. In this paper, we propose DI-Gesture, a domain-independent and real-time mmWave gesture recognition system. Specifically, we first derive signal variations corresponding to human gestures with spatial-temporal processing. To enhance the robustness of the system and reduce data collection effort, we design a data augmentation framework for mmWave signals based on correlations between signal patterns and gesture variations. Furthermore, a spatial-temporal gesture segmentation algorithm is employed for real-time recognition. Extensive experimental results show DI-Gesture achieves an average accuracy of 97.92%, 99.18%, and 98.76% for new users, environments, and locations, respectively. We also evaluate DI-Gesture in challenging scenarios such as real-time recognition and sensing at extreme angles, all of which demonstrate the superior robustness and effectiveness of our system.