CV · Feb 3 · Code
High-Resolution Underwater Camouflaged Object Detection: GBU-UCOD Dataset and Topology-Aware and Frequency-Decoupled Networks
Wenji Wu, Shuo Ye, Yiyu Liu et al.
Underwater Camouflaged Object Detection (UCOD) is a challenging task due to the extreme visual similarity between targets and backgrounds across varying marine depths. Existing methods often struggle with topological fragmentation of slender creatures in the deep sea and the subtle feature extraction of transparent organisms. In this paper, we propose DeepTopo-Net, a novel framework that integrates topology-aware modeling with frequency-decoupled perception. To address physical degradation, we design the Water-Conditioned Adaptive Perceptor (WCAP), which employs Riemannian metric tensors to dynamically deform convolutional sampling fields. Furthermore, the Abyssal-Topology Refinement Module (ATRM) is developed to maintain the structural connectivity of spindly targets through skeletal priors. We also introduce GBU-UCOD, the first high-resolution (2K) benchmark tailored for marine vertical zonation, filling the data gap for hadal and abyssal zones. Extensive experiments on MAS3K, RMAS, and our proposed GBU-UCOD datasets demonstrate that DeepTopo-Net achieves state-of-the-art performance, particularly in preserving the morphological integrity of complex underwater patterns. The datasets and code will be released at https://github.com/Wuwenji18/GBU-UCOD.
CV · Feb 19, 2025 · Code
CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement
Zheng Wu, Yiping Xie, Bo Zhao et al.
Heart rate (HR) estimation via remote photoplethysmography (rPPG) offers a non-invasive solution for health monitoring. However, traditional single-modality approaches (RGB or Radio Frequency (RF)) face challenges in balancing robustness and accuracy due to lighting variations, motion artifacts, and skin tone bias. In this paper, we propose CardiacMamba, a multimodal RGB-RF fusion framework that leverages the complementary strengths of both modalities. It introduces the Temporal Difference Mamba Module (TDMM) to capture dynamic changes in RF signals using timing differences between frames, enhancing the extraction of local and global features. Additionally, CardiacMamba employs a Bidirectional SSM for cross-modal alignment and a Channel-wise Fast Fourier Transform (CFFT) to effectively capture and refine the frequency domain characteristics of RGB and RF signals, ultimately improving heart rate estimation accuracy and periodicity detection. Extensive experiments on the EquiPleth dataset demonstrate state-of-the-art performance, achieving marked improvements in accuracy and robustness. CardiacMamba significantly mitigates skin tone bias, reducing performance disparities across demographic groups, and maintains resilience under missing-modality scenarios. By addressing critical challenges in fairness, adaptability, and precision, the framework advances rPPG technology toward reliable real-world deployment in healthcare. The codes are available at: https://github.com/WuZheng42/CardiacMamba.
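As a rough illustration of the frequency-domain step that rPPG pipelines rely on, the sketch below reads a heart rate off the strongest spectral peak of a pulse-like signal. It is not CardiacMamba's CFFT module: the function name, the 0.7 to 3.0 Hz search band, and the synthetic signal are assumptions made for this example, and a plain DFT stands in for an FFT to keep it dependency-free.

```python
import cmath
import math


def dominant_heart_rate_bpm(signal, fs, lo_hz=0.7, hi_hz=3.0):
    """Return the heart rate in beats per minute at the strongest spectral
    peak inside the assumed plausible band [lo_hz, hi_hz]."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    best_freq, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n  # frequency of DFT bin k in Hz
        if not (lo_hz <= freq <= hi_hz):
            continue
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0


# Synthetic 10-second trace at 30 samples/s: a 1.2 Hz pulse component
# plus a slower 0.25 Hz drift (hypothetical values, e.g. breathing motion).
fs = 30.0
signal = [
    math.sin(2 * math.pi * 1.2 * t / fs) + 0.3 * math.sin(2 * math.pi * 0.25 * t / fs)
    for t in range(300)
]
heart_rate = dominant_heart_rate_bpm(signal, fs)
print(round(heart_rate, 1))
```

Restricting the search to a physiological band is what keeps the slow drift from being mistaken for the pulse; a learned model such as the one described above would replace this hand-picked band with learned frequency features.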
SP · Nov 14, 2023
Fairness-Driven Optimization of RIS-Augmented 5G Networks for Seamless 3D UAV Connectivity Using DRL Algorithms
Yu Tian, Ahmed Alhammadi, Jiguang He et al.
In this paper, we study the problem of joint active and passive beamforming for reconfigurable intelligent surface (RIS)-assisted massive multiple-input multiple-output systems towards the extension of the wireless cellular coverage in 3D, where multiple RISs, each equipped with an array of passive elements, are deployed to assist a base station (BS) to simultaneously serve multiple unmanned aerial vehicles (UAVs) in the same time-frequency resource of 5G wireless communications. With a focus on ensuring fairness among UAVs, our objective is to maximize the minimum signal-to-interference-plus-noise ratio (SINR) at UAVs by jointly optimizing the transmit beamforming parameters at the BS and phase shift parameters at RISs. We propose two novel algorithms to address this problem. The first algorithm aims to mitigate interference by calculating the BS beamforming matrix through matrix inverse operations once the phase shift parameters are determined. The second one is based on the principle that one RIS element only serves one UAV and the phase shift parameter of this RIS element is optimally designed to compensate for the phase offset caused by the propagation and fading. To obtain the optimal parameters, we utilize a state-of-the-art reinforcement learning algorithm, deep deterministic policy gradient, to solve these two optimization problems. Simulation results illustrate the effectiveness of the proposed solution and yield several insights.
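The phase-compensation principle behind the second algorithm can be sketched for a single UAV as follows. This is a minimal illustration under assumed channel values, not the paper's DDPG-based solution: each RIS element applies a phase that cancels the combined phase of its BS-to-RIS and RIS-to-UAV links, so all reflected paths add coherently. The function names and the three-element channels are hypothetical.

```python
import cmath


def compensating_phases(h, g):
    """Phase shift per RIS element that cancels the phase of the cascaded
    link h_n (BS -> element n) times g_n (element n -> UAV)."""
    return [-(cmath.phase(hn) + cmath.phase(gn)) for hn, gn in zip(h, g)]


def effective_channel(h, g, phases):
    """Sum of the reflected paths after each element applies its phase shift."""
    return sum(
        gn * cmath.exp(1j * theta) * hn for hn, gn, theta in zip(h, g, phases)
    )


# Hypothetical channel coefficients, given as (magnitude, phase in radians).
h = [cmath.rect(1.0, 0.3), cmath.rect(0.8, -1.2), cmath.rect(0.5, 2.0)]
g = [cmath.rect(0.9, 1.1), cmath.rect(0.7, 0.4), cmath.rect(1.0, -2.5)]

aligned = effective_channel(h, g, compensating_phases(h, g))
unaligned = effective_channel(h, g, [0.0] * len(h))

# With compensation the magnitude equals the sum of |h_n| * |g_n|.
print(abs(aligned), abs(unaligned))
```

With several UAVs, each element would be assigned to one UAV and compensated for that UAV's cascaded link only; how elements are assigned is the part the abstract leaves to the learning algorithm.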
LG · May 3, 2022
Revisiting Communication-Efficient Federated Learning with Balanced Global and Local Updates
Zhigang Yan, Dong Li, Zhichao Zhang et al.
In federated learning (FL), a number of devices train their local models and upload the corresponding parameters or gradients to the base station (BS) to update the global model while protecting their data privacy. However, due to the limited computation and communication resources, the number of local trainings (a.k.a. local update) and that of aggregations (a.k.a. global update) need to be carefully chosen. In this paper, we investigate and analyze the optimal trade-off between the number of local trainings and that of global aggregations to speed up the convergence and enhance the prediction accuracy over the existing works. Our goal is to minimize the global loss function under both the delay and the energy consumption constraints. In order to make the optimization problem tractable, we derive a new and tight upper bound on the loss function, which allows us to obtain closed-form expressions for the number of local trainings and that of global aggregations. Simulation results show that our proposed scheme can achieve a better performance in terms of the prediction accuracy, and converge much faster than the baseline schemes.
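The two quantities being traded off, the number of local trainings and the number of global aggregations, can be made concrete with a toy federated averaging loop. This is a sketch under strong assumptions (scalar models, one quadratic loss per device, equal weighting) and does not reproduce the paper's bound or closed-form expressions; delay and energy costs are not modeled, and all names are hypothetical.

```python
def local_training(w, target, lr, local_steps):
    """Run `local_steps` gradient steps on the toy loss 0.5 * (w - target)**2."""
    for _ in range(local_steps):
        w -= lr * (w - target)
    return w


def federated_average(targets, lr, local_steps, global_rounds, w0=0.0):
    """Alternate local training on every device with an equal-weight
    aggregation at the base station, for `global_rounds` rounds."""
    w = w0
    for _ in range(global_rounds):
        local_models = [local_training(w, t, lr, local_steps) for t in targets]
        w = sum(local_models) / len(local_models)  # global update
    return w


# Each device's loss is minimized at its own target; the global optimum
# of the averaged loss is the mean of the targets (3.0 here).
targets = [1.0, 2.0, 6.0]
few_local = federated_average(targets, lr=0.1, local_steps=1, global_rounds=20)
more_local = federated_average(targets, lr=0.1, local_steps=5, global_rounds=20)
print(few_local, more_local)
```

In this toy setting, more local steps per round bring the global model closer to the optimum for the same number of aggregations. In practice each local step costs computation time and energy and each aggregation costs communication, which is why the counts have to be chosen jointly under the constraints described above.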
AI · Dec 9, 2025
Performance Comparison of Aerial RIS and STAR-RIS in 3D Wireless Environments
Dongdong Yang, Bin Li, Jiguang He
Reconfigurable intelligent surface (RIS) and simultaneously transmitting and reflecting RIS (STAR-RIS) have emerged as key enablers for enhancing wireless coverage and capacity in next-generation networks. When mounted on unmanned aerial vehicles (UAVs), they benefit from flexible deployment and improved line-of-sight conditions. Despite their promising potential, a comprehensive performance comparison between aerial RIS and STAR-RIS architectures has not been thoroughly investigated. This letter presents a detailed performance comparison between aerial RIS and STAR-RIS in three-dimensional wireless environments. Accurate channel models incorporating directional radiation patterns are established, and the influence of deployment altitude and orientation is thoroughly examined. To optimize the system sum-rate, we formulate joint optimization problems for both architectures and propose an efficient solution based on the weighted minimum mean square error and block coordinate descent algorithms. Simulation results reveal that STAR-RIS outperforms RIS in low-altitude scenarios due to its full-space coverage capability, whereas RIS delivers better performance near the base station at higher altitudes. The findings provide practical insights for the deployment of aerial intelligent surfaces in future 6G communication systems.
IT · Apr 29
Multi-Server Secure Aggregation with Arbitrary Collusion and Heterogeneous Security Constraints
Zhou Li, Xiang Zhang, Jiguang He et al.
We study the fundamental limits of multi-server secure aggregation over a two-hop network where multiple servers, each connected to a disjoint subset of users, jointly compute the sum of all users' inputs. The goal is to ensure that no server can infer any information about prescribed subsets of inputs beyond the desired aggregate, even when colluding with an arbitrary subset of users. Existing works largely focus on homogeneous security requirements, where all inputs are protected against colluding sets up to a given size. Such formulations are insufficient to capture more general scenarios in which different subsets of inputs may require protection against different collusion patterns. In this paper, we consider a general model with heterogeneous security requirements and arbitrary user collusion. We characterize the communication rates for all parameter regimes, and determine the minimum key rate required for secure aggregation in most regimes. In particular, we establish tight information-theoretic lower bounds and matching achievable schemes in a broad class of regimes. For the remaining regime, we derive a general lower bound together with an achievable scheme that attains it within a bounded gap. Our results reveal how the interplay between network topology and heterogeneous security constraints fundamentally determines the communication and key generation requirements, and generalize existing results on secure aggregation.
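For readers new to secure aggregation, the basic masking idea can be sketched with a single server and zero-sum keys. This is the textbook one-hop construction, shown only to fix intuition; it is not the multi-server scheme studied in the paper and it does not handle collusion or heterogeneous security constraints. The modulus, inputs, and function names are assumptions.

```python
import random


def zero_sum_keys(num_users, modulus, rng):
    """Draw one key per user such that all keys sum to 0 modulo `modulus`."""
    keys = [rng.randrange(modulus) for _ in range(num_users - 1)]
    keys.append((-sum(keys)) % modulus)
    return keys


def mask_inputs(inputs, keys, modulus):
    """Each user uploads its input plus its key, reduced modulo `modulus`."""
    return [(x + k) % modulus for x, k in zip(inputs, keys)]


def aggregate(masked, modulus):
    """The server adds the masked uploads; the keys cancel in the sum."""
    return sum(masked) % modulus


MODULUS = 2 ** 16
# A seeded generator keeps the example reproducible; it is NOT
# cryptographically secure and would be replaced in any real system.
rng = random.Random(0)

inputs = [12, 7, 30, 5]
keys = zero_sum_keys(len(inputs), MODULUS, rng)
masked = mask_inputs(inputs, keys, MODULUS)
total = aggregate(masked, MODULUS)
print(total)  # the sum of the inputs, 54
```

The server learns the sum but each individual upload is hidden by its key. The key rate mentioned in the abstract measures how much such key material is needed per input symbol.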
CL · Jun 17, 2025
M2BeamLLM: Multimodal Sensing-empowered mmWave Beam Prediction with Large Language Models
Can Zheng, Jiguang He, Chung G. Kang et al.
This paper introduces a novel neural network framework called M2BeamLLM for beam prediction in millimeter-wave (mmWave) massive multi-input multi-output (mMIMO) communication systems. M2BeamLLM integrates multi-modal sensor data, including images, radar, LiDAR, and GPS, leveraging the powerful reasoning capabilities of large language models (LLMs) such as GPT-2 for beam prediction. By combining sensing data encoding, multimodal alignment and fusion, and supervised fine-tuning (SFT), M2BeamLLM achieves significantly higher beam prediction accuracy and robustness, demonstrably outperforming traditional deep learning (DL) models in both standard and few-shot scenarios. Furthermore, its prediction performance consistently improves with increased diversity in sensing modalities. Our study provides an efficient and intelligent beam prediction solution for vehicle-to-infrastructure (V2I) mmWave communication systems.
LG · Mar 13, 2025
BeamLLM: Vision-Empowered mmWave Beam Prediction with Large Language Models
Can Zheng, Jiguang He, Guofa Cai et al.
In this paper, we propose BeamLLM, a vision-aided millimeter-wave (mmWave) beam prediction framework leveraging large language models (LLMs) to address the challenges of high training overhead and latency in mmWave communication systems. By combining computer vision (CV) with LLMs' cross-modal reasoning capabilities, the framework extracts user equipment (UE) positional features from RGB images and aligns visual-temporal features with LLMs' semantic space through reprogramming techniques. Evaluated on a realistic vehicle-to-infrastructure (V2I) scenario, the proposed method achieves 61.01% top-1 accuracy and 97.39% top-3 accuracy in standard prediction tasks, significantly outperforming traditional deep learning models. In few-shot prediction scenarios, the performance degradation is limited to 12.56% (top-1) and 5.55% (top-3) from time sample 1 to 10, demonstrating superior prediction capability.
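The top-1 and top-3 figures quoted above are top-k accuracies: a prediction counts as correct when the true beam index is among the k highest-scoring beams. The sketch below shows that metric on made-up scores; the data and function name are hypothetical and unrelated to the paper's evaluation set.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true beam index is among the k
    highest-scoring beams."""
    hits = 0
    for row, label in zip(scores, labels):
        # Beam indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Made-up scores for 4 samples over a 4-beam codebook.
scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.40, 0.30, 0.20, 0.10],
    [0.05, 0.15, 0.30, 0.50],
    [0.25, 0.35, 0.30, 0.10],
]
labels = [1, 1, 0, 2]

print(top_k_accuracy(scores, labels, k=1))  # 0.25
print(top_k_accuracy(scores, labels, k=3))  # 0.75
```

Top-3 accuracy is relevant in beam prediction because a short sweep over a few candidate beams can recover from a wrong top-1 guess at a much lower overhead than a full sweep.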
LG · Nov 23, 2025
Generative Model-Aided Continual Learning for CSI Feedback in FDD mMIMO-OFDM Systems
Guijun Liu, Yuwen Cao, Tomoaki Ohtsuki et al.
Deep autoencoder (DAE) frameworks have demonstrated their effectiveness in reducing channel state information (CSI) feedback overhead in massive multiple-input multiple-output (mMIMO) orthogonal frequency division multiplexing (OFDM) systems. However, existing CSI feedback models struggle to adapt to dynamic environments caused by user mobility, requiring retraining when encountering new CSI distributions. Moreover, returning to previously encountered environments often leads to performance degradation due to catastrophic forgetting. Continual learning involves enabling models to incorporate new information while maintaining performance on previously learned tasks. To address these challenges, we propose a generative adversarial network (GAN)-based learning approach for CSI feedback. By using a GAN generator as a memory unit, our method preserves knowledge from past environments and ensures consistently high performance across diverse scenarios without forgetting. Simulation results show that the proposed approach enhances the generalization capability of the DAE framework while maintaining low memory overhead. Furthermore, it can be seamlessly integrated with other advanced CSI feedback models, highlighting its robustness and adaptability.
SP · Sep 16, 2021
Beyond 5G RIS mmWave Systems: Where Communication and Localization Meet
Jiguang He, Fan Jiang, Kamran Keykhosravi et al.
Upcoming beyond fifth generation (5G) communications systems aim at further enhancing key performance indicators and fully supporting brand new use cases by embracing emerging techniques, e.g., reconfigurable intelligent surface (RIS), integrated communication, localization, and sensing, and mmWave/THz communications. The wireless intelligence empowered by state-of-the-art artificial intelligence techniques has been widely considered at the transceivers, and now the paradigm is deemed to be shifted to the smart control of radio propagation environment by virtue of RISs. In this article, we argue that to harness the full potential of RISs, localization and communication must be tightly coupled. This is in sharp contrast to 5G and earlier generations, where localization was a minor additional service. To support this, we first introduce the fundamentals of RIS mmWave channel modeling, followed by RIS channel state information acquisition and link establishment. Then, we deal with the connection between localization and communications, from a separate and joint perspective.
SP · Jul 27, 2021
Learning to Estimate RIS-Aided mmWave Channels
Jiguang He, Henk Wymeersch, Marco Di Renzo et al.
Inspired by the remarkable learning and prediction performance of deep neural networks (DNNs), we apply one special type of DNN framework, known as model-driven deep unfolding neural network, to reconfigurable intelligent surface (RIS)-aided millimeter wave (mmWave) single-input multiple-output (SIMO) systems. We focus on uplink cascaded channel estimation, where known and fixed base station combining and RIS phase control matrices are considered for collecting observations. To boost the estimation performance and reduce the training overhead, the inherent channel sparsity of mmWave channels is leveraged in the deep unfolding method. It is verified that the proposed deep unfolding network architecture can outperform the least squares (LS) method with a relatively smaller training overhead and online computational complexity.
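The least squares (LS) baseline mentioned above can be illustrated in its simplest form: a single scalar channel estimated from known pilot symbols. This is only a sketch of the LS principle under assumed values. The paper's setting is a cascaded RIS channel with combining and phase control matrices, which this example does not model, and the pilots, noise samples, and function name are hypothetical.

```python
def ls_channel_estimate(pilots, received):
    """Least squares estimate of h in the model y_t = h * x_t + noise_t:
    h_hat = sum(conj(x_t) * y_t) / sum(|x_t|^2)."""
    numerator = sum(x.conjugate() * y for x, y in zip(pilots, received))
    denominator = sum(abs(x) ** 2 for x in pilots)
    return numerator / denominator


true_h = 0.8 - 0.6j                      # assumed channel coefficient
pilots = [1 + 0j, 1j, -1 + 0j, -1j]      # known unit-power pilot symbols
noise = [0.01 + 0.02j, -0.02 + 0.01j, 0.015 - 0.01j, -0.005 - 0.02j]

received = [true_h * x + n for x, n in zip(pilots, noise)]
estimate = ls_channel_estimate(pilots, received)
noise_free = ls_channel_estimate(pilots, [true_h * x for x in pilots])

print(estimate, noise_free)
```

LS uses no prior knowledge of the channel, so its accuracy depends on the number of pilots. Methods that exploit channel sparsity, such as the deep unfolding approach described above, aim to reach a given accuracy with fewer pilots.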
SP · Apr 14, 2021
Channel Estimation and Hybrid Architectures for RIS-Assisted Communications
Jiguang He, Nhan Thanh Nguyen, Rafaela Schroeder et al.
Reconfigurable intelligent surfaces (RISs) are considered as potential technologies for the upcoming sixth-generation (6G) wireless communication system. Various benefits brought by deploying one or multiple RISs include increased spectrum and energy efficiency, enhanced connectivity, extended communication coverage, reduced complexity at transceivers, and even improved localization accuracy. However, to unleash their full potential, fundamentals related to RISs, ranging from physical-layer (PHY) modelling to RIS phase control, need to be addressed thoroughly. In this paper, we provide an overview of some timely research problems related to the RIS technology, i.e., PHY modelling (including also physics), channel estimation, potential RIS architectures, and RIS phase control (via both model-based and data-driven approaches), along with recent numerical results. We envision that more efforts will be devoted towards intelligent wireless environments, enabled by RISs.