Hei Victor Cheng

SP
h-index: 75
Papers: 10
Citations: 146
Novelty: 56%
AI Score: 52

10 Papers

IT Dec 8, 2016
Optimal Pilot and Payload Power Control in Single-Cell Massive MIMO Systems

Hei Victor Cheng, Emil Björnson, Erik G. Larsson

This paper considers the jointly optimal pilot and data power allocation in single-cell uplink massive multiple-input-multiple-output (MIMO) systems. Using the spectral efficiency (SE) as the performance metric and setting a total energy budget per coherence interval, the power control is formulated as optimization problems for two different objective functions: the weighted minimum SE among the users and the weighted sum SE. A closed-form solution for the optimal length of the pilot sequence is derived. The optimal power control policy for the former problem is found by solving a simple equation with a single variable. Utilizing the special structure arising from imperfect channel estimation, a convex reformulation is found to solve the latter problem to global optimality in polynomial time. The gain of the optimal joint power control is theoretically justified, and is proved to be large in the low-SNR regime. Simulation results also show the advantage of optimizing the power control over both pilot and data power, as compared to the cases of using full power and of only optimizing the data powers as done in previous work.

IT Sep 9, 2015
Uplink Pilot and Data Power Control for Single Cell Massive MIMO Systems with MRC

Hei Victor Cheng, Emil Björnson, Erik G. Larsson

This paper considers the jointly optimal pilot and data power allocation in single cell uplink massive MIMO systems. A closed-form solution for the optimal length of the training interval is derived. Using the spectral efficiency (SE) as the performance metric and setting a total energy budget per coherence interval, the power control is formulated as optimization problems for two different objective functions: the minimum SE among the users and the sum SE. The optimal power control policy is found for the case of maximizing the minimum SE by converting it to a geometric program (GP). Since maximizing the sum SE is an NP-hard problem, an efficient algorithm is developed for finding KKT (local maximum) points. Simulation results show the advantage of optimizing the power control over both pilot and data power, as compared to heuristic power control policies.

SP Jun 1
RA-LWLM: Retrieval-Augmented In-Context Localization with Wireless Foundation Models

Guangjin Pan, Hui Chen, Hei Victor Cheng et al.

Wireless localization is a fundamental capability of sixth-generation (6G) networks. Conventional model-based methods require accurate modeling of the propagation environment and degrade in complex multipath and non-line-of-sight scenarios, while learning-based methods couple model parameters tightly to the training scene, requiring costly retraining whenever the base station (BS) configuration or propagation environment changes. In this paper, we propose RA-LWLM, a retrieval-augmented in-context localization framework that achieves training-free cross-scene adaptation by externalizing scene-specific information into a per-scene fingerprint database rather than encoding it in model weights. The framework consists of three components: a frozen wireless foundation model (FM) encoder that maps raw channel state information into a scene-agnostic representation; a retrieval module that selects the most informative references from the per-scene database via similarity search in the representation space; and a transformer-based in-context learning (ICL) module that fuses the query with the retrieved references to predict the user equipment (UE) position. To accommodate varying retrieval quality and propagation complexity across queries, the ICL module adopts a mixture-of-experts design in which experts specialize in different context sizes and are softly combined by a learnable selector. Extensive ray-tracing-based experiments across heterogeneous scenes with diverse BS configurations show that RA-LWLM achieves nearly identical accuracy on seen and unseen scenes without any per-scene retraining, substantially outperforming end-to-end and FM-based baselines. These results validate the proposed retrieval-augmented in-context paradigm as a scalable solution for cross-scene localization in 6G networks.
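
The retrieval step described above (selecting references from a per-scene database by similarity search in the representation space) can be illustrated with a minimal sketch. The abstract does not state the similarity measure or the database layout, so cosine similarity, the `(embedding, position)` pair layout, and the function names here are all assumptions made for illustration, not the RA-LWLM implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, database, k):
    """Return the k (similarity, position) pairs most similar to the query.

    `database` is a list of (embedding, position) pairs; this layout is a
    hypothetical stand-in for a per-scene fingerprint database.
    """
    scored = [(cosine_similarity(query, emb), pos) for emb, pos in database]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

In such a setup, the retrieved positions and their embeddings would then be passed, together with the query embedding, to the in-context learning module that predicts the UE position.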

LG Dec 15, 2025
Link-Aware Energy-Frugal Continual Learning for Fault Detection in IoT Networks

Henrik C. M. Frederiksen, Junya Shiraishi, Cedomir Stefanovic et al.

The use of lightweight machine learning (ML) models in Internet of Things (IoT) networks enables resource-constrained IoT devices to perform on-device inference for several critical applications. However, the inference accuracy deteriorates due to the non-stationarity of the IoT environment and limited initial training data. To counteract this, the deployed models can be updated occasionally with newly observed data samples. However, this approach consumes additional energy, which is undesirable for energy-constrained IoT devices. This letter introduces an event-driven communication framework that strategically integrates continual learning (CL) in IoT networks for energy-efficient fault detection. Our framework enables the IoT device and the edge server (ES) to collaboratively update the lightweight ML model by adapting to the wireless link conditions for communication and the available energy budget. Evaluation on real-world datasets shows that the proposed approach can outperform both periodic sampling and non-adaptive CL in terms of inference recall, achieving up to a 42.8% improvement even under tight energy and bandwidth constraints.

CV Oct 5, 2023
Realistic Speech-to-Face Generation with Speech-Conditioned Latent Diffusion Model with Face Prior

Jinting Wang, Li Liu, Jun Wang et al.

Speech-to-face generation is an intriguing area of research that focuses on generating realistic facial images based on a speaker's audio speech. However, state-of-the-art methods employing GAN-based architectures lack stability and cannot generate realistic face images. To fill this gap, we propose a novel speech-to-face generation framework, which leverages a Speech-Conditioned Latent Diffusion Model, called SCLDM. To the best of our knowledge, this is the first work to harness the exceptional modeling capabilities of diffusion models for speech-to-face generation. Preserving the shared identity information between speech and face is crucial in generating realistic results. Therefore, we employ contrastive pre-training for both the speech encoder and the face encoder. This pre-training strategy facilitates effective alignment between the attributes of speech, such as age and gender, and the corresponding facial characteristics in the face images. Furthermore, we tackle the challenge posed by excessive diversity in the synthesis process caused by the diffusion model. To overcome this challenge, we introduce the concept of residuals by integrating a statistical face prior into the diffusion process. This addition helps to eliminate the shared component across the faces and enhances the subtle variations captured by the speech condition. Extensive quantitative, qualitative, and user study experiments demonstrate that our method can produce more realistic face images while preserving the identity of the speaker better than state-of-the-art methods. Our method demonstrates significant gains in all metrics on the AVSpeech and VoxCeleb datasets; particularly noteworthy are the improvements of 32.17 and 32.72 on the cosine distance metric for the two datasets, respectively.

AI Oct 7, 2023
SWAP: Sparse Entropic Wasserstein Regression for Robust Network Pruning

Lei You, Hei Victor Cheng

This study addresses the challenge of inaccurate gradients in computing the empirical Fisher Information Matrix during neural network pruning. We introduce SWAP, a formulation of Entropic Wasserstein regression (EWR) for pruning, capitalizing on the geometric properties of the optimal transport problem. The "swap" of the commonly used linear regression with the EWR in optimization is analytically demonstrated to offer noise mitigation effects by incorporating neighborhood interpolation across data points with only marginal additional computational cost. The unique strength of SWAP is its intrinsic ability to balance noise reduction and covariance information preservation effectively. Extensive experiments performed on various networks and datasets show comparable performance of SWAP with state-of-the-art (SoTA) network pruning algorithms. Our proposed method outperforms the SoTA when the network size or the target sparsity is large, and the gain is even larger in the presence of noisy gradients, possibly from noisy data, analog memory, or adversarial attacks. Notably, our proposed method achieves a 6% improvement in accuracy and an 8% improvement in testing loss for MobileNetV1 with less than one-fourth of the network parameters remaining.
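
For readers unfamiliar with the entropic optimal transport problem that EWR builds on, the following is a minimal sketch of the standard Sinkhorn iteration for computing an entropy-regularized transport plan. It is not the SWAP pruning algorithm itself, and the function name, the regularization strength `epsilon`, and the iteration count are illustrative assumptions.

```python
import math


def sinkhorn(cost, a, b, epsilon=0.1, iterations=200):
    """Entropy-regularized optimal transport plan via Sinkhorn scaling.

    cost: n x m cost matrix (list of lists)
    a, b: source and target marginals (each summing to 1)
    epsilon: entropic regularization strength (illustrative default)
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / epsilon)
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the marginals
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P_ij = u_i * K_ij * v_j
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```

A larger `epsilon` spreads mass more evenly across neighboring points, which is the general kind of smoothing effect the abstract attributes to neighborhood interpolation; a smaller `epsilon` approaches the unregularized transport plan.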

SP Apr 27
Beam Scheduling for Cross-Layer ISAC: A Deep Reinforcement Learning Approach

Xiyu Wang, Gilberto Berardinelli, Hei Victor Cheng et al.

Resource allocation in integrated sensing and communication (ISAC) systems needs to be optimized to balance the requirements of the communication and sensing modules, considering complicated cross-layer data traffic and queue status in dynamic multi-user environments. This paper studies beam allocation for cross-layer ISAC that achieves low-latency communication and minimizes the sensing parameter estimation error. To handle the complex coupling between practical data buffer dynamics and varying wireless channels, we propose a deep reinforcement learning (DRL)-assisted approach. Rather than relying on explicit channel state information, the DRL-assisted beam allocation reduces feedback overhead by leveraging sensing observations. Simulation results verify that the DRL framework effectively takes buffer status into account and adapts to the wireless environment while allocating resources. The proposed multi-beam scheme improves overall throughput with only modest delay increases. Finally, the DRL-assisted beam management achieves both communication and sensing performance close to that of the genie-aided benchmark with perfect angle-of-departure (AoD) knowledge. These contributions advance the state of the art in intelligent resource management for ISAC systems.

AS Oct 28, 2025
See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement

Jinting Wang, Jun Wang, Hei Victor Cheng et al.

Unlike existing methods that rely on source images as appearance references and use source speech to generate motion, this work proposes a novel approach that directly extracts information from the speech, addressing key challenges in speech-to-talking-face generation. Specifically, we first employ a speech-to-face portrait generation stage, utilizing a speech-conditioned diffusion model combined with a statistical facial prior and a sample-adaptive weighting module to achieve high-quality portrait generation. In the subsequent speech-driven talking face generation stage, we embed expressive dynamics such as lip movement, facial expressions, and eye movements into the latent space of the diffusion model and further optimize lip synchronization using a region-enhancement module. To generate high-resolution outputs, we integrate a pre-trained Transformer-based discrete codebook with an image rendering network, enhancing video frame details in an end-to-end manner. Experimental results demonstrate that our method outperforms existing approaches on the HDTF, VoxCeleb, and AVSpeech datasets. Notably, this is the first method capable of generating high-resolution, high-quality talking face videos exclusively from a single speech input.

SP May 9, 2025
Multi-User Beamforming with Deep Reinforcement Learning in Sensing-Aided Communication

Xiyu Wang, Gilberto Berardinelli, Hei Victor Cheng et al.

Mobile users are prone to experience beam failure due to beam drifting in millimeter wave (mmWave) communications. Sensing can help alleviate beam drifting with timely beam changes and low overhead since it does not need user feedback. This work studies the problem of optimizing sensing-aided communication by dynamically managing beams allocated to mobile users. A multi-beam scheme is introduced, which allocates multiple beams to the users that need an update on the angle of departure (AoD) estimates and a single beam to the users whose AoD estimation precision is already satisfactory. A deep reinforcement learning (DRL) assisted method is developed to optimize the beam allocation policy, relying only upon the sensing echoes. For comparison, a heuristic AoD-based method using an approximated Cramér-Rao lower bound (CRLB) for allocation is also presented. Both methods require neither user feedback nor prior state evolution information. Results show that the DRL-assisted method achieves a considerable throughput gain over the conventional beam sweeping method and the AoD-based method, and that it is robust to different user speeds.
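
The multi-beam rule described above can be sketched as a simple threshold test. This is only an illustration of the allocation idea, not the DRL policy or the paper's CRLB-based heuristic: the error-bound inputs, the threshold, the beam count of 3, and the function name are all hypothetical.

```python
def allocate_beams(aod_error_bounds, precision_threshold, multi_beam_count=3):
    """Assign a beam count to each user from its AoD error bound.

    aod_error_bounds: dict mapping user id -> estimated AoD error bound
        (for example an approximated CRLB; hypothetical input)
    precision_threshold: bound above which the AoD estimate needs an update
    multi_beam_count: beams given to users needing an update (assumed value)
    """
    allocation = {}
    for user, bound in aod_error_bounds.items():
        if bound > precision_threshold:
            # AoD estimate too imprecise: sweep several beams to refresh it
            allocation[user] = multi_beam_count
        else:
            # AoD estimate precise enough: a single beam suffices
            allocation[user] = 1
    return allocation
```

For example, `allocate_beams({"u1": 0.5, "u2": 0.01}, precision_threshold=0.1)` gives user `u1` three beams and user `u2` one. A learned policy would replace the fixed threshold with a decision driven by the sensing echoes.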

LG Nov 22, 2020
Learning Class Unique Features in Fine-Grained Visual Classification

Runkai Zheng, Zhijia Yu, Yinqi Zhang et al.

A major challenge in Fine-Grained Visual Classification (FGVC) is distinguishing various categories with high inter-class similarity by learning the features that differentiate the details. A conventional Convolutional Neural Network (CNN) trained with cross-entropy fails this challenge, as it may produce inter-class invariant features in FGVC. In this work, we propose to regularize the training of the CNN by enforcing the uniqueness of the features to each category from an information-theoretic perspective. To achieve this goal, we formulate a minimax loss based on a game-theoretic framework, where a Nash equilibrium is proved to be consistent with this regularization objective. In addition, to prevent a feasible solution of the minimax loss from producing redundant features, we present a Feature Redundancy Loss (FRL) based on the normalized inner product between each selected feature map pair to complement the proposed minimax loss. Superior experimental results on several influential benchmarks, along with visualization, show that our method brings out the full performance of the baseline model without additional computation and achieves results comparable with state-of-the-art models.
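
A redundancy penalty built on the normalized inner product between feature map pairs can be sketched as follows. The abstract does not give the exact form of the FRL, so averaging the absolute normalized inner product over all pairs is an assumption, as are the function names and the flattened-list representation of feature maps.

```python
import math
from itertools import combinations


def normalized_inner_product(u, v):
    """Inner product of two flattened feature maps divided by their norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero map carries no direction to compare
    return dot / (norm_u * norm_v)


def feature_redundancy(feature_maps):
    """Mean absolute normalized inner product over all feature map pairs.

    Assumed form of a redundancy penalty: 0 when all maps are mutually
    orthogonal, 1 when every pair is perfectly aligned.
    """
    pairs = list(combinations(feature_maps, 2))
    if not pairs:
        return 0.0
    total = sum(abs(normalized_inner_product(u, v)) for u, v in pairs)
    return total / len(pairs)
```

Minimizing a term of this kind alongside the main loss pushes the selected feature maps toward being mutually dissimilar, which is the stated purpose of the FRL.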