Xiaojun Yuan

CV
h-index37
25papers
437citations
Novelty54%
AI Score57

25 Papers

CVFeb 18, 2023
Closed-Loop Transcription via Convolutional Sparse Coding

Xili Dai, Ke Chen, Shengbang Tong et al.

Autoencoding has achieved great empirical success as a framework for learning generative models for natural images. Autoencoders often use generic deep networks as the encoder or decoder, which are difficult to interpret, and the learned representations lack clear structure. In this work, we make the explicit assumption that the image distribution is generated from a multi-stage sparse deconvolution. The corresponding inverse map, which we use as an encoder, is a multi-stage convolution sparse coding (CSC), with each stage obtained from unrolling an optimization algorithm for solving the corresponding (convexified) sparse coding program. To avoid computational difficulties in minimizing distributional distance between the real and generated images, we utilize the recent closed-loop transcription (CTRL) framework that optimizes the rate reduction of the learned sparse representations. Conceptually, our method has high-level connections to score-matching methods such as diffusion models. Empirically, our framework demonstrates competitive performance on large-scale datasets, such as ImageNet-1K, compared to existing autoencoding and generative methods under fair conditions. Even with simpler networks and fewer computational resources, our method demonstrates high visual quality in regenerated images. More surprisingly, the learned autoencoder performs well on unseen datasets. Our method enjoys several side benefits, including more structured and interpretable representations, more stable convergence, and scalability to large datasets. Our method is arguably the first to demonstrate that a concatenation of multiple convolution sparse coding/decoding layers leads to an interpretable and effective autoencoder for modeling the distribution of large-scale natural image datasets.

LGMay 30, 2022
Confederated Learning: Federated Learning with Decentralized Edge Servers

Bin Wang, Jun Fang, Hongbin Li et al.

Federated learning (FL) is an emerging machine learning paradigm that allows to accomplish model training without aggregating data at a central server. Most studies on FL consider a centralized framework, in which a single server is endowed with a central authority to coordinate a number of devices to perform model training in an iterative manner. Due to stringent communication and bandwidth constraints, such a centralized framework has limited scalability as the number of devices grows. To address this issue, in this paper, we propose a ConFederated Learning (CFL) framework. The proposed CFL consists of multiple servers, in which each server is connected with an individual set of devices as in the conventional FL framework, and decentralized collaboration is leveraged among servers to make full use of the data dispersed throughout the network. We develop an alternating direction method of multipliers (ADMM) algorithm for CFL. The proposed algorithm employs a random scheduling policy which randomly selects a subset of devices to access their respective servers at each iteration, thus alleviating the need of uploading a huge amount of information from devices to servers. Theoretical analysis is presented to justify the proposed method. Numerical results show that the proposed method can converge to a decent solution significantly faster than gradient-based FL algorithms, thus boasting a substantial advantage in terms of communication efficiency.

CVAug 10, 2023
PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs

Wentao Hu, Jia Zheng, Zixin Zhang et al.

In this paper, we develop a new method to automatically convert 2D line drawings from three orthographic views into 3D CAD models. Existing methods for this problem reconstruct 3D models by back-projecting the 2D observations into 3D space while maintaining explicit correspondence between the input and output. Such methods are sensitive to errors and noises in the input, thus often fail in practice where the input drawings created by human designers are imperfect. To overcome this difficulty, we leverage the attention mechanism in a Transformer-based sequence generation model to learn flexible mappings between the input and output. Further, we design shape programs which are suitable for generating the objects of interest to boost the reconstruction accuracy and facilitate CAD modeling applications. Experiments on a new benchmark dataset show that our method significantly outperforms existing ones when the inputs are noisy or incomplete.

ITMay 8, 2022
Over-the-Air Federated Multi-Task Learning via Model Sparsification and Turbo Compressed Sensing

Haoming Ma, Xiaojun Yuan, Zhi Ding et al.

To achieve communication-efficient federated multitask learning (FMTL), we propose an over-the-air FMTL (OAFMTL) framework, where multiple learning tasks deployed on edge devices share a non-orthogonal fading channel under the coordination of an edge server (ES). In OA-FMTL, the local updates of edge devices are sparsified, compressed, and then sent over the uplink channel in a superimposed fashion. The ES employs over-the-air computation in the presence of intertask interference. More specifically, the model aggregations of all the tasks are reconstructed from the channel observations concurrently, based on a modified version of the turbo compressed sensing (Turbo-CS) algorithm (named as M-Turbo-CS). We analyze the performance of the proposed OA-FMTL framework together with the M-Turbo-CS algorithm. Furthermore, based on the analysis, we formulate a communication-learning optimization problem to improve the system performance by adjusting the power allocation among the tasks at the edge devices. Numerical simulations show that our proposed OAFMTL effectively suppresses the inter-task interference, and achieves a learning performance comparable to its counterpart with orthogonal multi-task transmission. It is also shown that the proposed inter-task power allocation optimization algorithm substantially reduces the overall communication overhead by appropriately adjusting the power allocation among the tasks.

74.5ITApr 14
A Heterogeneous Dual-Network Framework for Emergency Delivery UAVs: Communication Assurance and Path Planning Coordination

Ping Huang, Bin Duo, Ziedor Godfred et al.

Natural disasters often damage ground infrastructure, making unmanned aerial vehicles (UAVs) essential for emergency supply delivery. Yet safe operation in complex post-disaster environments requires reliable command-and-control (C2) links; link instability can cause loss of control, delay rescue, and trigger severe secondary harm. To provide continuous three-dimensional (3D) C2 coverage during dynamic missions, we propose a Heterogeneous Dual-Network Framework (HDNF) for safe and reliable emergency delivery. HDNF tightly couples an Emergency Communication Support Network (ECSN), formed by hovering UAV base stations, with a Delivery Path Network (DPN), formed by fast-moving delivery UAVs. The ECSN dynamically safeguards mission-critical flight corridors, while the DPN aligns trajectories with reliable coverage regions. We formulate a joint optimization problem over task assignment, 3D UAV-BS deployment, and DPN path planning to maximize end-to-end C2 reliability while minimizing UAV flight energy consumption and base-station deployment cost. To solve this computationally intractable NP-hard problem, we develop a layered strategy with three components: (i) a multi-layer C2 service model that overcomes 2D-metric limitations and aligns UAV-BS deployment with mission-critical 3D phases; (ii) a 3D coverage-aware multi-agent reinforcement learning algorithm that addresses the high-dimensional search space and improves both training efficiency and topology resilience; and (iii) a 3D communication-aware A* planner that jointly optimizes C2 quality and flight energy, mitigating trajectory--coverage mismatch and improving routing safety. Extensive simulations show that HDNF markedly improves C2 reliability, eliminates outages in critical phases, and sustains high task success rates while reducing hardware deployment cost.

SPFeb 22
Event-Triggered Gossip for Distributed Learning

Zhiyuan Zhai, Xiaojun Yuan, Wei Ni et al.

While distributed learning offers a new learning paradigm for distributed network with no central coordination, it is constrained by communication bottleneck between nodes. We develop a new event-triggered gossip framework for distributed learning to reduce inter-node communication overhead. The framework introduces an adaptive communication control mechanism that enables each node to autonomously decide in a fully decentralized fashion when to exchange model information with its neighbors based on local model deviations. We analyze the ergodic convergence of the proposed framework under noconvex objectives and interpret the convergence guarantees under different triggering conditions. Simulation results show that the proposed framework achieves substantially lower communication overhead than the state-of-the-art distributed learning methods, reducing cumulative point-to-point transmissions by \textbf{71.61\%} with only a marginal performance loss, compared with the conventional full-communication baseline.

CVDec 16, 2025
Score-Based Turbo Message Passing for Plug-and-Play Compressive Imaging

Chang Cai, Hao Jiang, Xiaojun Yuan et al.

Message-passing algorithms have been adapted for compressive imaging by incorporating various off-the-shelf image denoisers. However, these denoisers rely largely on generic or hand-crafted priors and often fall short in accurately capturing the complex statistical structure of natural images. As a result, traditional plug-and-play (PnP) methods often lead to suboptimal reconstruction, especially in highly underdetermined regimes. Recently, score-based generative models have emerged as a powerful framework for accurately characterizing sophisticated image distribution. Yet, their direct use for posterior sampling typically incurs prohibitive computational complexity. In this paper, by exploiting the close connection between score-based generative modeling and empirical Bayes denoising, we devise a message-passing framework that integrates a score-based minimum mean-squared error (MMSE) denoiser for compressive image recovery. The resulting algorithm, named score-based turbo message passing (STMP), combines the fast convergence of message passing with the expressive power of score-based generative priors. For practical systems with quantized measurements, we further propose quantized STMP (Q-STMP), which augments STMP with a component-wise MMSE dequantization module. We demonstrate that the asymptotic performance of STMP and Q-STMP can be accurately predicted by a set of state-evolution (SE) equations. Experiments on the FFHQ dataset demonstrate that STMP strikes a significantly better performance-complexity tradeoff compared with competing baselines, and that Q-STMP remains robust even under 1-bit quantization. Remarkably, both STMP and Q-STMP typically converge within 10 iterations.

CVNov 12, 2021Code
Closed-Loop Data Transcription to an LDR via Minimaxing Rate Reduction

Xili Dai, Shengbang Tong, Mingyang Li et al.

This work proposes a new computational framework for learning a structured generative model for real-world datasets. In particular, we propose to learn a closed-loop transcription between a multi-class multi-dimensional data distribution and a linear discriminative representation (LDR) in the feature space that consists of multiple independent multi-dimensional linear subspaces. In particular, we argue that the optimal encoding and decoding mappings sought can be formulated as the equilibrium point of a two-player minimax game between the encoder and decoder. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure for distances between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback from control systems and avoids expensive evaluating and minimizing approximated distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of Auto-Encoding and GAN and naturally extends them to the settings of learning a both discriminative and generative representation for multi-class and multi-dimensional real-world data. Our extensive experiments on many benchmark imagery datasets demonstrate tremendous potential of this new closed-loop formulation: under fair comparison, visual quality of the learned decoder and classification performance of the encoder is competitive and often better than existing methods based on GAN, VAE, or a combination of both. Unlike existing generative models, the so learned features of the multiple classes are structured: different classes are explicitly mapped onto corresponding independent principal subspaces in the feature space. Source code can be found at https://github.com/Delay-Xili/LDR.

CVApr 22, 2021Code
Fully Convolutional Line Parsing

Xili Dai, Haigang Gong, Shuai Wu et al.

We present a one-stage Fully Convolutional Line Parsing network (F-Clip) that detects line segments from images. The proposed network is very simple and flexible with variations that gracefully trade off between speed and accuracy for different applications. F-Clip detects line segments in an end-to-end fashion by predicting each line's center position, length, and angle. We further customize the design of convolution kernels of our fully convolutional network to effectively exploit the statistical priors of the distribution of line angles in real image datasets. We conduct extensive experiments and show that our method achieves a significantly better trade-off between efficiency and accuracy, resulting in a real-time line detector at up to 73 FPS on a single GPU. Such inference speed makes our method readily applicable to real-time tasks without compromising any accuracy of previous methods. Moreover, when equipped with a performance-improving backbone network, F-Clip is able to significantly outperform all state-of-the-art line detectors on accuracy at a similar or even higher frame rate. In other word, under same inference speed, F-Clip always achieving best accuracy compare with other methods. Source code https://github.com/Delay-Xili/F-Clip.

LGMar 4
FedCova: Robust Federated Covariance Learning Against Noisy Labels

Xiangyu Zhong, Xiaojun Yuan, Ying-Jun Angela Zhang

Noisy labels in distributed datasets induce severe local overfitting and consequently compromise the global model in federated learning (FL). Most existing solutions rely on selecting clean devices or aligning with public clean datasets, rather than endowing the model itself with robustness. In this paper, we propose FedCova, a dependency-free federated covariance learning framework that eliminates such external reliances by enhancing the model's intrinsic robustness via a new perspective on feature covariances. Specifically, FedCova encodes data into a discriminative but resilient feature space to tolerate label noise. Built on mutual information maximization, we design a novel objective for federated lossy feature encoding that relies solely on class feature covariances with an error tolerance term. Leveraging feature subspaces characterized by covariances, we construct a subspace-augmented federated classifier. FedCova unifies three key processes through the covariance: (1) training the network for feature encoding, (2) constructing a classifier directly from the learned features, and (3) correcting noisy labels based on feature subspaces. We implement FedCova across both symmetric and asymmetric noisy settings under heterogeneous data distribution. Experimental results on CIFAR-10/100 and real-world noisy dataset Clothing1M demonstrate the superior robustness of FedCova compared with the state-of-the-art methods.

CVOct 24, 2024
BIFRÖST: 3D-Aware Image compositing with Language Instructions

Lingxiao Li, Kaixiong Gong, Weihong Li et al.

This paper introduces Bifröst, a novel 3D-aware framework that is built upon diffusion models to perform instruction-based image composition. Previous methods concentrate on image compositing at the 2D level, which fall short in handling complex spatial relationships ($\textit{e.g.}$, occlusion). Bifröst addresses these issues by training MLLM as a 2.5D location predictor and integrating depth maps as an extra condition during the generation process to bridge the gap between 2D and 3D, which enhances spatial comprehension and supports sophisticated spatial interactions. Our method begins by fine-tuning MLLM with a custom counterfactual dataset to predict 2.5D object locations in complex backgrounds from language instructions. Then, the image-compositing model is uniquely designed to process multiple types of input features, enabling it to perform high-fidelity image compositions that consider occlusion, depth blur, and image harmonization. Extensive qualitative and quantitative evaluations demonstrate that Bifröst significantly outperforms existing methods, providing a robust solution for generating realistically composited images in scenarios demanding intricate spatial understanding. This work not only pushes the boundaries of generative image compositing but also reduces reliance on expensive annotated datasets by effectively utilizing existing resources in innovative ways.

IVMar 28, 2025
Score-Based Turbo Message Passing for Plug-and-Play Compressive Image Recovery

Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

Message passing algorithms have been tailored for compressive imaging applications by plugging in different types of off-the-shelf image denoisers. These off-the-shelf denoisers mostly rely on some generic or hand-crafted priors for denoising. Due to their insufficient accuracy in capturing the true image prior, these methods often fail to produce satisfactory results, especially in largely underdetermined scenarios. On the other hand, score-based generative modeling offers a promising way to accurately characterize the sophisticated image distribution. In this paper, by exploiting the close relation between score-based modeling and empirical Bayes-optimal denoising, we devise a message passing framework that integrates a score-based minimum mean squared error (MMSE) denoiser for compressive image recovery. This framework is firmly rooted in Bayesian formalism, in which state evolution (SE) equations accurately predict its asymptotic performance. Experiments on the FFHQ dataset demonstrate that our method strikes a significantly better performance-complexity tradeoff than conventional message passing, regularized linear regression, and score-based posterior sampling baselines. Remarkably, our method typically requires less than 20 neural function evaluations (NFEs) to converge.

CVMar 26, 2025
Traversing Distortion-Perception Tradeoff using a Single Score-Based Generative Model

Yuhan Wang, Suzhi Bi, Ying-Jun Angela Zhang et al.

The distortion-perception (DP) tradeoff reveals a fundamental conflict between distortion metrics (e.g., MSE and PSNR) and perceptual quality. Recent research has increasingly concentrated on evaluating denoising algorithms within the DP framework. However, existing algorithms either prioritize perceptual quality by sacrificing acceptable distortion, or focus on minimizing MSE for faithful restoration. When the goal shifts or noisy measurements vary, adapting to different points on the DP plane needs retraining or even re-designing the model. Inspired by recent advances in solving inverse problems using score-based generative models, we explore the potential of flexibly and optimally traversing DP tradeoffs using a single pre-trained score-based model. Specifically, we introduce a variance-scaled reverse diffusion process and theoretically characterize the marginal distribution. We then prove that the proposed sample process is an optimal solution to the DP tradeoff for conditional Gaussian distribution. Experimental results on two-dimensional and image datasets illustrate that a single score network can effectively and flexibly traverse the DP tradeoff for general denoising problems.

CVAug 21, 2025
Boosting Pathology Foundation Models via Few-shot Prompt-tuning for Rare Cancer Subtyping

Dexuan He, Xiao Zhou, Wenbin Guan et al. · harvard

Rare cancers comprise 20-25% of all malignancies but face major diagnostic challenges due to limited expert availability-especially in pediatric oncology, where they represent over 70% of cases. While pathology vision-language (VL) foundation models show promising zero-shot capabilities for common cancer subtyping, their clinical performance for rare cancers remains limited. Existing multi-instance learning (MIL) methods rely only on visual features, overlooking cross-modal knowledge and compromising interpretability critical for rare cancer diagnosis. To address this limitation, we propose PathPT, a novel framework that fully exploits the potential of vision-language pathology foundation models through spatially-aware visual aggregation and task-specific prompt tuning. Unlike conventional MIL, PathPT converts WSI-level supervision into fine-grained tile-level guidance by leveraging the zero-shot capabilities of VL models, thereby preserving localization on cancerous regions and enabling cross-modal reasoning through prompts aligned with histopathological semantics. We benchmark PathPT on eight rare cancer datasets(four adult and four pediatric) spanning 56 subtypes and 2,910 WSIs, as well as three common cancer datasets, evaluating four state-of-the-art VL models and four MIL frameworks under three few-shot settings. Results show that PathPT consistently delivers superior performance, achieving substantial gains in subtyping accuracy and cancerous region grounding ability. This work advances AI-assisted diagnosis for rare cancers, offering a scalable solution for improving subtyping accuracy in settings with limited access to specialized expertise.

AIApr 21, 2025
AGI-Driven Generative Semantic Communications: Principles and Practices

Xiaojun Yuan, Haoming Ma, Yinuo Huang et al.

Semantic communications leverage artificial intelligence (AI) technologies to extract semantic information for efficient data delivery, thereby significantly reducing communication cost. With the evolution towards artificial general intelligence (AGI), the increasing demands for AGI services pose new challenges to semantic communications. In this context, an AGI application is typically defined on a general-sense task, covering a broad, even unforeseen, set of objectives, as well as driven by the need for a human-friendly interface in forms (e.g., videos, images, or text) easily understood by human users.In response, we introduce an AGI-driven communication paradigm for supporting AGI applications, called generative semantic communication (GSC). We first describe the basic concept of GSC and its difference from existing semantic communications, and then introduce a general framework of GSC based on advanced AI technologies including foundation models and generative models. Two case studies are presented to verify the advantages of GSC. Finally, open challenges and new research directions are discussed to stimulate this line of research and pave the way for practical applications.

SPDec 27, 2021
Over-the-Air Federated Multi-Task Learning Over MIMO Multiple Access Channels

Chenxi Zhong, Huiyuan Yang, Xiaojun Yuan

With the explosive growth of data and wireless devices, federated learning (FL) over wireless medium has emerged as a promising technology for large-scale distributed intelligent systems. Yet, the urgent demand for ubiquitous intelligence will generate a large number of concurrent FL tasks, which may seriously aggravate the scarcity of communication resources. By exploiting the analog superposition of electromagnetic waves, over-the-air computation (AirComp) is an appealing solution to alleviate the burden of communication required by FL. However, sharing frequency-time resources in over-the-air computation inevitably brings about the problem of inter-task interference, which poses a new challenge that needs to be appropriately addressed. In this paper, we study over-the-air federated multi-task learning (OA-FMTL) over the multiple-input multiple-output (MIMO) multiple access (MAC) channel. We propose a novel model aggregation method for the alignment of local gradients of different devices, which alleviates the straggler problem in over-the-air computation due to the channel heterogeneity. We establish a communication-learning analysis framework for the proposed OA-FMTL scheme by considering the spatial correlation between devices, and formulate an optimization problem for the design of transceiver beamforming and device selection. To solve this problem, we develop an algorithm by using alternating optimization (AO) and fractional programming (FP), which effectively mitigates the impact of inter-task interference on the FL learning performance. We show that due to the use of the new model aggregation method, device selection is no longer essential, thereby avoiding the heavy computational burden involved in selecting active devices. Numerical results demonstrate the validity of the analysis and the superb performance of the proposed scheme.

ITSep 6, 2021
Reconfigurable Intelligent Surface Empowered Over-the-Air Federated Edge Learning

Hang Liu, Zehong Lin, Xiaojun Yuan et al.

Federated edge learning (FEEL) has emerged as a revolutionary paradigm to develop AI services at the edge of 6G wireless networks as it supports collaborative model training at a massive number of mobile devices. However, model communication over wireless channels, especially in uplink model uploading of FEEL, has been widely recognized as a bottleneck that critically limits the efficiency of FEEL. Although over-the-air computation can alleviate the excessive cost of radio resources in FEEL model uploading, practical implementations of over-the-air FEEL still suffer from several challenges, including strong straggler issues, large communication overheads, and potential privacy leakage. In this article, we study these challenges in over-the-air FEEL and leverage reconfigurable intelligent surface (RIS), a key enabler of future wireless systems, to address these challenges. We study the state-of-the-art solutions on RIS-empowered FEEL and explore the promising research opportunities for adopting RIS to enhance FEEL performance.

LGJun 27, 2021
Over-the-Air Federated Multi-Task Learning

Haoming Ma, Xiaojun Yuan, Dian Fan et al.

In this letter, we introduce over-the-air computation into the communication design of federated multi-task learning (FMTL), and propose an over-the-air federated multi-task learning (OA-FMTL) framework, where multiple learning tasks deployed on edge devices share a non-orthogonal fading channel under the coordination of an edge server (ES). Specifically, the model updates for all the tasks are transmitted and superimposed concurrently over a non-orthogonal uplink fading channel, and the model aggregations of all the tasks are reconstructed at the ES through a modified version of the turbo compressed sensing algorithm (Turbo-CS) that overcomes inter-task interference. Both convergence analysis and numerical results show that the OA-FMTL framework can significantly improve the system efficiency in terms of reducing the number of channel uses without causing substantial learning performance degradation.

CVApr 16, 2021
Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image

Cheng Yang, Jia Zheng, Xili Dai et al.

Single-image room layout reconstruction aims to reconstruct the enclosed 3D structure of a room from a single image. Most previous work relies on the cuboid-shape prior. This paper considers a more general indoor assumption, i.e., the room layout consists of a single ceiling, a single floor, and several vertical walls. To this end, we first employ Convolutional Neural Networks to detect planes and vertical lines between adjacent walls. Meanwhile, estimating the 3D parameters for each plane. Then, a simple yet effective geometric reasoning method is adopted to achieve room layout reconstruction. Furthermore, we optimize the 3D plane parameters to reconstruct a geometrically consistent room layout between planes and lines. The experimental results on public datasets validate the effectiveness and efficiency of our method.

ITMar 3, 2021
Temporal-Structure-Assisted Gradient Aggregation for Over-the-Air Federated Edge Learning

Dian Fan, Xiaojun Yuan, Ying-Jun Angela Zhang

In this paper, we investigate over-the-air model aggregation in a federated edge learning (FEEL) system. We introduce a Markovian probability model to characterize the intrinsic temporal structure of the model aggregation series. With this temporal probability model, we formulate the model aggregation problem as to infer the desired aggregated update given all the past observations from a Bayesian perspective. We develop a message passing based algorithm, termed temporal-structure-assisted gradient aggregation (TSA-GA), to fulfil this estimation task with low complexity and near-optimal performance. We further establish the state evolution (SE) analysis to characterize the behaviour of the proposed TSA-GA algorithm, and derive an explicit bound of the expected loss reduction of the FEEL system under certain standard regularity conditions. In addition, we develop an expectation maximization (EM) strategy to learn the unknown parameters in the Markovian model. We show that the proposed TSAGA algorithm significantly outperforms the state-of-the-art, and is able to achieve comparable learning performance as the error-free benchmark in terms of both convergence rate and final test accuracy.

ITFeb 22, 2021
CSIT-Free Model Aggregation for Federated Edge Learning via Reconfigurable Intelligent Surface

Hang Liu, Xiaojun Yuan, Ying-Jun Angela Zhang

We study over-the-air model aggregation in federated edge learning (FEEL) systems, where channel state information at the transmitters (CSIT) is assumed to be unavailable. We leverage the reconfigurable intelligent surface (RIS) technology to align the cascaded channel coefficients for CSIT-free model aggregation. To this end, we jointly optimize the RIS and the receiver by minimizing the aggregation error under the channel alignment constraint. We then develop a difference-of-convex algorithm for the resulting non-convex optimization. Numerical experiments on image classification show that the proposed method is able to achieve a similar learning accuracy as the state-of-the-art CSIT-based solution, demonstrating the efficiency of our approach in combating the lack of CSIT.

LGDec 10, 2020
Denoising-based Turbo Message Passing for Compressed Video Background Subtraction

Zhipeng Xue, Xiaojun Yuan, Yang Yang

In this paper, we consider the compressed video background subtraction problem that separates the background and foreground of a video from its compressed measurements. The background of a video usually lies in a low dimensional space and the foreground is usually sparse. More importantly, each video frame is a natural image that has textural patterns. By exploiting these properties, we develop a message passing algorithm termed offline denoising-based turbo message passing (DTMP). We show that these structural properties can be efficiently handled by the existing denoising techniques under the turbo message passing framework. We further extend the DTMP algorithm to the online scenario where the video data is collected in an online manner. The extension is based on the similarity/continuity between adjacent video frames. We adopt the optical flow method to refine the estimation of the foreground. We also adopt the sliding window based background estimation to reduce complexity. By exploiting the Gaussianity of messages, we develop the state evolution to characterize the per-iteration performance of offline and online DTMP. Comparing to the existing algorithms, DTMP can work at much lower compression rates, and can subtract the background successfully with a lower mean squared error and better visual quality for both offline and online compressed video background subtraction.

ITNov 20, 2020
Reconfigurable Intelligent Surface Enabled Federated Learning: A Unified Communication-Learning Design Approach

Hang Liu, Xiaojun Yuan, Ying-Jun Angela Zhang

To exploit massive amounts of data generated at mobile edge networks, federated learning (FL) has been proposed as an attractive substitute for centralized machine learning (ML). By collaboratively training a shared learning model at edge devices, FL avoids direct data transmission and thus overcomes high communication latency and privacy issues as compared to centralized ML. To improve the communication efficiency in FL model aggregation, over-the-air computation has been introduced to support a large number of simultaneous local model uploading by exploiting the inherent superposition property of wireless channels. However, due to the heterogeneity of communication capacities among edge devices, over-the-air FL suffers from the straggler issue in which the device with the weakest channel acts as a bottleneck of the model aggregation performance. This issue can be alleviated by device selection to some extent, but the latter still suffers from a tradeoff between data exploitation and model communication. In this paper, we leverage the reconfigurable intelligent surface (RIS) technology to relieve the straggler issue in over-the-air FL. Specifically, we develop a learning analysis framework to quantitatively characterize the impact of device selection and model aggregation error on the convergence of over-the-air FL. Then, we formulate a unified communication-learning optimization problem to jointly optimize device selection, over-the-air transceiver design, and RIS configuration. Numerical experiments show that the proposed design achieves substantial learning accuracy improvement compared with the state-of-the-art approaches, especially when channel conditions vary dramatically across edge devices.

HCOct 26, 2020
The Age-related Differences in Web Information Search Process

Zhaopeng Xing, Xiaojun Yuan, Lisa Vizer

Older adults' need for quality health information has never been more critical as during the COVID-19 pandemic. Yet, they are susceptible to the wide-spread misinformation disseminated through search engines and social media. To build a search-related behavioral profile of older adults, this article surveys the empirical research on age-related differences in query formulation, search strategies, information evaluation, and susceptibility to misinformation effects. It also decomposes the mechanisms (i.e., cognitive changes, development goal shift) and moderators (i.e., search task and interface design) of such differences. To inform the design of information systems to improve older adults' information search experience, we discuss opportunities for future research.

CROct 20, 2012
Pragmatic Physical Layer Encryption for Achieving Perfect Secrecy

Suzhi Bi, Xiaojun Yuan, Ying Jun Zhang

Conventionally, secrecy is achieved using cryptographic techniques beyond the physical layer. Recent studies raise the interest of performing encryption within the physical layer by exploiting some unique features of the physical wireless channel. Following this spirit, we present a novel physical layer encryption (PLE) scheme that randomizes the radio signal using a secret key extracted from the wireless channel under the assumption of channel reciprocity. Specifically, we propose to jointly design the encryption function and the secret-key generation method. On one hand, we establish a sufficient and necessary condition for the encryption function to achieve perfect secrecy. Based on that, several candidate encryption functions are proposed and compared. We show that, given the secret key available to the legitimate users, perfect secrecy can be achieved without compromising the capability of the communication channel. On the other hand, we study the practical design of the secret-key generation method based on the channel reciprocity. We show that, by introducing marginal system overhead, the key agreement between the legitimate users can be done with a high success probability. The performance advantages of the proposed PLE method is verified through comparisons against other existing PLE methods.