Wenyi Zhang

h-index17

23papers

626citations

Novelty52%

AI Score57

Ranked #16,443 of 201,326 authors (top 8%)#13 in IT (top 2%)

23 Papers

93.5SYJun 3

Peer-to-Peer Cloud Service Market for Data Centers Oriented to Computation-Electricity Coordination

Yugui Liu, Yibo Ding, Xudong Li et al.

Energy-intensive data centers (DCs) have emerged as substantial and flexible loads in modern power systems, underscoring the critical need for computation-electricity coordination. Harnessing the spatio-temporal flexibility of DC workloads is a promising approach to facilitate this coordination. However, existing studies overlook the collaborative potential of computational resource sharing among geo-distributed DCs, thereby failing to fully unlock this flexibility. In this paper, a bi-level computation-electricity coordination framework is proposed to explicitly capture the bidirectional interactions between DCs and power grid. Firstly, a peer-to-peer cloud service market (P2P-CSM) for geo-distributed DCs is proposed, which enables bilateral cloud service transactions to leverage regional heterogeneities (e.g., electricity prices, cooling efficiency). Secondly, locational marginal prices are embedded into the framework to reflect network congestion and nodal price disparities. Thirdly, a dual consensus alternating direction method of multipliers (ADMM)-based decentralized algorithm is developed as the P2P market clearing algorithm, and a bisection-assisted iterative algorithm is proposed to ensure rigorous convergence of the framework. Case studies conducted on modified IEEE 30-bus system validate that the P2P-CSM achieves a win-win computation-electricity coordination: it not only increases total DC operational profit by 22.8\%, but also effectively alleviates grid congestion and yields a 3.2\% reduction in total energy consumption.

CVAug 14, 2024Code

Rethinking the Key Factors for the Generalization of Remote Sensing Stereo Matching Networks

Liting Jiang, Feng Wang, Wenyi Zhang et al.

Stereo matching, a critical step of 3D reconstruction, has fully shifted towards deep learning due to its strong feature representation of remote sensing images. However, ground truth for stereo matching task relies on expensive airborne LiDAR data, thus making it difficult to obtain enough samples for supervised learning. To improve the generalization ability of stereo matching networks on cross-domain data from different sensors and scenarios, in this paper, we dedicate to study key training factors from three perspectives. (1) For the selection of training dataset, it is important to select data with similar regional target distribution as the test set instead of utilizing data from the same sensor. (2) For model structure, cascaded structure that flexibly adapts to different sizes of features is preferred. (3) For training manner, unsupervised methods generalize better than supervised methods, and we design an unsupervised early-stop strategy to help retain the best model with pre-trained weights as the basis. Extensive experiments are conducted to support the previous findings, on the basis of which we present an unsupervised stereo matching network with good generalization performance. We release the source code and the datasets at https://github.com/Elenairene/RKF_RSSM to reproduce the results and encourage future work.

LGMar 27, 2023

Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning with Hierarchical Aggregation

Rongyu Zhang, Xiaowei Chi, Guiliang Liu et al.

Multimodal learning has seen great success mining data features from multiple modalities with remarkable model performance improvement. Meanwhile, federated learning (FL) addresses the data sharing problem, enabling privacy-preserved collaborative training to provide sufficient precious data. Great potential, therefore, arises with the confluence of them, known as multimodal federated learning. However, limitation lies in the predominant approaches as they often assume that each local dataset records samples from all modalities. In this paper, we aim to bridge this gap by proposing an Unimodal Training - Multimodal Prediction (UTMP) framework under the context of multimodal federated learning. We design HA-Fedformer, a novel transformer-based model that empowers unimodal training with only a unimodal dataset at the client and multimodal testing by aggregating multiple clients' knowledge for better accuracy. The key advantages are twofold. Firstly, to alleviate the impact of data non-IID, we develop an uncertainty-aware aggregation method for the local encoders with layer-wise Markov Chain Monte Carlo sampling. Secondly, to overcome the challenge of unaligned language sequence, we implement a cross-modal decoder aggregation to capture the hidden signal correlation between decoders trained by data from different modalities. Our experiments on popular sentiment analysis benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms state-of-the-art multimodal models under the UTMP federated learning frameworks, with 15%-20% improvement on most attributes.

74.2ITMay 29

Sensing with Random Signals: The Role of Time Sharing

Yi Geng, Wenyi Zhang

In monostatic, decision-aided, or known-waveform integrated sensing and communications (ISAC) formulations, the sensing receiver is often modeled as knowing the transmitted waveform. This assumption is not suitable for passive, bistatic, or distributed settings where the sensing receiver knows the signaling rule but not the transmitted symbols. We study such a symbol-unaware ISAC model, where sensing is measured by the unconditioned mutual information $I(S;V)$ rather than the symbol-aware quantity $I(S;V|X)$. For discrete-input memoryless channels, we characterize the capacity-sensing region through an auxiliary time-sharing variable, showing that the optimal upper boundary is the upper concave envelope of the single-mode frontier. Thus, explicit time sharing is unnecessary when the single-mode frontier is already concave, but strictly beneficial when its upper concave envelope strictly dominates the frontier. For Rayleigh-fading BPSK, we further show that the curvature of the single-mode boundary is determined by the stochastic ordering of the communication- and sensing-side effective SNR distributions. Communication-side dominance yields a concave single-mode frontier and no time-sharing gain, sensing-side dominance yields a convex single-mode frontier and a strict time-sharing gain, and equality yields a linear boundary. The result extends to SIMO-BPSK through the ordering of post-combining SNR distributions. These findings explain when symbol-unaware ISAC optimally moves from data-symbol transmission to pilot-like sensing modes.

SDSep 15, 2024

A Survey of Foundation Models for Music Understanding

Wenjun Li, Ying Cai, Ziyang Wu et al.

Music is essential in daily life, fulfilling emotional and entertainment needs, and connecting us personally, socially, and culturally. A better understanding of music can enhance our emotions, cognitive skills, and cultural connections. The rapid advancement of artificial intelligence (AI) has introduced new ways to analyze music, aiming to replicate human understanding of music and provide related services. While the traditional models focused on audio features and simple tasks, the recent development of large language models (LLMs) and foundation models (FMs), which excel in various fields by integrating semantic information and demonstrating strong reasoning abilities, could capture complex musical features and patterns, integrate music with language and incorporate rich musical, emotional and psychological knowledge. Therefore, they have the potential in handling complex music understanding tasks from a semantic perspective, producing outputs closer to human perception. This work, to our best knowledge, is one of the early reviews of the intersection of AI techniques and music understanding. We investigated, analyzed, and tested recent large-scale music foundation models in respect of their music comprehension abilities. We also discussed their limitations and proposed possible future directions, offering insights for researchers in this field.

99.9ITMay 3

ORBGRAND Is Exactly Capacity-achieving via Rank Companding

Zhuang Li, Wenyi Zhang

Within the family of guessing-based decoding algorithms, ordered reliability bits GRAND (ORBGRAND) has attracted considerable attention due to its efficient use of soft information and suitability for hardware implementation. It has also been shown that ORBGRAND achieves a rate very close to the capacity of an additive white Gaussian noise channel under antipodal signaling. In this work, it is further established that, for general binary-input memoryless channels under symmetric input distribution, via suitably companding the ranks in ORBGRAND according to the inverse cumulative distribution function (CDF) of channel reliability, the resulting CDF-ORBGRAND algorithm exactly achieves the mutual information, i.e., the symmetric capacity. This result is then applied to bit-interleaved coded modulation (BICM) systems to handle high-order input constellations. Via considering the effects of mismatched decoding due to both BICM and ORBGRAND, it is shown that CDF-ORBGRAND is capable of achieving the BICM capacity, which was initially derived in the literature by treating BICM as a set of independent parallel channels.

CVNov 30, 2025Code

Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation

Chengzhi Yu, Yifan Xu, Yifan Chen et al.

Recently, large vision-language models (LVLMs) have risen to be a promising approach for multimodal tasks. However, principled hallucination mitigation remains a critical challenge.In this work, we first analyze the data generation process in LVLM hallucination mitigation and affirm that on-policy data significantly outperforms off-policy data, which thus calls for efficient and reliable preference annotation of on-policy data. We then point out that, existing annotation methods introduce additional hallucination in training samples, which may enhance the model's hallucination patterns, to address this problem, we propose training a hallucination classifier giving binary annotations, which guarantee clean chosen samples for the subsequent alignment. To further harness of the power of on-policy data, we design a robust iterative direct preference optimization (DPO) algorithm adopting a dynamic sample reweighting scheme. We conduct comprehensive experiments on three benchmarks with comparison to 8 state-of-the-art baselines. In particular, our approach reduces the hallucination rate of LLaVA-1.5-7B on MMHalBench by 50.8% and the average hallucination rate on Object HalBench by 79.5%; more significantly, our method fully taps into the potential of open-source models, enabling LLaVA-1.5-13B to even surpass the performance of GPT-4V.

80.2ITApr 10

A Remark on Downlink Massive Random Access

Yuchen Liao, Wenyi Zhang

In downlink massive random access (DMRA), a base station transmits messages to a typically small subset of active users, selected randomly from a massive number of total users. Explicitly encoding the identities of active users would incur a significant overhead scaling logarithmically with the number of total users. Recently, via a random coding argument, Song, Attiah and Yu have shown that the overhead can be reduced to within some upper bound irrespective of the number of total users. In this remark, recognizing that the code design for DMRA is an instance of covering arrays in combinatorics, we show that there exists deterministic construction of variable-length codes that incur an overhead no greater than $1 + log_2 e$ bits.

38.1ITMay 15

Vectorized Generalized Nearest Neighbor Decoding for In-block Memory Channel

Yuhao Liu, Xinwei Li, Shuqin Pang et al.

This work extends the generalized nearest neighbor decoding (GNND), originally developed as a receiver architecture for memoryless channels, to a vectorized GNND (Vec-GNND) suitable for in-block memory (IBM) channels. Leveraging the generalized mutual information (GMI) as an operational lower bound on the mismatch capacity, an analytical characterization of the optimal Vec-GNND is obtained for general IBM channels with Gaussian codebooks. The formalism further provides closed-form optimality conditions and achievable GMIs for restricted variants of the receiver architecture. Furthermore, we formulate a GMI-based joint design viewpoint for Gaussian codebook covariance and decoding metrics. Since the metric optimization admits a closed-form solution for each fixed covariance, the joint design is reduced to an input-covariance optimization problem; for the diagonal covariance family, we derive first-order self-consistent optimality conditions. Numerical evaluations on block noncoherent additive white Gaussian noise channels and phase noise channels demonstrate consistent performance gains over conventional scaling-based baselines, highlighting the substantial advantages and potential relevance of the proposed Vec-GNND in realistic communication scenarios.

90.4ITMar 28

A Parallelization Strategy for GRAND with Optimality Guarantee by Exploiting Error Pattern Tree Representation

Li Wan, Huarui Yin, Wenyi Zhang

Parallelism has become a central concern in modern decoding frameworks aiming to meet stringent throughput and latency requirements. Guessing Random Additive Noise Decoding (GRAND) is a recently proposed decoding paradigm that tests candidate Error Patterns (EPs) until a valid codeword is found. Among its variants, Soft GRAND (SGRAND) achieves maximum-likelihood (ML) decoding but relies on real-time generation and likelihood ordering of EPs, making parallel execution nontrivial under the ML optimality constraint. In this work, we introduce a unified binary tree representation of EPs, termed the EP tree, which formalizes the hierarchical structure underlying SGRAND and Ordered Reliability Bits (ORB) GRAND algorithms, enabling structured organization of EPs and algorithmic-level parallel exploration. Building upon this unified framework, we propose a parallel design of SGRAND that preserves ML optimality while significantly reducing decoding complexity through pruning strategies and tree-based computation. Furthermore, we develop an enhanced ORBGRAND algorithm based on the same EP tree representation, improving decoding performance toward ML while retaining parallel efficiency. Numerical experiments show that the proposed parallel SGRAND achieves a $3.96\times$ reduction in decoding latency compared with its serial counterpart, while the enhanced ORBGRAND achieves a $4.21\times$ speedup, demonstrating the effectiveness of the unified tree-based framework and its strong potential for future algorithmic and hardware optimizations.

68.3CLMay 11

Relative Score Policy Optimization for Diffusion Language Models

Zichao Yu, Shengze Xu, Bingqing Jiang et al.

Diffusion large language models (dLLMs) offer a promising route to parallel and efficient text generation, but improving their reasoning ability requires effective post-training. Reinforcement learning with verifiable rewards (RLVR) is a natural choice for this purpose, yet its application to dLLMs is hindered by the absence of tractable sequence-level log-ratios, which are central to standard policy optimization. The lack of tractable sequence-level log-ratios forces existing methods to rely on high-variance ELBO-based approximations, where high verifier rewards can amplify inaccurate score estimates and destabilize RL training. To overcome this issue, we propose \textbf{R}elative \textbf{S}core \textbf{P}olicy \textbf{O}ptimization (RSPO), a simple RLVR method that uses verifiable rewards to calibrate noisy likelihood estimates in dLLMs. The core of our algorithm relies on a key observation: a reward advantage can be interpreted not only as an update direction, but also as a target for the relative log-ratio between the current and reference policies. Accordingly, RSPO calibrates this noisy relative log-ratio estimate by comparing its reward advantage with the reward-implied target relative log-ratio, updating the policy according to the gap between the current estimate and the target rather than the raw advantage alone. Experiments on mathematical reasoning and planning benchmarks show that RSPO yields especially strong gains on planning tasks and competitive mathematical-reasoning performance.

ITJan 20

An Elementary Approach to Scheduling in Generative Diffusion Models

Qiang Sun, H. Vincent Poor, Wenyi Zhang

An elementary approach to characterizing the impact of noise scheduling and time discretization in generative diffusion models is developed. Considering a simplified model where the source distribution is multivariate Gaussian with a given covariance matrix, the explicit closed-form evolution trajectory of the distributions across reverse sampling steps is derived, and consequently, the Kullback-Leibler (KL) divergence between the source distribution and the reverse sampling output is obtained. The effect of the number of time discretization steps on the convergence of this KL divergence is studied via the Euler-Maclaurin expansion. An optimization problem is formulated, and its solution noise schedule is obtained via calculus of variations, shown to follow a tangent law whose coefficient is determined by the eigenvalues of the source covariance matrix. For an alternative scenario, more realistic in practice, where pretrained models have been obtained for some given noise schedules, the KL divergence also provides a measure to compare different time discretization strategies in reverse sampling. Experiments across different datasets and pretrained models demonstrate that the time discretization strategy selected by our approach consistently outperforms baseline and search-based strategies, particularly when the budget on the number of function evaluations is very tight.

LGJan 28

ProFlow: Zero-Shot Physics-Consistent Sampling via Proximal Flow Guidance

Zichao Yu, Ming Li, Wenyi Zhang et al.

Inferring physical fields from sparse observations while strictly satisfying partial differential equations (PDEs) is a fundamental challenge in computational physics. Recently, deep generative models offer powerful data-driven priors for such inverse problems, yet existing methods struggle to enforce hard physical constraints without costly retraining or disrupting the learned generative prior. Consequently, there is a critical need for a sampling mechanism that can reconcile strict physical consistency and observational fidelity with the statistical structure of the pre-trained prior. To this end, we present ProFlow, a proximal guidance framework for zero-shot physics-consistent sampling, defined as inferring solutions from sparse observations using a fixed generative prior without task-specific retraining. The algorithm employs a rigorous two-step scheme that alternates between: (\romannumeral1) a terminal optimization step, which projects the flow prediction onto the intersection of the physically and observationally consistent sets via proximal minimization; and (\romannumeral2) an interpolation step, which maps the refined state back to the generative trajectory to maintain consistency with the learned flow probability path. This procedure admits a Bayesian interpretation as a sequence of local maximum a posteriori (MAP) updates. Comprehensive benchmarks on Poisson, Helmholtz, Darcy, and viscous Burgers' equations demonstrate that ProFlow achieves superior physical and observational consistency, as well as more accurate distributional statistics, compared to state-of-the-art diffusion- and flow-based baselines.

11.4LGMar 18

Detecting Transportation Mode Using Dense Smartphone GPS Trajectories and Transformer Models

Yuandong Zhang, Othmane Echchabi, Tianshu Feng et al.

Transportation mode detection is an important topic within GeoAI and transportation research. In this study, we introduce SpeedTransformer, a novel Transformer-based model that relies solely on speed inputs to infer transportation modes from dense smartphone GPS trajectories. In benchmark experiments, SpeedTransformer outperformed traditional deep learning models, such as the Long Short-Term Memory (LSTM) network. Moreover, the model demonstrated strong flexibility in transfer learning, achieving high accuracy across geographical regions after fine-tuning with small datasets. Finally, we deployed the model in a real-world experiment, where it consistently outperformed baseline models under complex built environments and high data uncertainty. These findings suggest that Transformer architectures, when combined with dense GPS trajectories, hold substantial potential for advancing transportation mode detection and broader mobility-related research.

MLApr 13, 2025

AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse

Zichao Yu, Zhen Zou, Guojiang Shao et al.

Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference, limiting their practicality. While existing acceleration methods exploit the well-known U-shaped similarity pattern between adjacent steps through caching mechanisms, they lack theoretical foundation and rely on simplistic computation reuse, often leading to performance degradation. In this work, we provide a theoretical understanding by analyzing the denoising process through the second-order Adams-Bashforth method, revealing a linear relationship between the outputs of consecutive steps. This analysis explains why the outputs of adjacent steps exhibit a U-shaped pattern. Furthermore, extending Adams-Bashforth method to higher order, we propose a novel caching-based acceleration approach for diffusion models, instead of directly reusing cached results, with a truncation error bound of only $O(h^k)$ where $h$ is the step size. Extensive validation across diverse image and video diffusion models (including HunyuanVideo and FLUX.1-dev) with various schedulers demonstrates our method's effectiveness in achieving nearly $3\times$ speedup while maintaining original performance levels, offering a practical real-time solution without compromising generation quality.

ITMar 8

A Finite-Blocklength Analysis for ORBGRAND

Zhuang Li, Wenyi Zhang

Within the Guessing Random Additive Noise Decoding (GRAND) family, ordered reliability bits GRAND (ORBGRAND) has received considerable attention for its hardware-friendly exploitation of soft information. Existing information-theoretic results for ORBGRAND are asymptotic in blocklength and do not quantify its performance at short-to-moderate blocklengths. This paper develops a finite-blocklength analysis for ORBGRAND over general bit channel, addressing the key challenge that the rank-induced decoding metric is non-additive and coupled across symbols. We first derive an ORBGRAND-specific random-coding union (RCU)-type achievability (ORB-RCU) bound on the ensemble-average error probability. We then characterize two governing decoding metrics: the transmitted-codeword metric is treated as a U-statistic and analyzed via Hoeffding decomposition, while the competing-codeword metric is reduced to a weighted sum of independent and identically distributed Bernoulli random variables and analyzed through strong large-deviation analysis. Combining these ingredients with a Berry-Esseen argument yields a second-order achievable-rate expansion and the associated normal approximation, whose first-order term is shown to equal the ORBGRAND generalized mutual information and whose second-order term defines an ORBGRAND dispersion with a single-letter variance representation. Numerical results for BPSK-modulated additive white Gaussian noise channel validate the tightness of ORB-RCU relative to the maximum-likelihood based RCU benchmark and the accuracy of the normal approximation in the operating regime of practical interest.

CLSep 27, 2025

Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models

Zichao Yu, Ming Li, Wenyi Zhang et al.

Tree search has recently emerged as a powerful framework for aligning generative models with task-specific rewards at test time. Applying tree search to Masked Diffusion Language Models, however, introduces two key challenges: (i) parallel unmasking yields highly correlated branches, limiting exploration, and (ii) reward evaluation via sampled completions produces high-variance estimates, making pruning unstable. We propose TReASURe, a tree-search test-time alignment method that addresses these issues. It introduces (i) UnmaskBranch, a branching strategy based on first-hitting unmasking that diversifies both token content and reveal order with a single model call per parent node, and (ii) ResubstituteScore, a pruning rule that uses deterministic resubstitution to score partially masked sequences with low-variance proxy completions. Theoretically, we quantify branching efficiency gains in NFEs (number of function evaluations), show that the scoring rule approximates the true reward with error bounded by predictive uncertainty, and prove improvements with larger tree widths. Empirically, TReASURe achieves state-of-the-art results on perplexity, linguistic acceptability, and control of sentiment and toxicity, outperforming prior methods under matched compute budgets, with especially strong gains in low-NFE regimes.

IRApr 15, 2025

PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage

Wenyi Zhang, Ju Jia, Xiaojun Jia et al.

The multimodal datasets can be leveraged to pre-train large-scale vision-language models by providing cross-modal semantics. Current endeavors for determining the usage of datasets mainly focus on single-modal dataset ownership verification through intrusive methods and non-intrusive techniques, while cross-modal approaches remain under-explored. Intrusive methods can adapt to multimodal datasets but degrade model accuracy, while non-intrusive methods rely on label-driven decision boundaries that fail to guarantee stable behaviors for verification. To address these issues, we propose a novel prompt-adapted transferable fingerprinting scheme from a training-free perspective, called PATFinger, which incorporates the global optimal perturbation (GOP) and the adaptive prompts to capture dataset-specific distribution characteristics. Our scheme utilizes inherent dataset attributes as fingerprints instead of compelling the model to learn triggers. The GOP is derived from the sample distribution to maximize embedding drifts between different modalities. Subsequently, our PATFinger re-aligns the adaptive prompt with GOP samples to capture the cross-modal interactions on the carefully crafted surrogate model. This allows the dataset owner to check the usage of datasets by observing specific prediction behaviors linked to the PATFinger during retrieval queries. Extensive experiments demonstrate the effectiveness of our scheme against unauthorized multimodal dataset usage on various cross-modal retrieval architectures by 30% over state-of-the-art baselines.

ITMay 4, 2023

A Constrained BA Algorithm for Rate-Distortion and Distortion-Rate Functions

Lingyi Chen, Shitong Wu, Wenhao Ye et al.

The Blahut-Arimoto (BA) algorithm has played a fundamental role in the numerical computation of rate-distortion (RD) functions. This algorithm possesses a desirable monotonic convergence property by alternatively minimizing its Lagrangian with a fixed multiplier. In this paper, we propose a novel modification of the BA algorithm, wherein the multiplier is updated through a one-dimensional root-finding step using a monotonic univariate function, efficiently implemented by Newton's method in each iteration. Consequently, the modified algorithm directly computes the RD function for a given target distortion, without exploring the entire RD curve as in the original BA algorithm. Moreover, this modification presents a versatile framework, applicable to a wide range of problems, including the computation of distortion-rate (DR) functions. Theoretical analysis shows that the outputs of the modified algorithms still converge to the solutions of the RD and DR functions with rate $O(1/n)$, where $n$ is the number of iterations. Additionally, these algorithms provide $\varepsilon$-approximation solutions with $O\left(\frac{MN\log N}{\varepsilon}(1+\log |\log \varepsilon|)\right)$ arithmetic operations, where $M,N$ are the sizes of source and reproduced alphabets respectively. Numerical experiments demonstrate that the modified algorithms exhibit significant acceleration compared with the original BA algorithms and showcase commendable performance across classical source distributions such as discretized Gaussian, Laplacian and uniform sources.

ITJan 29, 2022

An Indirect Rate-Distortion Characterization for Semantic Sources: General Model and the Case of Gaussian Observation

Jiakun Liu, Shuo Shao, Wenyi Zhang et al.

A new source model, which consists of an intrinsic state part and an extrinsic observation part, is proposed and its information-theoretic characterization, namely its rate-distortion function, is defined and analyzed. Such a source model is motivated by the recent surge of interest in the semantic aspect of information: the intrinsic state corresponds to the semantic feature of the source, which in general is not observable but can only be inferred from the extrinsic observation. There are two distortion measures, one between the intrinsic state and its reproduction, and the other between the extrinsic observation and its reproduction. Under a given code rate, the tradeoff between these two distortion measures is characterized by the rate-distortion function, which is solved via the indirect rate-distortion theory and is termed as the semantic rate-distortion function of the source. As an application of the general model and its analysis, the case of Gaussian extrinsic observation is studied, assuming a linear relationship between the intrinsic state and the extrinsic observation, under a quadratic distortion structure. The semantic rate-distortion function is shown to be the solution of a convex programming programming problem with respect to an error covariance matrix, and a reverse water-filling type of solution is provided when the model further satisfies a diagonalizability condition.

LGJul 26, 2021

Decentralized Federated Learning: Balancing Communication and Computing Costs

Wei Liu, Li Chen, Wenyi Zhang

Decentralized stochastic gradient descent (SGD) is a driving engine for decentralized federated learning (DFL). The performance of decentralized SGD is jointly influenced by inter-node communications and local updates. In this paper, we propose a general DFL framework, which implements both multiple local updates and multiple inter-node communications periodically, to strike a balance between communication efficiency and model consensus. It can provide a general decentralized SGD analytical framework. We establish strong convergence guarantees for the proposed DFL algorithm without the assumption of convex objectives. The convergence rate of DFL can be optimized to achieve the balance of communication and computing costs under constrained resources. For improving communication efficiency of DFL, compressed communication is further introduced to the proposed DFL as a new scheme, named DFL with compressed communication (C-DFL). The proposed C-DFL exhibits linear convergence for strongly convex objectives. Experiment results based on MNIST and CIFAR-10 datasets illustrate the superiority of DFL over traditional decentralized SGD methods and show that C-DFL further enhances communication efficiency.

LGOct 8, 2019

Accelerating Federated Learning via Momentum Gradient Descent

Wei Liu, Li Chen, Yunfei Chen et al.

Federated learning (FL) provides a communication-efficient approach to solve machine learning problems concerning distributed data, without sending raw data to a central server. However, existing works on FL only utilize first-order gradient descent (GD) and do not consider the preceding iterations to gradient update which can potentially accelerate convergence. In this paper, we consider momentum term which relates to the last iteration. The proposed momentum federated learning (MFL) uses momentum gradient descent (MGD) in the local update step of FL system. We establish global convergence properties of MFL and derive an upper bound on MFL convergence rate. Comparing the upper bounds on MFL and FL convergence rate, we provide conditions in which MFL accelerates the convergence. For different machine learning models, the convergence performance of MFL is evaluated based on experiments with MNIST dataset. Simulation results comfirm that MFL is globally convergent and further reveal significant convergence improvement over FL.

ITJun 10, 2019

A Regression Approach to Certain Information Transmission Problems

Wenyi Zhang, Yizhu Wang, Cong Shen et al.

A general information transmission model, under independent and identically distributed Gaussian codebook and nearest neighbor decoding rule with processed channel output, is investigated using the performance metric of generalized mutual information. When the encoder and the decoder know the statistical channel model, it is found that the optimal channel output processing function is the conditional expectation operator, thus hinting a potential role of regression, a classical topic in machine learning, for this model. Without utilizing the statistical channel model, a problem formulation inspired by machine learning principles is established, with suitable performance metrics introduced. A data-driven inference algorithm is proposed to solve the problem, and the effectiveness of the algorithm is validated via numerical experiments. Extensions to more general information transmission models are also discussed.