Ming Xiao

h-index 182

32 papers

385 citations

Novelty 48%

AI Score 55

Ranked #26,308 of 201,326 authors (top 13%); #6,064 in LG (top 14%)

32 Papers

LG Jun 16, 2023

Towards Quantum Federated Learning

Chao Ren, Rudai Yan, Huihui Zhu et al.

Quantum Federated Learning (QFL) is an emerging interdisciplinary field that merges the principles of Quantum Computing (QC) and Federated Learning (FL), with the goal of leveraging quantum technologies to enhance privacy, security, and efficiency in the learning process. Currently, there is no comprehensive survey for this interdisciplinary field. This review offers a thorough, holistic examination of QFL. We aim to provide a comprehensive understanding of the principles, techniques, and emerging applications of QFL. We discuss the current state of research in this rapidly evolving field, identify challenges and opportunities associated with integrating these technologies, and outline future directions and open research questions. We propose a unique taxonomy of QFL techniques, categorized according to their characteristics and the quantum techniques employed. As the field of QFL continues to progress, we can anticipate further breakthroughs and applications across various industries, driving innovation and addressing challenges related to data privacy, security, and resource optimization. This review serves as a first-of-its-kind comprehensive guide for researchers and practitioners interested in understanding and advancing the field of QFL.

SY Apr 26, 2017

On Maximizing Sensor Network Lifetime by Energy Balancing

Rong Du, Lazaros Gkatzikis, Carlo Fischione et al.

Many physical systems, such as water/electricity distribution networks, are monitored by battery-powered Wireless Sensor Networks (WSNs). Since battery replacement of sensor nodes is generally difficult, long-term monitoring can only be achieved if the operation of the WSN nodes contributes to a long WSN lifetime. Two prominent techniques for achieving a long WSN lifetime are i) optimal sensor activation and ii) efficient data gathering and forwarding based on compressive sensing. These techniques are feasible only if the activated sensor nodes establish a connected communication network (connectivity constraint), and satisfy a compressive sensing decoding constraint (cardinality constraint). These two constraints make the problem of maximizing network lifetime via sensor node activation and compressive sensing NP-hard. To overcome this difficulty, an alternative approach that iteratively solves energy balancing problems is proposed. However, understanding whether maximizing network lifetime and energy balancing problems are aligned objectives is a fundamental open issue. The analysis reveals that the two optimization problems give different solutions, but the difference between the lifetime achieved by the energy balancing approach and the maximum lifetime is small when the initial energy at sensor nodes is significantly larger than the energy consumed for a single transmission. The lifetime achieved by energy balancing is asymptotically optimal, and the achievable network lifetime is at least 50% of the optimum. Analysis and numerical simulations quantify the efficiency of the proposed energy balancing approach.

LG Feb 13

Quantization-Aware Collaborative Inference for Large Embodied AI Models

Zhonghao Lyu, Ming Xiao, Mikael Skoglund et al.

Large artificial intelligence models (LAIMs) are increasingly regarded as a core intelligence engine for embodied AI applications. However, the massive parameter scale and computational demands of LAIMs pose significant challenges for resource-limited embodied agents. To address this issue, we investigate quantization-aware collaborative inference (co-inference) for embodied AI systems. First, we develop a tractable approximation for quantization-induced inference distortion. Based on this approximation, we derive lower and upper bounds on the quantization rate-inference distortion function, characterizing its dependence on LAIM statistics, including the quantization bit-width. Next, we formulate a joint quantization bit-width and computation frequency design problem under delay and energy constraints, aiming to minimize the distortion upper bound while ensuring tightness through the corresponding lower bound. Extensive evaluations validate the proposed distortion approximation, the derived rate-distortion bounds, and the effectiveness of the proposed joint design. Particularly, simulations and real-world testbed experiments demonstrate the effectiveness of the proposed joint design in balancing inference quality, latency, and energy consumption in edge embodied AI systems.

SOC-PH Nov 26, 2025

AI4X Roadmap: Artificial Intelligence for the advancement of scientific pursuit and its future directions

Stephen G. Dale, Nikita Kazeev, Alastair J. A. Price et al.

Artificial intelligence and machine learning are reshaping how we approach scientific discovery, not by replacing established methods but by extending what researchers can probe, predict, and design. In this roadmap we provide a forward-looking view of AI-enabled science across biology, chemistry, climate science, mathematics, materials science, physics, self-driving laboratories and unconventional computing. Several shared themes emerge: the need for diverse and trustworthy data, transferable electronic-structure and interatomic models, AI systems integrated into end-to-end scientific workflows that connect simulations to experiments and generative systems grounded in synthesisability rather than purely idealised phases. Across domains, we highlight how large foundation models, active learning and self-driving laboratories can close loops between prediction and validation while maintaining reproducibility and physical interpretability. Taken together, these perspectives outline where AI-enabled science stands today, identify bottlenecks in data, methods and infrastructure, and chart concrete directions for building AI systems that are not only more powerful but also more transparent and capable of accelerating discovery in complex real-world environments.

LG Jul 18, 2022

Vertical GaN Diode BV Maximization through Rapid TCAD Simulation and ML-enabled Surrogate Model

Albert Lu, Jordan Marshall, Yifan Wang et al.

In this paper, two methodologies are used to speed up the maximization of the breakdown voltage (BV) of a vertical GaN diode that has a theoretical maximum BV of ~2100V. First, we demonstrate a 5X faster accurate simulation method in Technology Computer-Aided-Design (TCAD). This allows us to find 50% more high-BV (>1400V) designs in a given simulation time. Second, a machine learning (ML) model is developed using TCAD-generated data and used as a surrogate model for differential evolution optimization. It can inversely design an out-of-the-training-range structure with BV as high as 1887V (89% of the ideal case) compared to ~1100V designed with human domain expertise.

35.2 LG May 26

The Kalman Evolve: Closing the Gap in Kalman Filtering via Interpretable Algorithm Discovery

Vasileios Saketos, Ming Xiao

State estimation is a fundamental problem in control and signal processing, for which the Kalman Filter provides an optimal solution under linear dynamics, Gaussian noise, and known noise covariances. However, these assumptions often fail in realistic sensing settings such as Doppler radar and LiDAR. In these cases, the optimal estimator is inherently nonlinear, which leads to systematic performance degradation. This creates a performance gap that cannot be eliminated by tuning the noise covariance parameters (i.e., the process and measurement noise in the Kalman Filter) alone. To address this limitation, we propose Kalman Evolve, a framework for discovering improved filtering algorithms by jointly optimizing both noise parameters and the update structure. Our approach leverages large language models (LLMs) as a structured prior over program space, enabling the generation of interpretable, non-affine modifications to the classical Kalman filter while preserving its recursive form. We provide analytical results establishing the suboptimality of affine estimators under common nonlinear sensing models, motivating the need for structure-aware updates. Across a range of synthetic and real-world tracking benchmarks, including Doppler radar, LiDAR-based localization, and pedestrian tracking, the discovered algorithms consistently improve over strong baselines such as the Optimized Kalman Filter, achieving up to 12% reduction in RMSE. These results suggest that optimizing the structure of the Kalman filter, rather than only its parameters, provides a practical and interpretable way to improve state estimation.
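For orientation, the classical recursion that the abstract takes as its starting point can be sketched in a few lines. This is a minimal scalar Kalman filter step, not the Kalman Evolve method; the function name `kalman_step` and the parameters `a`, `h`, `q`, `r` (state transition, measurement gain, process noise variance, measurement noise variance) are illustrative assumptions. Note that the update is affine in the measurement `z`, which is the structure the abstract argues is suboptimal under nonlinear sensing.

```python
def kalman_step(x, p, z, a=1.0, h=1.0, q=0.01, r=1.0):
    """One predict/update cycle of a scalar linear Kalman filter.

    x, p : previous state estimate and its variance
    z    : new measurement
    a, h : state-transition and measurement coefficients (assumed known)
    q, r : process and measurement noise variances (the tunable noise parameters)
    """
    # Predict: propagate the estimate and its uncertainty through the model.
    x_pred = a * x
    p_pred = a * p * a + q

    # Update: the gain weighs the prediction against the measurement.
    k = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + k * (z - h * x_pred)  # affine in z
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new, k
```

Tuning `q` and `r` changes only the gain `k`; the form of the update stays affine, which is why the abstract proposes searching over the update structure as well.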

MA Feb 10 Code

LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis

Shihao Xu, Tiancheng Zhou, Jiatong Ma et al.

Mental disorders are highly prevalent worldwide, but the shortage of psychiatrists and the inherent subjectivity of interview-based diagnosis create substantial barriers to timely and consistent mental-health assessment. Progress in AI-assisted psychiatric diagnosis is constrained by the absence of benchmarks that simultaneously provide realistic patient simulation, clinician-verified diagnostic labels, and support for dynamic multi-turn consultation. We present LingxiDiagBench, a large-scale multi-agent benchmark that evaluates LLMs on both static diagnostic inference and dynamic multi-turn psychiatric consultation in Chinese. At its core is LingxiDiag-16K, a dataset of 16,000 EMR-aligned synthetic consultation dialogues designed to reproduce real clinical demographic and diagnostic distributions across 12 ICD-10 psychiatric categories. Through extensive experiments across state-of-the-art LLMs, we establish key findings: (1) although LLMs achieve high accuracy on binary depression–anxiety classification (up to 92.3%), performance deteriorates substantially for depression–anxiety comorbidity recognition (43.0%) and 12-way differential diagnosis (28.5%); (2) dynamic consultation often underperforms static evaluation, indicating that ineffective information-gathering strategies significantly impair downstream diagnostic reasoning; (3) consultation quality assessed by LLM-as-a-Judge shows only moderate correlation with diagnostic accuracy, suggesting that well-structured questioning alone does not ensure correct diagnostic decisions. We release LingxiDiag-16K and the full evaluation framework to support reproducible research at https://github.com/Lingxi-mental-health/LingxiDiagBench.

64.7 DC Mar 17

Byzantine-Robust and Communication-Efficient Distributed Training: Compressive and Cyclic Gradient Coding

Chengxi Li, Youssef Allouah, Rachid Guerraoui et al.

In this paper, we study the problem of distributed training (DT) under Byzantine attacks with communication constraints. While prior work has developed various robust aggregation rules at the server to enhance robustness to Byzantine attacks, the existing methods suffer from a critical limitation in that the solution error does not diminish when the local gradients sent by different devices vary considerably, as a result of data heterogeneity among the subsets held by different devices. To overcome this limitation, we propose a novel DT method, cyclic gradient coding-based DT (LAD). In LAD, the server allocates the entire training dataset to the devices before training begins. In each iteration, it assigns computational tasks redundantly to the devices using cyclic gradient coding. Each honest device then computes local gradients on a fixed number of data subsets and encodes the local gradients before transmitting to the server. The server aggregates the coded vectors from the honest devices and the potentially incorrect messages from Byzantine devices using a robust aggregation rule. Leveraging the redundancy of computation across devices, the convergence performance of LAD is analytically characterized, demonstrating improved robustness against Byzantine attacks and significantly lower solution error. Furthermore, we extend LAD to a communication-efficient variant, compressive and cyclic gradient coding-based DT (Com-LAD), which further reduces communication overhead under constrained settings. Numerical results validate the effectiveness of the proposed methods in enhancing both Byzantine resilience and communication efficiency.
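The redundant task assignment described above can be illustrated with a small sketch. It assumes one common form of cyclic allocation (device `i` holds subsets `i, i+1, ..., i+d-1` modulo the number of devices) and a simple averaging encoder; the names `cyclic_allocation` and `encode` are hypothetical, and the actual LAD assignment and encoding rules may differ.

```python
def cyclic_allocation(num_devices, redundancy):
    """Return, for each device, the indices of the data subsets it processes.

    Assumes as many data subsets as devices; device i gets `redundancy`
    consecutive subsets starting at i, wrapping around cyclically.
    """
    if not 1 <= redundancy <= num_devices:
        raise ValueError("redundancy must be between 1 and num_devices")
    return [
        [(i + j) % num_devices for j in range(redundancy)]
        for i in range(num_devices)
    ]


def encode(device_subsets, subset_grads):
    """Coded vector of one device: the average of its local subset gradients."""
    d = len(device_subsets)
    dim = len(subset_grads[0])
    return [sum(subset_grads[s][k] for s in device_subsets) / d for k in range(dim)]


# Four devices, each computing gradients on two subsets.
allocation = cyclic_allocation(4, 2)
grads = [[1.0], [2.0], [3.0], [4.0]]  # toy per-subset gradients
coded = [encode(subsets, grads) for subsets in allocation]
```

Because every subset is covered by exactly `redundancy` devices, the mean of the honest coded vectors equals the mean of the per-subset gradients, and each coded vector is itself an average, so honest messages vary less than raw local gradients would. That reduced variation is what a robust aggregation rule at the server can exploit.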

26.9 LG May 15

ITGPT: Generative Pretraining on Irregular Timeseries

Antoine Honoré, Ming Xiao

Timeseries regression models often struggle to leverage large volumes of labeled multimodal data, particularly when the data are irregularly sampled or contain missing values. This is common in domains like healthcare and predictive maintenance, where data are collected from unreliable sources, and labeling requires expert knowledge or costly equipment. Transformer-based large language models have proven effective on structured data such as text through self-supervised learning (SSL) and generative pretraining (GPT) frameworks. However, such models lack the flexibility to efficiently process irregularly sampled multimodal timeseries data. In this paper, we introduce ITGPT, an attention-based architecture designed for handling multimodal, irregularly sampled timeseries by allowing training with both SSL losses and GPT-like objectives. We evaluate its performance on a healthcare task with the TIHM dataset, and a predictive maintenance task with the CompX dataset. Our results demonstrate that ITGPT achieves state-of-the-art performance without requiring resampling, feature fusion or explicit data imputation. Furthermore, when labels are scarce, ITGPT effectively leverages unlabeled data through SSL and GPT training, outperforming the purely supervised approach. This represents an important step towards efficiently using large and unstructured timeseries datasets for practical inference tasks.

AI Jan 27

ComAgent: Multi-LLM based Agentic AI Empowered Intelligent Wireless Networks

Haoyun Li, Ming Xiao, Kezhi Wang et al.

Emerging 6G networks rely on complex cross-layer optimization, yet manually translating high-level intents into mathematical formulations remains a bottleneck. While Large Language Models (LLMs) offer promise, monolithic approaches often lack sufficient domain grounding, constraint awareness, and verification capabilities. To address this, we present ComAgent, a multi-LLM agentic AI framework. ComAgent employs a closed-loop Perception-Planning-Action-Reflection cycle, coordinating specialized agents for literature search, coding, and scoring to autonomously generate solver-ready formulations and reproducible simulations. By iteratively decomposing problems and self-correcting errors, the framework effectively bridges the gap between user intent and execution. Evaluations demonstrate that ComAgent achieves expert-comparable performance in complex beamforming optimization and outperforms monolithic LLMs across diverse wireless tasks, highlighting its potential for automating design in emerging wireless networks.

59.2 DC Mar 17

Biased Compression in Gradient Coding for Distributed Learning

Chengxi Li, Ming Xiao, Mikael Skoglund

Communication bottlenecks and the presence of stragglers pose significant challenges in distributed learning (DL). To deal with these challenges, recent advances leverage unbiased compression functions and gradient coding. However, the significant benefits of biased compression remain largely unexplored. To close this gap, we propose Compressed Gradient Coding with Error Feedback (COCO-EF), a novel DL method that combines gradient coding with biased compression to mitigate straggler effects and reduce communication costs. In each iteration, non-straggler devices encode local gradients from redundantly allocated training data, incorporate prior compression errors, and compress the results using biased compression functions before transmission. The server aggregates these compressed messages from the non-stragglers to approximate the global gradient for model updates. We provide rigorous theoretical convergence guarantees for COCO-EF and validate its superior learning performance over baseline methods through empirical evaluations. As far as we know, we are among the first to rigorously demonstrate that biased compression has substantial benefits in DL, when gradient coding is employed to cope with stragglers.
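The per-device step described above (add the previous compression error, compress with a biased operator, keep the new residual) follows the general error-feedback pattern, which can be sketched as follows. Top-k sparsification is used here as one well-known biased compressor; the function names are hypothetical, the gradient-coding part is omitted, and this is not claimed to be the exact COCO-EF procedure.

```python
def top_k(vec, k):
    """Biased compressor: keep the k largest-magnitude entries, zero the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def compress_with_error_feedback(grad, error, k):
    """Return (message to transmit, residual error to carry to the next round)."""
    # Fold the error left over from earlier rounds back into the gradient.
    corrected = [g + e for g, e in zip(grad, error)]
    message = top_k(corrected, k)
    # Whatever the compressor dropped is remembered rather than lost.
    new_error = [c - m for c, m in zip(corrected, message)]
    return message, new_error


# Two rounds with k = 1: the entry dropped in round 1 is sent in round 2.
msg1, err1 = compress_with_error_feedback([3.0, -1.0, 0.5], [0.0, 0.0, 0.0], 1)
msg2, err2 = compress_with_error_feedback([0.1, -1.0, 0.5], err1, 1)
```

The design choice worth noting is that the residual accumulates: small coordinates that a biased compressor would otherwise suppress indefinitely grow in the error buffer until they are large enough to be transmitted.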

LG Jan 12

Land-then-transport: A Flow Matching-Based Generative Decoder for Wireless Image Transmission

Jingwen Fu, Ming Xiao, Mikael Skoglund et al.

Due to strict rate and reliability demands, wireless image transmission remains difficult for both classical layered designs and joint source-channel coding (JSCC), especially under low latency. Diffusion-based generative decoders can deliver strong perceptual quality by leveraging learned image priors, but iterative stochastic denoising leads to high decoding delay. To enable low-latency decoding, we propose a flow-matching (FM) generative decoder under a new land-then-transport (LTT) paradigm that tightly integrates the physical wireless channel into a continuous-time probability flow. For AWGN channels, we build a Gaussian smoothing path whose noise schedule indexes effective noise levels, and derive a closed-form teacher velocity field along this path. A neural-network student vector field is trained by conditional flow matching, yielding a deterministic, channel-aware ODE decoder with complexity linear in the number of ODE steps. At inference, it only needs an estimate of the effective noise variance to set the ODE starting time. We further show that Rayleigh fading and MIMO channels can be mapped, via linear MMSE equalization and singular-value-domain processing, to AWGN-equivalent channels with calibrated starting times. Therefore, the same probability path and trained velocity field can be reused for Rayleigh and MIMO without retraining. Experiments on MNIST, Fashion-MNIST, and DIV2K over AWGN, Rayleigh, and MIMO demonstrate consistent gains over JPEG2000+LDPC, DeepJSCC, and diffusion-based baselines, while achieving good perceptual quality with only a few ODE steps. Overall, LTT provides a deterministic, physically interpretable, and computation-efficient framework for generative wireless image decoding across diverse channels.

94.2 SY May 4

SkillCom: Decomposing LLM-based Semantic Communication into Task and Channel Aware Skills

Jingwen Fu, Ming Xiao, Mikael Skoglund

Large language models (LLMs) are increasingly used as semantic encoders and decoders in semantic communication. However, current LLM-based systems mostly remain monolithic: a single prompted model, or a tightly coupled transmitter/receiver pair, must jointly perform semantic encoding, channel adaptation, and semantic decoding. Such coupling makes intermediate decisions difficult to control, diagnose, or replace, and may cause channel corruption to propagate through a compressed source representation. To address these limitations, we propose SkillCom, a modular framework that decomposes LLM-based semantic communication into four explicit skills: semantic abstraction skill, channel-adaptive transmission skill, receiver-side repair skill, and task execution skill. These skills are interconnected through typed semantic-unit interfaces. Thus, transmission operates on structured unit-level representations rather than on one monolithic text block. This design localizes channel impairment, enables targeted repair from successfully received units, and supports stage-wise ablation and single-skill replacement under matched communication constraints. Experiments on multi-hop question answering and dialogue state tracking show that SkillCom consistently outperforms the monolithic LLM baseline, remains more robust under varying channel conditions, and exhibits task-dependent preferences over skill realizations. The results suggest that explicit skill decomposition provides a more robust and diagnosable foundation for LLM-based semantic communication than monolithic methods.

37.5 IT May 3

Channel-coded Over-the-Air Computation

Shudi Weng, Ming Xiao, Mikael Skoglund

This letter studies channel coding for over-the-air computation (AirComp). AirComp enables efficient wireless data aggregation, where computation accuracy is the key performance metric. However, this accuracy is sensitive to channel impairments. Channel coding is a promising solution, yet its role in AirComp has been largely unexplored, leaving a critical gap in achieving reliable AirComp systems. To address this, we propose a novel channel coding scheme tailored for AirComp that preserves the aggregation structure while mitigating channel distortions. We show that the computation error decreases with the coding rate and can asymptotically approach zero. Both theoretical and simulation results demonstrate that the proposed scheme significantly enhances computation performance.

SP Mar 25, 2024

RadioGAT: A Joint Model-based and Data-driven Framework for Multi-band Radiomap Reconstruction via Graph Attention Networks

Xiaojie Li, Songyang Zhang, Hang Li et al.

Multi-band radiomap reconstruction (MB-RMR) is a key component in wireless communications for tasks such as spectrum management and network planning. However, traditional machine-learning-based MB-RMR methods, which rely heavily on simulated data or complete structured ground truth, face significant deployment challenges. These challenges stem from the differences between simulated and actual data, as well as the scarcity of real-world measurements. To address these challenges, our study presents RadioGAT, a novel framework based on Graph Attention Network (GAT) tailored for MB-RMR within a single area, eliminating the need for multi-region datasets. RadioGAT innovatively merges model-based spatial-spectral correlation encoding with data-driven radiomap generalization, thus minimizing the reliance on extensive data sources. The framework begins by transforming sparse multi-band data into a graph structure through an innovative encoding strategy that leverages radio propagation models to capture the spatial-spectral correlation inherent in the data. This graph-based representation not only simplifies data handling but also enables tailored label sampling during training, significantly enhancing the framework's adaptability for deployment. Subsequently, the GAT is employed to generalize the radiomap information across various frequency bands. Extensive experiments using raytracing datasets based on real-world environments have demonstrated RadioGAT's enhanced accuracy in supervised learning settings and its robustness in semi-supervised scenarios. These results underscore RadioGAT's effectiveness and practicality for MB-RMR in environments with limited data availability.

66.9 IT Apr 30

Perfectly Private Over-the-Air Computation

Shudi Weng, Ming Xiao, Mikael Skoglund

This paper studies a key research question: how to achieve perfect privacy in over-the-air computation (AirComp)? The problem is particularly intriguing due to a dilemma. Real-field operations can ensure invertibility but generally introduce statistical dependence, resulting in inevitable privacy leakage. In contrast, modulo operations can decorrelate the output from the original message, but suffer from the ill-posed invertibility when applied over non-prime groups (e.g., the real field). This raises a subtle yet fundamental question: Does perfect privacy intrinsically conflict with AirComp? We show that the answer is no. By carefully leveraging the interplay between real-field and modulo operations, perfect privacy and accurate computation can, in fact, be achieved simultaneously, enabling perfectly private aggregation.
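The interplay between real-valued superposition and modulo reduction mentioned above can be illustrated with a textbook additive-masking sketch. It is not the scheme from the paper: it assumes integer inputs in `[0, q)`, masks that sum to zero modulo `q` and are somehow shared among the clients beforehand, and a noiseless channel that simply adds the transmitted symbols. All names here are hypothetical.

```python
import random


def zero_sum_masks(n, q, rng):
    """Draw n masks in [0, q) whose sum is congruent to 0 modulo q."""
    masks = [rng.randrange(q) for _ in range(n - 1)]
    masks.append((-sum(masks)) % q)
    return masks


def masked_aggregate(values, q, rng=None):
    """Aggregate integer values through a modulo mask and an additive channel.

    Returns (recovered sum modulo q, list of symbols actually transmitted).
    """
    rng = rng or random.Random()
    masks = zero_sum_masks(len(values), q, rng)
    # Each client applies a modulo operation before transmitting.
    sent = [(x + r) % q for x, r in zip(values, masks)]
    # The channel superimposes the symbols over the reals (idealized, no noise).
    received = sum(sent)
    # The receiver reduces modulo q; the masks cancel.
    return received % q, sent


total, sent = masked_aggregate([3, 5, 7], q=101, rng=random.Random(0))
```

With at least two clients, each transmitted symbol on its own is uniformly distributed over `[0, q)` regardless of the client's value, while the receiver still recovers the exact sum as long as it is smaller than `q`.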

SP Mar 22, 2024

Adaptive Coded Federated Learning: Privacy Preservation and Straggler Mitigation

Chengxi Li, Ming Xiao, Mikael Skoglund

In this article, we address the problem of federated learning in the presence of stragglers. For this problem, a coded federated learning framework has been proposed, where the central server aggregates gradients received from the non-stragglers and the gradient computed from a privacy-preserving global coded dataset to mitigate the negative impact of the stragglers. However, when aggregating these gradients, fixed weights are consistently applied across iterations, neglecting the generation process of the global coded dataset and the dynamic nature of the trained model over iterations. This oversight may result in diminished learning performance. To overcome this drawback, we propose a new method named adaptive coded federated learning (ACFL). In ACFL, before the training, each device uploads a coded local dataset with additive noise to the central server to generate a global coded dataset under privacy preservation requirements. During each iteration of the training, the central server aggregates the gradients received from the non-stragglers and the gradient computed from the global coded dataset, where an adaptive policy for varying the aggregation weights is designed. Under this policy, we optimize the performance in terms of privacy and learning, where the learning performance is analyzed through convergence analysis and the privacy performance is characterized via mutual information differential privacy. Finally, we perform simulations to demonstrate the superiority of ACFL compared with the non-adaptive methods.

LG May 14, 2025

The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks

Zhonghao Lyu, Ming Xiao, Jie Xu et al.

The growing demand for large artificial intelligence model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications. In particular, edge-device co-inference, which partitions LAIMs between edge devices and servers, has emerged as a promising strategy for resource-efficient LAIM execution in wireless networks. In this paper, we investigate a pruning-aware LAIM co-inference scheme, where a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment. For analysis, we first prove that the LAIM output distortion is upper bounded by its parameter distortion. Then, we derive a lower bound on parameter distortion via rate-distortion theory, analytically capturing the relationship between pruning ratio and co-inference performance. Next, based on the analytical results, we formulate an LAIM co-inference distortion bound minimization problem by jointly optimizing the pruning ratio, transmit power, and computation frequency under system latency, energy, and available resource constraints. Moreover, we propose an efficient algorithm to tackle the considered highly non-convex problem. Finally, extensive simulations demonstrate the effectiveness of the proposed design. In particular, model parameter distortion is shown to provide a reliable bound on output distortion. Also, the proposed joint pruning ratio and resource management design achieves superior performance in balancing trade-offs among inference performance, system latency, and energy consumption compared with benchmark schemes, such as fully on-device and on-server inference. Moreover, the split point is shown to play a critical role in system performance optimization under heterogeneous and resource-limited edge environments.

LG Dec 30, 2024

Accelerating Energy-Efficient Federated Learning in Cell-Free Networks with Adaptive Quantization

Afsaneh Mahmoudi, Ming Xiao, Emil Björnson

Federated Learning (FL) enables clients to share learning parameters instead of local data, reducing communication overhead. Traditional wireless networks face latency challenges with FL. In contrast, Cell-Free Massive MIMO (CFmMIMO) can serve multiple clients on shared resources, boosting spectral efficiency and reducing latency for large-scale FL. However, clients' communication resource limitations can hinder the completion of the FL training. To address this challenge, we propose an energy-efficient, low-latency FL framework featuring optimized uplink power allocation for seamless client-server collaboration. Our framework employs an adaptive quantization scheme, dynamically adjusting bit allocation for local gradient updates to reduce communication costs. We formulate a joint optimization problem covering FL model updates, local iterations, and power allocation, solved using sequential quadratic programming (SQP) to balance energy and latency. Additionally, clients use the AdaDelta method for local FL model updates, enhancing local model convergence compared to standard SGD, and we provide a comprehensive analysis of FL convergence with AdaDelta local updates. Numerical results show that, within the same energy and latency budgets, our power allocation scheme outperforms the Dinkelbach and max-sum rate methods by increasing test accuracy by up to 7% and 19%, respectively. Moreover, for the three power allocation methods, our proposed quantization scheme outperforms AQUILA and LAQ by increasing test accuracy by up to 36% and 35%, respectively.

LG Jan 25

Coding-Enforced Resilient and Secure Aggregation for Hierarchical Federated Learning

Shudi Weng, Ming Xiao, Mikael Skoglund

Hierarchical federated learning (HFL) has emerged as an effective paradigm to enhance link quality between clients and the server. However, ensuring model accuracy while preserving privacy under unreliable communication remains a key challenge in HFL, as the coordination among privacy noise can be randomly disrupted. To address this limitation, we propose a robust hierarchical secure aggregation scheme, termed H-SecCoGC, which integrates coding strategies to enforce structured aggregation. The proposed scheme not only ensures accurate global model construction under varying levels of privacy, but also avoids the partial participation issue, thereby significantly improving robustness, privacy preservation, and learning efficiency. Both theoretical analyses and experimental results demonstrate the superiority of our scheme under unreliable communication across arbitrarily strong privacy guarantees.

LG Jul 3, 2025

A Matrix Variational Auto-Encoder for Variant Effect Prediction in Pharmacogenes

Antoine Honoré, Borja Rodríguez Gálvez, Yoomi Park et al.

Variant effect predictors (VEPs) aim to assess the functional impact of protein variants, traditionally relying on multiple sequence alignments (MSAs). This approach assumes that naturally occurring variants are fit, an assumption challenged by pharmacogenomics, where some pharmacogenes experience low evolutionary pressure. Deep mutational scanning (DMS) datasets provide an alternative by offering quantitative fitness scores for variants. In this work, we propose a transformer-based matrix variational auto-encoder (matVAE) with a structured prior and evaluate its performance on 33 DMS datasets corresponding to 26 drug target and ADME proteins from the ProteinGym benchmark. Our model trained on MSAs (matVAE-MSA) outperforms the state-of-the-art DeepSequence model in zero-shot prediction on DMS datasets, despite using an order of magnitude fewer parameters and requiring less computation at inference time. We also compare matVAE-MSA to matENC-DMS, a model of similar capacity trained on DMS data, and find that the latter performs better on supervised prediction tasks. Additionally, incorporating AlphaFold-generated structures into our transformer model further improves performance, achieving results comparable to DeepSequence trained on MSAs and finetuned on DMS. These findings highlight the potential of DMS datasets to replace MSAs without significant loss in predictive performance, motivating further development of DMS datasets and exploration of their relationships to enhance variant effect prediction.

LGMay 17, 2025

Coded Robust Aggregation for Distributed Learning under Byzantine Attacks

Chengxi Li, Ming Xiao, Mikael Skoglund

In this paper, we investigate the problem of distributed learning (DL) in the presence of Byzantine attacks. For this problem, various robust bounded aggregation (RBA) rules have been proposed at the central server to mitigate the impact of Byzantine attacks. However, current DL methods apply RBA rules for the local gradients from the honest devices and the disruptive information from Byzantine devices, and the learning performance degrades significantly when the local gradients of different devices vary considerably from each other. To overcome this limitation, we propose a new DL method to cope with Byzantine attacks based on coded robust aggregation (CRA-DL). Before training begins, the training data are allocated to the devices redundantly. During training, in each iteration, the honest devices transmit coded gradients to the server computed from the allocated training data, and the server then aggregates the information received from both honest and Byzantine devices using RBA rules. In this way, the global gradient can be approximately recovered at the server to update the global model. Compared with current DL methods applying RBA rules, the improvement of CRA-DL is attributed to the fact that the coded gradients sent by the honest devices are closer to each other. This closeness enhances the robustness of the aggregation against Byzantine attacks, since Byzantine messages tend to be significantly different from those of honest devices in this case. We theoretically analyze the convergence performance of CRA-DL. Finally, we present numerical results to verify the superiority of the proposed method over existing baselines, showing its enhanced learning performance under Byzantine attacks.
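The idea described above (redundant data allocation, coded gradients from honest devices, robust aggregation at the server) can be illustrated with a minimal sketch. The abstract does not specify the encoding or the aggregation rule, so everything concrete here is an assumption made for illustration: a cyclic allocation of data partitions, a coded gradient formed as the plain average over a device's assigned partitions, and the coordinate-wise median as the robust rule. The function names are hypothetical and are not taken from the paper.

```python
from statistics import median


def cyclic_allocation(n_devices, redundancy):
    """Assumed allocation: device i holds partitions i, i+1, ..., i+redundancy-1 (mod n_devices)."""
    return [[(i + j) % n_devices for j in range(redundancy)] for i in range(n_devices)]


def coded_gradient(partition_grads, assigned):
    """Assumed encoding: average the gradients of the partitions assigned to one device."""
    dim = len(partition_grads[0])
    return [sum(partition_grads[p][k] for p in assigned) / len(assigned) for k in range(dim)]


def coordinate_median(messages):
    """Coordinate-wise median, used here as a stand-in for a robust aggregation rule."""
    dim = len(messages[0])
    return [median(m[k] for m in messages) for k in range(dim)]


def aggregate(partition_grads, redundancy, byzantine, attack):
    """Server-side view: honest devices send coded gradients, Byzantine devices send `attack`."""
    n = len(partition_grads)
    allocation = cyclic_allocation(n, redundancy)
    messages = [
        attack if i in byzantine else coded_gradient(partition_grads, allocation[i])
        for i in range(n)
    ]
    return coordinate_median(messages)


# Toy one-dimensional gradients, one per data partition; their mean is the global gradient.
grads = [[1.0], [2.0], [3.0], [4.0], [10.0]]
true_mean = sum(g[0] for g in grads) / len(grads)

uncoded = aggregate(grads, redundancy=1, byzantine={0}, attack=[-100.0])[0]
coded = aggregate(grads, redundancy=3, byzantine={0}, attack=[-100.0])[0]
full = aggregate(grads, redundancy=5, byzantine={0}, attack=[-100.0])[0]
```

In this toy setting, raising the redundancy pulls the honest messages closer together, so the median lands nearer the global gradient despite the Byzantine message. It says nothing about the convergence guarantees analyzed in the paper.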

LGFeb 7, 2022

Asynchronous Parallel Incremental Block-Coordinate Descent for Decentralized Machine Learning

Hao Chen, Yu Ye, Ming Xiao et al.

Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT) based intelligent and ubiquitous computing. For fast-increasing applications and data amounts, distributed learning is a promising emerging paradigm since it is often impractical or inefficient to share or aggregate data from distinct locations at a centralized one. This paper studies the problem of training an ML model over decentralized systems, where data are distributed over many user devices and the learning algorithm runs on-device, with the aim of relaxing the burden at a central entity/server. Although gossip-based approaches have been used for this purpose in different use cases, they suffer from high communication costs, especially when the number of devices is large. To mitigate this, incremental-based methods are proposed. We first introduce incremental block-coordinate descent (I-BCD) for decentralized ML, which can reduce communication costs at the expense of running time. To accelerate the convergence speed, an asynchronous parallel incremental BCD (API-BCD) method is proposed, where multiple devices/agents are active in an asynchronous fashion. We derive convergence properties for the proposed methods. Simulation results also show that our API-BCD method outperforms the state of the art in terms of running time and communication costs.

SPNov 20, 2021

Satellite Based Computing Networks with Federated Learning

Hao Chen, Ming Xiao, Zhibo Pang

Driven by the ever-increasing penetration and proliferation of data-driven applications, a new generation of wireless communication, the sixth-generation (6G) mobile system enhanced by artificial intelligence (AI), has attracted substantial research interest. Among various candidate technologies of 6G, low earth orbit (LEO) satellites have the appealing characteristic of ubiquitous wireless access. However, the costs of satellite communication (SatCom) are still high relative to those of ground mobile networks. To support massively interconnected devices with intelligent adaptive learning and reduce expensive traffic in SatCom, we propose federated learning (FL) in LEO-based satellite communication networks. We first review the state-of-the-art LEO-based SatCom and related machine learning (ML) techniques, and then analyze four possible ways of combining ML with satellite networks. The learning performance of the proposed strategies is evaluated by simulation, and the results reveal that FL-based computing networks improve performance in terms of communication overhead and latency. Finally, we discuss future research topics along this research direction.

LGOct 22, 2021

Federated Learning over Wireless IoT Networks with Optimized Communication and Resources

Hao Chen, Shaocheng Huang, Deyou Zhang et al.

To leverage massive distributed data and computation resources, machine learning at the network edge is considered to be a promising technique, especially for large-scale model training. Federated learning (FL), as a paradigm of collaborative learning techniques, has obtained increasing research attention with the benefits of communication efficiency and improved data privacy. Due to the lossy communication channels and limited communication resources (e.g., bandwidth and power), it is of interest to investigate fast-responding and accurate FL schemes over wireless systems. Hence, we investigate the problem of jointly optimized communication efficiency and resources for FL over wireless Internet of Things (IoT) networks. To reduce complexity, we divide the overall optimization problem into two sub-problems, i.e., the client scheduling problem and the resource allocation problem. To reduce the communication costs for FL in wireless IoT networks, a new client scheduling policy is proposed by reusing stale local model parameters. To maximize successful information exchange over networks, a Lagrange multiplier method is first leveraged by decoupling variables including power variables, bandwidth variables and transmission indicators. Then a linear-search based power and bandwidth allocation method is developed. Given appropriate hyper-parameters, we show that the proposed communication-efficient federated learning (CEFL) framework converges at a strong linear rate. Through extensive experiments, it is revealed that the proposed CEFL framework substantially boosts both the communication efficiency and the learning performance, in terms of both training loss and test accuracy, for FL over wireless IoT networks compared to a basic FL approach with uniform resource allocation.

LGAug 10, 2021

Regularized Sequential Latent Variable Models with Adversarial Neural Networks

Jin Huang, Ming Xiao

Recurrent neural networks (RNNs), with richly distributed internal states and flexible non-linear transition functions, have overtaken dynamic Bayesian networks such as hidden Markov models (HMMs) in the task of modeling highly structured sequential data. These data, such as from speech and handwriting, often contain complex relationships between the underlying variational factors and the observed data. The standard RNN model has very limited randomness or variability in its structure, coming from the output conditional probability model. This paper will present different ways of using high-level latent random variables in RNNs to model the variability in the sequential data, and the training method of such an RNN model under the VAE (Variational Autoencoder) principle. We will explore possible ways of using adversarial methods to train a variational RNN model. Contrary to competing approaches, our approach has a theoretical optimum in the model training and provides better model training stability. Our approach also improves the posterior approximation in the variational inference network by a separate adversarial training step. Numerical results simulated from TIMIT speech data show that the reconstruction loss and the evidence lower bound converge to the same level and the adversarial training loss converges to 0.

LGJun 30, 2021

Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in Edge Industrial IoT

Wanlu Lei, Yu Ye, Ming Xiao et al.

Edge computing provides a promising paradigm to support the implementation of Industrial Internet of Things (IIoT) by offloading tasks to nearby edge nodes. Meanwhile, the increasing network size makes centralized data processing impractical due to limited bandwidth, and consequently a decentralized learning scheme is preferable. Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes. For RL in a decentralized setup, edge nodes (agents) connected through a communication network aim to work collaboratively to find a policy to optimize the global reward as the sum of local rewards. However, communication costs, scalability and adaptation in complex environments with heterogeneous agents may significantly limit the performance of decentralized RL. The alternating direction method of multipliers (ADMM) has a structure that allows for decentralized implementation, and has shown faster convergence than gradient descent based methods. Therefore, we propose an adaptive stochastic incremental ADMM (asI-ADMM) algorithm and apply the asI-ADMM to decentralized RL with edge-computing-empowered IIoT networks. We provide convergence properties for the proposed algorithms by designing a Lyapunov function and prove that the asI-ADMM has an $O(\frac{1}{k}) + O(\frac{1}{M})$ convergence rate, where $k$ and $M$ are the number of iterations and batch samples, respectively. Then, we test our algorithm with two supervised learning problems. For performance evaluation, we simulate two applications in decentralized RL settings with homogeneous and heterogeneous agents. The experiment results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and can adapt well to complex IoT environments.

DCOct 2, 2020

Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing

Hao Chen, Yu Ye, Ming Xiao et al.

Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles. Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center. To train large-scale machine learning models, edge/fog computing is often leveraged as an alternative to centralized learning. We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes. A class of mini-batch stochastic alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model. To address two main critical challenges in distributed networks, i.e., communication bottleneck and straggler nodes (nodes with slow responses), error-control-coding based stochastic incremental ADMM is investigated. Given an appropriate mini-batch size, we show that the mini-batch stochastic ADMM based method converges at a rate of $O(\frac{1}{\sqrt{k}})$, where $k$ denotes the number of iterations. Through numerical experiments, it is revealed that the proposed algorithm is communication-efficient, rapidly responding and robust in the presence of straggler nodes compared with state-of-the-art algorithms.

SPApr 27, 2020

Learning Based Hybrid Beamforming for Millimeter Wave Multi-User MIMO Systems

Shaocheng Huang, Yu Ye, Ming Xiao

Hybrid beamforming (HBF) design is a crucial stage in millimeter wave (mmWave) multi-user multi-input multi-output (MU-MIMO) systems. However, conventional HBF methods still have high complexity and strongly rely on the quality of channel state information. We propose an extreme learning machine (ELM) framework to jointly optimize transmitting and receiving beamformers. Specifically, to provide accurate labels for training, we first propose a fractional-programming and majorization-minimization based HBF method (FP-MM-HBF). Then, an ELM based HBF (ELM-HBF) framework is proposed to increase the robustness of beamformers. Both FP-MM-HBF and ELM-HBF can provide higher system sum-rate compared with existing methods. Moreover, ELM-HBF not only provides robust HBF performance, but also requires very short computation time.

SPApr 16, 2020

Learning Based Hybrid Beamforming Design for Full-Duplex Millimeter Wave Systems

Shaocheng Huang, Yu Ye, Ming Xiao

Millimeter wave (mmWave) communications with full-duplex (FD) have the potential of increasing the spectral efficiency, relative to those with half-duplex. However, the residual self-interference (SI) from FD and the high pathloss inherent to mmWave signals may degrade the system performance. Meanwhile, hybrid beamforming (HBF) is an efficient technology to enhance the channel gain and mitigate interference with reasonable complexity. However, conventional HBF approaches for FD mmWave systems are based on optimization processes, which are either too complex or strongly rely on the quality of channel state information (CSI). We propose two learning schemes to design HBF for FD mmWave systems, i.e., extreme learning machine based HBF (ELM-HBF) and convolutional neural networks based HBF (CNN-HBF). Specifically, we first propose an alternating direction method of multipliers (ADMM) based algorithm to achieve SI cancellation beamforming, and then use a majorization-minimization (MM) based algorithm for joint transmitting and receiving HBF optimization. To train the learning networks, we simulate noisy channels as input, and select the hybrid beamformers calculated by the proposed algorithms as targets. Results show that both learning based schemes can provide more robust HBF performance and achieve at least 22.1% higher spectral efficiency compared to orthogonal matching pursuit (OMP) algorithms. Besides, online prediction with the proposed learning based schemes is almost 20 times faster than with the OMP scheme. Furthermore, training of ELM-HBF is about 600 times faster than that of CNN-HBF with 64 transmitting and receiving antennas.

LGAug 22, 2019

Mobility-aware Content Preference Learning in Decentralized Caching Networks

Yu Ye, Ming Xiao, Mikael Skoglund

Due to the drastic increase of mobile traffic, wireless caching is proposed to serve repeated requests for content download. To determine the caching scheme for decentralized caching networks, the content preference learning problem based on mobility prediction is studied. We first formulate preference prediction as a decentralized regularized multi-task learning (DRMTL) problem without considering the mobility of mobile terminals (MTs). The problem is solved by a hybrid Jacobian and Gauss-Seidel proximal multi-block alternating direction method of multipliers (ADMM) based algorithm, which is proven to conditionally converge to the optimal solution with a rate $O(1/k)$. Then we use the Markov renewal process to predict the moving path and sojourn time for MTs, and integrate the mobility pattern with the DRMTL model by reweighting the training samples and introducing a transfer penalty in the objective. We solve the problem and prove that the developed algorithm has the same convergence property but with different conditions. Through simulation, we illustrate the convergence of the proposed algorithms. Our real-trace-driven experiments illustrate that the mobility-aware DRMTL model can provide a more accurate prediction of geographic preference than the DRMTL model. Besides, the hit ratio achieved by the most popular proactive caching (MPC) policy with preference predicted by mobility-aware DRMTL outperforms the MPC with preference from DRMTL and random caching (RC) schemes.

LGApr 25, 2019

Decentralized Multi-Task Learning Based on Extreme Learning Machines

Yu Ye, Ming Xiao, Mikael Skoglund

In multi-task learning (MTL), related tasks learn jointly to improve generalization performance. To exploit the high learning speed of extreme learning machines (ELMs), we apply the ELM framework to the MTL problem, where the output weights of ELMs for all the tasks are learned collaboratively. We first present the ELM based MTL problem in the centralized setting, which is solved by the proposed MTL-ELM algorithm. Because many data sets of different tasks are geo-distributed, decentralized machine learning is studied. We formulate the decentralized MTL problem based on ELM as majorized multi-block optimization with coupled bi-convex objective functions. To solve the problem, we propose the DMTL-ELM algorithm, which is a hybrid Jacobian and Gauss-Seidel proximal multi-block alternating direction method of multipliers (ADMM). Further, to reduce the computation load of DMTL-ELM, DMTL-ELM with first-order approximation (FO-DMTL-ELM) is presented. Theoretical analysis shows that the convergence to the stationary point of DMTL-ELM and FO-DMTL-ELM can be guaranteed conditionally. Through simulations, we demonstrate the convergence of the proposed MTL-ELM, DMTL-ELM, and FO-DMTL-ELM algorithms, and also show that they can outperform existing MTL methods. Moreover, by adjusting the dimension of the hidden feature space, there exists a trade-off between communication load and learning accuracy for DMTL-ELM.
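For context on why ELMs train quickly, the standard single-task regularized ELM can be written in closed form. This is the textbook formulation, not the MTL-ELM or DMTL-ELM objective of the paper, and the symbols below ($H$, $T$, $\beta$, $\lambda$, $g$, $w_j$, $b_j$) are notation assumed here for illustration. The hidden-layer parameters $w_j, b_j$ are drawn at random and kept fixed, so only the output weights $\beta$ are learned, by solving a ridge-regularized least-squares problem:

```latex
% N training samples x_i, L hidden nodes, activation g, target matrix T.
\begin{align}
  H_{ij} &= g\!\left(w_j^{\top} x_i + b_j\right),
  \qquad i = 1,\dots,N,\quad j = 1,\dots,L, \\
  \beta^{\star} &= \arg\min_{\beta}\;
      \lVert H\beta - T \rVert_F^{2} + \lambda \lVert \beta \rVert_F^{2}
   \;=\; \left(H^{\top} H + \lambda I\right)^{-1} H^{\top} T .
\end{align}
```

Since $\beta^{\star}$ has dimension proportional to the number of hidden nodes $L$, this also suggests why the hidden feature dimension would govern how much has to be exchanged in a decentralized setting, which is consistent with the trade-off mentioned in the abstract.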