Zhan Gao

h-index25

29papers

313citations

Novelty51%

AI Score54

Ranked #29,560 of 201,326 authors (top 15%)#16 in SP (top 2%)

29 Papers

ITOct 30, 2022

Decentralized Channel Management in WLANs with Graph Neural Networks

Zhan Gao, Yulin Shao, Deniz Gunduz et al.

Wireless local area networks (WLANs) manage multiple access points (APs) and assign scarce radio frequency resources to APs for satisfying traffic demands of associated user devices. This paper considers the channel allocation problem in WLANs that minimizes the mutual interference among APs, and puts forth a learning-based solution that can be implemented in a decentralized manner. We formulate the channel allocation problem as an unsupervised learning problem, parameterize the control policy of radio channels with graph neural networks (GNNs), and train GNNs with the policy gradient method in a model-free manner. The proposed approach allows for a decentralized implementation due to the distributed nature of GNNs and is equivariant to network permutations. The former provides an efficient and scalable solution for large network scenarios, and the latter renders our algorithm independent of the AP reordering. Empirical results are presented to evaluate the proposed approach and corroborate theoretical findings.

ROMar 8, 2023

Online Control Barrier Functions for Decentralized Multi-Agent Navigation

Zhan Gao, Guang Yang, Amanda Prorok

Control barrier functions (CBFs) enable guaranteed safe multi-agent navigation in the continuous domain. The resulting navigation performance, however, is highly sensitive to the underlying hyperparameters. Traditional approaches consider fixed CBFs (where parameters are tuned apriori), and hence, typically do not perform well in cluttered and highly dynamic environments: conservative parameter values can lead to inefficient agent trajectories, or even failure to reach goal positions, whereas aggressive parameter values can lead to infeasible controls. To overcome these issues, in this paper, we propose online CBFs, whereby hyperparameters are tuned in real-time, as a function of what agents perceive in their immediate neighborhood. Since the explicit relationship between CBFs and navigation performance is hard to model, we leverage reinforcement learning to learn CBF-tuning policies in a model-free manner. Because we parameterize the policies with graph neural networks (GNNs), we are able to synthesize decentralized agent controllers that adjust parameter values locally, varying the degree of conservative and aggressive behaviors across agents. Simulations as well as real-world experiments show that (i) online CBFs are capable of solving navigation scenarios that are infeasible for fixed CBFs, and (ii), that they improve navigation performance by adapting to other agents and changes in the environment.

ROSep 22, 2022

Environment Optimization for Multi-Agent Navigation

Zhan Gao, Amanda Prorok

Traditional approaches to the design of multi-agent navigation algorithms consider the environment as a fixed constraint, despite the obvious influence of spatial constraints on agents' performance. Yet hand-designing improved environment layouts and structures is inefficient and potentially expensive. The goal of this paper is to consider the environment as a decision variable in a system-level optimization problem, where both agent performance and environment cost can be accounted for. We begin by proposing a novel environment optimization problem. We show, through formal proofs, under which conditions the environment can change while guaranteeing completeness (i.e., all agents reach their navigation goals). Our solution leverages a model-free reinforcement learning approach. In order to accommodate a broad range of implementation scenarios, we include both online and offline optimization, and both discrete and continuous environment representations. Numerical results corroborate our theoretical findings and validate our approach.

CVJul 18, 2022

Few-shot Fine-grained Image Classification via Multi-Frequency Neighborhood and Double-cross Modulation

Hegui Zhu, Zhan Gao, Jiayi Wang et al.

Traditional fine-grained image classification typically relies on large-scale training samples with annotated ground-truth. However, some sub-categories have few available samples in real-world applications, and current few-shot models still have difficulty in distinguishing subtle differences among fine-grained categories. To solve this challenge, we propose a novel few-shot fine-grained image classification network (FicNet) using multi-frequency neighborhood (MFN) and double-cross modulation (DCM). MFN focuses on both spatial domain and frequency domain to capture multi-frequency structural representations, which reduces the influence of appearance and background changes to the intra-class distance. DCM consists of bi-crisscross component and double 3D cross-attention component. It modulates the representations by considering global context information and inter-class relationship respectively, which enables the support and query samples respond to the same parts and accurately identify the subtle inter-class differences. The comprehensive experiments on three fine-grained benchmark datasets for two few-shot tasks verify that FicNet has excellent performance compared to the state-of-the-art methods. Especially, the experiments on two datasets, "Caltech-UCSD Birds" and "Stanford Cars", can obtain classification accuracy 93.17\% and 95.36\%, respectively. They are even higher than that the general fine-grained image classification methods can achieve.

OCJun 27, 2023

Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic Superlinear Convergence Rate

Zhan Gao, Aryan Mokhtari, Alec Koppel

Non-asymptotic convergence analysis of quasi-Newton methods has gained attention with a landmark result establishing an explicit local superlinear rate of O$((1/\sqrt{t})^t)$. The methods that obtain this rate, however, exhibit a well-known drawback: they require the storage of the previous Hessian approximation matrix or all past curvature information to form the current Hessian inverse approximation. Limited-memory variants of quasi-Newton methods such as the celebrated L-BFGS alleviate this issue by leveraging a limited window of past curvature information to construct the Hessian inverse approximation. As a result, their per iteration complexity and storage requirement is O$(τd)$ where $τ\le d$ is the size of the window and $d$ is the problem dimension reducing the O$(d^2)$ computational cost and memory requirement of standard quasi-Newton methods. However, to the best of our knowledge, there is no result showing a non-asymptotic superlinear convergence rate for any limited-memory quasi-Newton method. In this work, we close this gap by presenting a Limited-memory Greedy BFGS (LG-BFGS) method that can achieve an explicit non-asymptotic superlinear rate. We incorporate displacement aggregation, i.e., decorrelating projection, in post-processing gradient variations, together with a basis vector selection scheme on variable variations, which greedily maximizes a progress measure of the Hessian estimate to the true Hessian. Their combination allows past curvature information to remain in a sparse subspace while yielding a valid representation of the full history. Interestingly, our established non-asymptotic superlinear convergence rate demonstrates an explicit trade-off between the convergence speed and memory requirement, which to our knowledge, is the first of its kind. Numerical results corroborate our theoretical findings and demonstrate the effectiveness of our method.

SPNov 13, 2022

Learning Stable Graph Neural Networks via Spectral Regularization

Zhan Gao, Elvin Isufi

Stability of graph neural networks (GNNs) characterizes how GNNs react to graph perturbations and provides guarantees for architecture performance in noisy scenarios. This paper develops a self-regularized graph neural network (SR-GNN) solution that improves the architecture stability by regularizing the filter frequency responses in the graph spectral domain. The SR-GNN considers not only the graph signal as input but also the eigenvectors of the underlying graph, where the signal is processed to generate task-relevant features and the eigenvectors to characterize the frequency responses at each layer. We train the SR-GNN by minimizing the cost function and regularizing the maximal frequency response close to one. The former improves the architecture performance, while the latter tightens the perturbation stability and alleviates the information loss through multi-layer propagation. We further show the SR-GNN preserves the permutation equivariance, which allows to explore the internal symmetries of graph signals and to exhibit transference on similar graph structures. Numerical results with source localization and movie recommendation corroborate our findings and show the SR-GNN yields a comparable performance with the vanilla GNN on the unperturbed graph but improves substantially the stability.

SPFeb 16, 2023

Graph Neural Networks over the Air for Decentralized Tasks in Wireless Networks

Zhan Gao, Deniz Gunduz

Graph neural networks (GNNs) model representations from networked data and allow for decentralized inference through localized communications. Existing GNN architectures often assume ideal communications and ignore potential channel effects, such as fading and noise, leading to performance degradation in real-world implementation. Considering a GNN implemented over nodes connected through wireless links, this paper conducts a stability analysis to study the impact of channel impairments on the performance of GNNs, and proposes graph neural networks over the air (AirGNNs), a novel GNN architecture that incorporates the communication model. AirGNNs modify graph convolutional operations that shift graph signals over random communication graphs to take into account channel fading and noise when aggregating features from neighbors, thus, improving architecture robustness to channel impairments during testing. We develop a channel-inversion signal transmission strategy for AirGNNs when channel state information (CSI) is available, and propose a stochastic gradient descent based method to train AirGNNs when CSI is unknown. The convergence analysis shows that the training procedure approaches a stationary solution of an associated stochastic optimization problem and the variance analysis characterizes the statistical behavior of the trained model. Experiments on decentralized source localization and multi-robot flocking corroborate theoretical findings and show superior performance of AirGNNs over wireless communication channels.

LGJun 24, 2023

Generalizing Differentially Private Decentralized Deep Learning with Multi-Agent Consensus

Jasmine Bayrooti, Zhan Gao, Amanda Prorok

Cooperative decentralized learning relies on direct information exchange between communicating agents, each with access to locally available datasets. The goal is to agree on model parameters that are optimal over all data. However, sharing parameters with untrustworthy neighbors can incur privacy risks by leaking exploitable information. To enable trustworthy cooperative learning, we propose a framework that embeds differential privacy into decentralized deep learning and secures each agent's local dataset during and after cooperative training. We prove convergence guarantees for algorithms derived from this framework and demonstrate its practical utility when applied to subgradient and ADMM decentralized approaches, finding accuracies approaching the centralized baseline while ensuring individual data samples are resilient to inference attacks. Furthermore, we study the relationships between accuracy, privacy budget, and networks' graph properties on collaborative classification tasks, discovering a useful invariance to the communication graph structure beyond a threshold.

LGMar 19

Stochastic Sequential Decision Making over Expanding Networks with Graph Filtering

Zhan Gao, Bishwadeep Das, Elvin Isufi

Graph filters leverage topological information to process networked data with existing methods mainly studying fixed graphs, ignoring that graphs often expand as nodes continually attach with an unknown pattern. The latter requires developing filter-based decision-making paradigms that take evolution and uncertainty into account. Existing approaches rely on either pre-designed filters or online learning, limited to a myopic view considering only past or present information. To account for future impacts, we propose a stochastic sequential decision-making framework for filtering networked data with a policy that adapts filtering to expanding graphs. By representing filter shifts as agents, we model the filter as a multi-agent system and train the policy following multi-agent reinforcement learning. This accounts for long-term rewards and captures expansion dynamics through sequential decision-making. Moreover, we develop a context-aware graph neural network to parameterize the policy, which tunes filter parameters based on information of both the graph and agents. Experiments on synthetic and real datasets from cold-start recommendation to COVID prediction highlight the benefits of using a sequential decision-making perspective over batch and online filtering alternatives.

NIMar 13

Goal-Oriented Learning at the Edge: Graph Neural Networks Over-the-Air for Blockage Prediction

Lorenzo Mario Amorosa, Zhan Gao, Tony Chahoud et al.

Sixth-generation (6G) wireless networks evolve from connecting devices to connecting intelligence. The focus turns to Goal-Oriented Communications, where the effectiveness of communication is assessed through task-level objectives over traditional throughput-centric metrics. As communication intertwines with learning at the edge, distributed inference over wireless networks faces a critical trade-off between task accuracy and efficient radio resource use. Traditional communication schemes (e.g., OFDMA) are not designed for this trade-off, often facing challenges related to scalability and latency. Therefore, we propose a novel goal-oriented framework that integrates over-the-air computation with spatio-temporal graph learning. Leveraging the wireless channel as an analog aggregation layer, the proposed framework enables low-latency message passing while efficiently aggregating semantically relevant features from distributed nodes. Theoretical analysis confirms that our analog architecture converges to the expressive power of digital message passing, while offering decisive scalability advantages. We assess the framework in proactive line-of-sight blockage prediction for millimeter-wave networks. Through high-fidelity ray-tracing simulations, the framework exhibits strong inductive generalization to unseen networks and adapts to domain shifts via lightweight transfer learning, matching or even outperforming digital baselines with significantly reduced communication overhead.

AINov 18, 2023

An Improved CNN-based Neural Network Model for Fruit Sugar Level Detection

Boyang Deng, Xin Wen, Zhan Gao

Artificial Intelligence (AI) is widely used in image classification, recognition, text understanding, and natural language processing, leading to significant advancements. In this paper, we introduce AI into the field of fruit quality detection. We designed a regression model for fruit sugar level estimation, utilizing an Artificial Neural Network (ANN) based on the visible/near-infrared (V/NIR) spectra of fruits. After analyzing the fruit spectra, we proposed an innovative neural network structure: the lower layers consist of a Multilayer Perceptron (MLP), a middle layer features a 2-dimensional correlation matrix, and the upper layers contain several Convolutional Neural Network (CNN) layers. Using fruit sugar levels as the detection target, we collected data from two fruit types, Gan Nan Navel and Tian Shan Pear, and conducted separate experiments to compare their results. To assess the reliability of our dataset, we first applied Analysis of Variance (ANOVA). We then explored various strategies for processing spectral data and evaluated their impact. Additionally, we employed Wavelet Decomposition (WD) for dimensionality reduction and a Genetic Algorithm (GA) to identify optimal features. We compared the performance of Neural Network models with traditional Partial Least Squares (PLS) models, and specifically evaluated our proposed MLP-CNN structure against other traditional neural network architectures. Finally, we introduced a novel evaluation metric based on the dataset's standard deviation (STD) to assess detection performance, demonstrating the feasibility of using an artificial neural network model for nondestructive fruit sugar level detection.

SPDec 4, 2023

On the Trade-Off between Stability and Representational Capacity in Graph Neural Networks

Zhan Gao, Amanda Prorok, Elvin Isufi

Analyzing the stability of graph neural networks (GNNs) under topological perturbations is key to understanding their transferability and the role of each architecture component. However, stability has been investigated only for particular architectures, questioning whether it holds for a broader spectrum of GNNs or only for a few instances. To answer this question, we study the stability of EdgeNet: a general GNN framework that unifies more than twenty solutions including the convolutional and attention-based classes, as well as graph isomorphism networks and hybrid architectures. We prove that all GNNs within the EdgeNet framework are stable to topological perturbations. By studying the effect of different EdgeNet categories on the stability, we show that GNNs with fewer degrees of freedom in their parameter space, linked to a lower representational capacity, are more stable. The key factor yielding this trade-off is the eigenvector misalignment between the EdgeNet parameter matrices and the graph shift operator. For example, graph convolutional neural networks that assign a single scalar per signal shift (hence, with a perfect alignment) are more stable than the more involved node or edge-varying counterparts. Extensive numerical results corroborate our theoretical findings and highlight the role of different architecture components in the trade-off.

ROApr 8

Differentiable Environment-Trajectory Co-Optimization for Safe Multi-Agent Navigation

Zhan Gao, Gabriele Fadini, Stelian Coros et al.

The environment plays a critical role in multi-agent navigation by imposing spatial constraints, rules, and limitations that agents must navigate around. Traditional approaches treat the environment as fixed, without exploring its impact on agents' performance. This work considers environment configurations as decision variables, alongside agent actions, to jointly achieve safe navigation. We formulate a bi-level problem, where the lower-level sub-problem optimizes agent trajectories that minimize navigation cost and the upper-level sub-problem optimizes environment configurations that maximize navigation safety. We develop a differentiable optimization method that iteratively solves the lower-level sub-problem with interior point methods and the upper-level sub-problem with gradient ascent. A key challenge lies in analytically coupling these two levels. We address this by leveraging KKT conditions and the Implicit Function Theorem to compute gradients of agent trajectories w.r.t. environment parameters, enabling differentiation throughout the bi-level structure. Moreover, we propose a novel metric that quantifies navigation safety as a criterion for the upper-level environment optimization, and prove its validity through measure theory. Our experiments validate the effectiveness of the proposed framework in a variety of safety-critical navigation scenarios, inspired from warehouse logistics to urban transportation. The results demonstrate that optimized environments provide navigation guidance, improving both agents' safety and efficiency.

ROJul 25, 2025

ReCoDe: Reinforcement Learning-based Dynamic Constraint Design for Multi-Agent Coordination

Michael Amir, Guang Yang, Zhan Gao et al.

Constraint-based optimization is a cornerstone of robotics, enabling the design of controllers that reliably encode task and safety requirements such as collision avoidance or formation adherence. However, handcrafted constraints can fail in multi-agent settings that demand complex coordination. We introduce ReCoDe--Reinforcement-based Constraint Design--a decentralized, hybrid framework that merges the reliability of optimization-based controllers with the adaptability of multi-agent reinforcement learning. Rather than discarding expert controllers, ReCoDe improves them by learning additional, dynamic constraints that capture subtler behaviors, for example, by constraining agent movements to prevent congestion in cluttered scenarios. Through local communication, agents collectively constrain their allowed actions to coordinate more effectively under changing conditions. In this work, we focus on applications of ReCoDe to multi-agent navigation tasks requiring intricate, context-based movements and consensus, where we show that it outperforms purely handcrafted controllers, other hybrid approaches, and standard MARL baselines. We give empirical (real robot) and theoretical evidence that retaining a user-defined controller, even when it is imperfect, is more efficient than learning from scratch, especially because ReCoDe can dynamically change the degree to which it relies on this controller.

CLMay 19, 2025

Tianyi: A Traditional Chinese Medicine all-rounder language model and its Real-World Clinical Practice

Zhi Liu, Tao Yang, Jing Wang et al.

Natural medicines, particularly Traditional Chinese Medicine (TCM), are gaining global recognition for their therapeutic potential in addressing human symptoms and diseases. TCM, with its systematic theories and extensive practical experience, provides abundant resources for healthcare. However, the effective application of TCM requires precise syndrome diagnosis, determination of treatment principles, and prescription formulation, which demand decades of clinical expertise. Despite advancements in TCM-based decision systems, machine learning, and deep learning research, limitations in data and single-objective constraints hinder their practical application. In recent years, large language models (LLMs) have demonstrated potential in complex tasks, but lack specialization in TCM and face significant challenges, such as too big model scale to deploy and issues with hallucination. To address these challenges, we introduce Tianyi with 7.6-billion-parameter LLM, a model scale proper and specifically designed for TCM, pre-trained and fine-tuned on diverse TCM corpora, including classical texts, expert treatises, clinical records, and knowledge graphs. Tianyi is designed to assimilate interconnected and systematic TCM knowledge through a progressive learning manner. Additionally, we establish TCMEval, a comprehensive evaluation benchmark, to assess LLMs in TCM examinations, clinical tasks, domain-specific question-answering, and real-world trials. The extensive evaluations demonstrate the significant potential of Tianyi as an AI assistant in TCM clinical practice and research, bridging the gap between TCM knowledge and practical application.

ROMar 21, 2024

Co-Optimizing Reconfigurable Environments and Policies for Decentralized Multi-Agent Navigation

Zhan Gao, Guang Yang, Amanda Prorok

This work views the multi-agent system and its surrounding environment as a co-evolving system, where the behavior of one affects the other. The goal is to take both agent actions and environment configurations as decision variables, and optimize these two components in a coordinated manner to improve some measure of interest. Towards this end, we consider the problem of decentralized multi-agent navigation in a cluttered environment, where we assume that the layout of the environment is reconfigurable. By introducing two sub-objectives -- multi-agent navigation and environment optimization -- we propose an agent-environment co-optimization problem and develop a coordinated algorithm that alternates between these sub-objectives to search for an optimal synthesis of agent actions and environment configurations; ultimately, improving the navigation performance. Due to the challenge of explicitly modeling the relation between the agents, the environment and their performance therein, we leverage policy gradient to formulate a model-free learning mechanism within the coordinated framework. A formal convergence analysis shows that our coordinated algorithm tracks the local minimum solution of an associated time-varying non-convex optimization problem. Experiments corroborate theoretical findings and show the benefits of co-optimization. Interestingly, the results also indicate that optimized environments can offer structural guidance to de-conflict agents in motion.

LGNov 23, 2025

Hierarchical Dual-Strategy Unlearning for Biomedical and Healthcare Intelligence Using Imperfect and Privacy-Sensitive Medical Data

Yi Zhang, Tianxiang Xu, Zijian Li et al.

Large language models (LLMs) exhibit exceptional performance but pose substantial privacy risks due to training data memorization, particularly within healthcare contexts involving imperfect or privacy-sensitive patient information. We present a hierarchical dual-strategy framework for selective knowledge unlearning that precisely removes specialized knowledge while preserving fundamental medical competencies. Our approach synergistically integrates geometric-constrained gradient updates to selectively modulate target parameters with concept-aware token-level interventions that distinguish between preservation-critical and unlearning-targeted tokens via a unified four-level medical concept hierarchy. Comprehensive evaluations on the MedMCQA (surgical) and MHQA (anxiety, depression, trauma) datasets demonstrate superior performance, achieving an 82.7% forgetting rate and 88.5% knowledge preservation. Notably, our framework maintains robust privacy guarantees while requiring modification of only 0.1% of parameters, addressing critical needs for regulatory compliance, auditability, and ethical standards in clinical research.

SYMay 18, 2023

Constrained Environment Optimization for Prioritized Multi-Agent Navigation

Zhan Gao, Amanda Prorok

Traditional approaches to the design of multi-agent navigation algorithms consider the environment as a fixed constraint, despite the influence of spatial constraints on agents' performance. Yet hand-designing conducive environment layouts is inefficient and potentially expensive. The goal of this paper is to consider the environment as a decision variable in a system-level optimization problem, where both agent performance and environment cost are incorporated. Towards this end, we propose novel problems of unprioritized and prioritized environment optimization, where the former considers agents unbiasedly and the latter accounts for agent priorities. We show, through formal proofs, under which conditions the environment can change while guaranteeing completeness (i.e., all agents reach goals), and analyze the role of agent priorities in the environment optimization. We proceed to impose real-world constraints on the environment optimization and formulate it mathematically as a constrained stochastic optimization problem. Since the relation between agents, environment and performance is challenging to model, we leverage reinforcement learning to develop a model-free solution and a primal-dual mechanism to handle constraints. Distinct information processing architectures are integrated for various implementation scenarios, including online/offline optimization and discrete/continuous environment. Numerical results corroborate the theory and demonstrate the validity and adaptability of our approach.

SPJan 29, 2022

Learning Stochastic Graph Neural Networks with Constrained Variance

Zhan Gao, Elvin Isufi

Stochastic graph neural networks (SGNNs) are information processing architectures that learn representations from data over random graphs. SGNNs are trained with respect to the expected performance, which comes with no guarantee about deviations of particular output realizations around the optimal expectation. To overcome this issue, we propose a variance-constrained optimization problem for SGNNs, balancing the expected performance and the stochastic deviation. An alternating primal-dual learning procedure is undertaken that solves the problem by updating the SGNN parameters with gradient descent and the dual variable with gradient ascent. To characterize the explicit effect of the variance-constrained learning, we conduct a theoretical analysis on the variance of the SGNN output and identify a trade-off between the stochastic robustness and the discrimination power. We further analyze the duality gap of the variance-constrained optimization problem and the converging behavior of the primal-dual learning procedure. The former indicates the optimality loss induced by the dual transformation and the latter characterizes the limiting error of the iterative algorithm, both of which guarantee the performance of the variance-constrained learning. Through numerical simulations, we corroborate our theoretical findings and observe a strong expected performance with a controllable standard deviation.

LGJul 19, 2021

Wide and Deep Graph Neural Network with Distributed Online Learning

Zhan Gao, Fernando Gama, Alejandro Ribeiro

Graph neural networks (GNNs) are naturally distributed architectures for learning representations from network data. This renders them suitable candidates for decentralized tasks. In these scenarios, the underlying graph often changes with time due to link failures or topology variations, creating a mismatch between the graphs on which GNNs were trained and the ones on which they are tested. Online learning can be leveraged to retrain GNNs at testing time to overcome this issue. However, most online algorithms are centralized and usually offer guarantees only on convex problems, which GNNs rarely lead to. This paper develops the Wide and Deep GNN (WD-GNN), a novel architecture that can be updated with distributed online learning mechanisms. The WD-GNN consists of two components: the wide part is a linear graph filter and the deep part is a nonlinear GNN. At training time, the joint wide and deep architecture learns nonlinear representations from data. At testing time, the wide, linear part is retrained, while the deep, nonlinear one remains fixed. This often leads to a convex formulation. We further propose a distributed online learning algorithm that can be implemented in a decentralized setting. We also show the stability of the WD-GNN to changes of the underlying graph and analyze the convergence of the proposed online learning procedure. Experiments on movie recommendation, source localization and robot swarm control corroborate theoretical findings and show the potential of the WD-GNN for distributed online learning.

LGJun 19, 2021

Stability of Graph Convolutional Neural Networks to Stochastic Perturbations

Zhan Gao, Elvin Isufi, Alejandro Ribeiro

Graph convolutional neural networks (GCNNs) are nonlinear processing tools to learn representations from network data. A key property of GCNNs is their stability to graph perturbations. Current analysis considers deterministic perturbations but fails to provide relevant insights when topological changes are random. This paper investigates the stability of GCNNs to stochastic graph perturbations induced by link losses. In particular, it proves the expected output difference between the GCNN over random perturbed graphs and the GCNN over the nominal graph is upper bounded by a factor that is linear in the link loss probability. We perform the stability analysis in the graph spectral domain such that the result holds uniformly for any graph. This result also shows the role of the nonlinearity and the architecture width and depth, and allows identifying handle to improve the GCNN robustness. Numerical simulations on source localization and robot swarm control corroborate our theoretical findings.

LGJun 5, 2021

Training Robust Graph Neural Networks with Topology Adaptive Edge Dropping

Zhan Gao, Subhrajit Bhattacharya, Leiming Zhang et al.

Graph neural networks (GNNs) are processing architectures that exploit graph structural information to model representations from network data. Despite their success, GNNs suffer from sub-optimal generalization performance given limited training data, referred to as over-fitting. This paper proposes Topology Adaptive Edge Dropping (TADropEdge) method as an adaptive data augmentation technique to improve generalization performance and learn robust GNN models. We start by explicitly analyzing how random edge dropping increases the data diversity during training, while indicating i.i.d. edge dropping does not account for graph structural information and could result in noisy augmented data degrading performance. To overcome this issue, we consider graph connectivity as the key property that captures graph topology. TADropEdge incorporates this factor into random edge dropping such that the edge-dropped subgraphs maintain similar topology as the underlying graph, yielding more satisfactory data augmentation. In particular, TADropEdge first leverages the graph spectrum to assign proper weights to graph edges, which represent their criticality for establishing the graph connectivity. It then normalizes the edge weights and drops graph edges adaptively based on their normalized weights. Besides improving generalization performance, TADropEdge reduces variance for efficient training and can be applied as a generic method modular to different GNN models. Intensive experiments on real-life and synthetic datasets corroborate theory and verify the effectiveness of the proposed method.

LGOct 12, 2020

Spherical Convolutional Neural Networks: Stability to Perturbations in SO(3)

Zhan Gao, Fernando Gama, Alejandro Ribeiro

Spherical convolutional neural networks (Spherical CNNs) learn nonlinear representations from 3D data by exploiting the data structure and have shown promising performance in shape analysis, object classification, and planning among others. This paper investigates the properties that Spherical CNNs exhibit as they pertain to the rotational structure inherent in spherical signals. We build upon the rotation equivariance of spherical convolutions to show that Spherical CNNs are stable to general structure perturbations. In particular, we model arbitrary structure perturbations as diffeomorphism perturbations, and define the rotation distance that measures how far from rotations these perturbations are. We prove that the output change of a Spherical CNN induced by the diffeomorphism perturbation is bounded proportionally by the perturbation size under the rotation distance. This stability property coupled with the rotation equivariance provide theoretical guarantees that underpin the practical observations that Spherical CNNs exploit the rotational structure, maintain performance under structure perturbations that are close to rotations, and offer good generalization and faster learning.

SPJul 27, 2020

Resource Allocation via Model-Free Deep Learning in Free Space Optical Communications

Zhan Gao, Mark Eisen, Alejandro Ribeiro

This paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications. The resource allocation problem is modeled as the constrained stochastic optimization framework, which covers a variety of FSO scenarios involving power adaptation, relay selection and their joint allocation. Under this framework, we propose two algorithms that solve FSO resource allocation problems. We first present the Stochastic Dual Gradient (SDG) algorithm that is shown to solve the problem exactly by exploiting the strong duality but whose implementation necessarily requires explicit and accurate system models. As an alternative we present the Primal-Dual Deep Learning (PDDL) algorithm based on the SDG algorithm, which parametrizes the resource allocation policy with Deep Neural Networks (DNNs) and optimizes the latter via a primal-dual method. The parametrized resource allocation problem incurs only a small loss of optimality due to the strong representational power of DNNs, and can be moreover implemented without knowledge of system models. A wide set of numerical experiments are performed to corroborate the proposed algorithms in FSO resource allocation problems. We demonstrate their superior performance and computational efficiency compared to the baseline methods in both continuous power allocation and binary relay selection settings.

SPJul 2, 2020

Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems

Zhan Gao, Alec Koppel, Alejandro Ribeiro

Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance the fact that attenuating step-size is required for exact asymptotic convergence with the fact that constant step-size learns faster in finite time up to an error. To do so, rather than fixing the mini-batch and the step-size at the outset, we propose a strategy to allow parameters to evolve adaptively. Specifically, the batch-size is set to be a piecewise-constant increasing sequence where the increase occurs when a suitable error criterion is satisfied. Moreover, the step-size is selected as that which yields the fastest convergence. The overall algorithm, two scale adaptive (TSA) scheme, is developed for both convex and non-convex stochastic optimization problems. It inherits the exact asymptotic convergence of stochastic gradient method. More importantly, the optimal error decreasing rate is achieved theoretically, as well as an overall reduction in computational cost. Experimentally, we observe that TSA attains a favorable tradeoff relative to standard SGD that fixes the mini-batch and the step-size, or simply allowing one to increase or decrease respectively.

SPJun 26, 2020

Resource Allocation via Graph Neural Networks in Free Space Optical Fronthaul Networks

Zhan Gao, Mark Eisen, Alejandro Ribeiro

This paper investigates the optimal resource allocation in free space optical (FSO) fronthaul networks. The optimal allocation maximizes an average weighted sum-capacity subject to power limitation and data congestion constraints. Both adaptive power assignment and node selection are considered based on the instantaneous channel state information (CSI) of the links. By parameterizing the resource allocation policy, we formulate the problem as an unsupervised statistical learning problem. We consider the graph neural network (GNN) for the policy parameterization to exploit the FSO network structure with small-scale training parameters. The GNN is shown to retain the permutation equivariance that matches with the permutation equivariance of resource allocation policy in networks. The primal-dual learning algorithm is developed to train the GNN in a model-free manner, where the knowledge of system models is not required. Numerical simulations present the strong performance of the GNN relative to a baseline policy with equal power assignment and random node selection.

LGJun 11, 2020

Wide and Deep Graph Neural Networks with Distributed Online Learning

Zhan Gao, Fernando Gama, Alejandro Ribeiro

Graph neural networks (GNNs) learn representations from network data with naturally distributed architectures, rendering them well-suited candidates for decentralized learning. Oftentimes, this decentralized graph support changes with time due to link failures or topology variations. These changes create a mismatch between the graphs on which GNNs were trained and the ones on which they are tested. Online learning can be used to retrain GNNs at testing time, overcoming this issue. However, most online algorithms are centralized and work on convex problems (which GNNs rarely lead to). This paper proposes the Wide and Deep GNN (WD-GNN), a novel architecture that can be easily updated with distributed online learning mechanisms. The WD-GNN comprises two components: the wide part is a bank of linear graph filters and the deep part is a GNN. At training time, the joint architecture learns a nonlinear representation from data. At testing time, the deep part (nonlinear) is left unchanged, while the wide part is retrained online, leading to a convex problem. We derive convergence guarantees for this online retraining procedure and further propose a decentralized alternative. Experiments on the robot swarm control for flocking corroborate theory and show potential of the proposed architecture for distributed online learning.

SPJun 4, 2020

Stochastic Graph Neural Networks

Zhan Gao, Elvin Isufi, Alejandro Ribeiro

Graph neural networks (GNNs) model nonlinear representations in graph data with applications in distributed agent coordination, control, and planning among others. Current GNN architectures assume ideal scenarios and ignore link fluctuations that occur due to environment, human factors, or external attacks. In these situations, the GNN fails to address its distributed task if the topological randomness is not considered accordingly. To overcome this issue, we put forth the stochastic graph neural network (SGNN) model: a GNN where the distributed graph convolution module accounts for the random network changes. Since stochasticity brings in a new learning paradigm, we conduct a statistical analysis on the SGNN output variance to identify conditions the learned filters should satisfy for achieving robust transference to perturbed scenarios, ultimately revealing the explicit impact of random link losses. We further develop a stochastic gradient descent (SGD) based learning process for the SGNN and derive conditions on the learning rate under which this learning process converges to a stationary point. Numerical results corroborate our theoretical findings and compare the benefits of SGNN robust transference with a conventional GNN that ignores graph perturbations during learning.

SPJun 21, 2019

Optimal WDM Power Allocation via Deep Learning for Radio on Free Space Optics Systems

Zhan Gao, Mark Eisen, Alejandro Ribeiro

Radio on Free Space Optics (RoFSO), as a universal platform for heterogeneous wireless services, is able to transmit multiple radio frequency signals at high rates in free space optical networks. This paper investigates the optimal design of power allocation for Wavelength Division Multiplexing (WDM) transmission in RoFSO systems. The proposed problem is a weighted total capacity maximization problem with two constraints of total power limitation and eye safety concern. The model-based Stochastic Dual Gradient algorithm is presented first, which solves the problem exactly by exploiting the null duality gap. The model-free Primal-Dual Deep Learning algorithm is then developed to learn and optimize the power allocation policy with Deep Neural Network (DNN) parametrization, which can be utilized without any knowledge of system models. Numerical simulations are performed to exhibit significant performance of our algorithms compared to the average equal power allocation.