OCMay 26, 2013
Robust Energy Management for Microgrids With High-Penetration RenewablesYu Zhang, Nikolaos Gatsis, Georgios B. Giannakis
Due to its reduced communication overhead and robustness to failures, distributed energy management is of paramount importance in smart grids, especially in microgrids, which feature distributed generation (DG) and distributed storage (DS). Distributed economic dispatch for a microgrid with high renewable energy penetration and demand-side management operating in grid-connected mode is considered in this paper. To address the intrinsically stochastic availability of renewable energy sources (RES), a novel power scheduling approach is introduced. The approach involves the actual renewable energy as well as the energy traded with the main grid, so that the supply-demand balance is maintained. The optimal scheduling strategy minimizes the microgrid net cost, which includes DG and DS costs, utility of dispatchable loads, and worst-case transaction cost stemming from the uncertainty in RES. Leveraging the dual decomposition, the optimization problem formulated is solved in a distributed fashion by the local controllers of DG, DS, and dispatchable loads. Numerical results are reported to corroborate the effectiveness of the novel approach.
OCJan 25, 2014
Distributed Optimal Power Flow for Smart MicrogridsEmiliano Dall'Anese, Hao Zhu, Georgios B. Giannakis
Optimal power flow (OPF) is considered for microgrids, with the objective of minimizing either the power distribution losses, or, the cost of power drawn from the substation and supplied by distributed generation (DG) units, while effecting voltage regulation. The microgrid is unbalanced, due to unequal loads in each phase and non-equilateral conductor spacings on the distribution lines. Similar to OPF formulations for balanced systems, the considered OPF problem is nonconvex. Nevertheless, a semidefinite programming (SDP) relaxation technique is advocated to obtain a convex problem solvable in polynomial-time complexity. Enticingly, numerical tests demonstrate the ability of the proposed method to attain the globally optimal solution of the original nonconvex OPF. To ensure scalability with respect to the number of nodes, robustness to isolated communication outages, and data privacy and integrity, the proposed SDP is solved in a distributed fashion by resorting to the alternating direction method of multipliers. The resulting algorithm entails iterative message-passing among groups of consumers and guarantees faster convergence compared to competing alternatives
SYDec 3, 2019
Two-Timescale Voltage Control in Distribution Grids Using Deep Reinforcement LearningQiuling Yang, Gang Wang, Alireza Sadeghi et al.
Modern distribution grids are currently being challenged by frequent and sizable voltage fluctuations, due mainly to the increasing deployment of electric vehicles and renewable generators. Existing approaches to maintaining bus voltage magnitudes within the desired region can cope with either traditional utility-owned devices (e.g., shunt capacitors), or contemporary smart inverters that come with distributed generation units (e.g., photovoltaic plants). The discrete on-off commitment of capacitor units is often configured on an hourly or daily basis, yet smart inverters can be controlled within milliseconds, thus challenging joint control of these two types of assets. In this context, a novel two-timescale voltage regulation scheme is developed for distribution grids by judiciously coupling data-driven with physicsbased optimization. On a faster timescale, say every second, the optimal setpoints of smart inverters are obtained by minimizing instantaneous bus voltage deviations from their nominal values, based on either the exact alternating current power flow model or a linear approximant of it; whereas, on the slower timescale (e.g., every hour), shunt capacitors are configured to minimize the longterm discounted voltage deviations using a deep reinforcement learning algorithm. Extensive numerical tests on a real-world 47- bus distribution network as well as the IEEE 123-bus test feeder using real data corroborate the effectiveness of the novel scheme.
NISep 20, 2011
Distributed Recursive Least-Squares: Stability and Performance AnalysisGonzalo Mateos, Georgios B. Giannakis
The recursive least-squares (RLS) algorithm has well-documented merits for reducing complexity and storage requirements, when it comes to online estimation of stationary signals as well as for tracking slowly-varying nonstationary processes. In this paper, a distributed recursive least-squares (D-RLS) algorithm is developed for cooperative estimation using ad hoc wireless sensor networks. Distributed iterations are obtained by minimizing a separable reformulation of the exponentially-weighted least-squares cost, using the alternating-minimization algorithm. Sensors carry out reduced-complexity tasks locally, and exchange messages with one-hop neighbors to consent on the network-wide estimates adaptively. A steady-state mean-square error (MSE) performance analysis of D-RLS is conducted, by studying a stochastically-driven `averaged' system that approximates the D-RLS dynamics asymptotically in time. For sensor observations that are linearly related to the time-invariant parameter vector sought, the simplifying independence setting assumptions facilitate deriving accurate closed-form expressions for the MSE steady-state values. The problems of mean- and MSE-sense stability of D-RLS are also investigated, and easily-checkable sufficient conditions are derived under which a steady-state is attained. Without resorting to diminishing step-sizes which compromise the tracking ability of D-RLS, stability ensures that per sensor estimates hover inside a ball of finite radius centered at the true parameter vector, with high-probability, even when inter-sensor communication links are noisy. Interestingly, computer simulations demonstrate that the theoretical findings are accurate also in the pragmatic settings whereby sensors acquire temporally-correlated data.
SYApr 28, 2011
Doubly Robust Smoothing of Dynamical Processes via Outlier Sparsity ConstraintsShahrokh Farahmand, Georgios B. Giannakis, Daniele Angelosante
Coping with outliers contaminating dynamical processes is of major importance in various applications because mismatches from nominal models are not uncommon in practice. In this context, the present paper develops novel fixed-lag and fixed-interval smoothing algorithms that are robust to outliers simultaneously present in the measurements {\it and} in the state dynamics. Outliers are handled through auxiliary unknown variables that are jointly estimated along with the state based on the least-squares criterion that is regularized with the $\ell_1$-norm of the outliers in order to effect sparsity control. The resultant iterative estimators rely on coordinate descent and the alternating direction method of multipliers, are expressed in closed form per iteration, and are provably convergent. Additional attractive features of the novel doubly robust smoother include: i) ability to handle both types of outliers; ii) universality to unknown nominal noise and outlier distributions; iii) flexibility to encompass maximum a posteriori optimal estimators with reliable performance under nominal conditions; and iv) improved performance relative to competing alternatives at comparable complexity, as corroborated via simulated tests.
SYOct 27, 2018
Learning and Management for Internet-of-Things: Accounting for Adaptivity and ScalabilityTianyi Chen, Sergio Barbarossa, Xin Wang et al.
Internet-of-Things (IoT) envisions an intelligent infrastructure of networked smart devices offering task-specific monitoring and control services. The unique features of IoT include extreme heterogeneity, massive number of devices, and unpredictable dynamics partially due to human interaction. These call for foundational innovations in network design and management. Ideally, it should allow efficient adaptation to changing environments, and low-cost implementation scalable to massive number of devices, subject to stringent latency constraints. To this end, the overarching goal of this paper is to outline a unified framework for online learning and management policies in IoT through joint advances in communication, networking, learning, and optimization. From the network architecture vantage point, the unified framework leverages a promising fog architecture that enables smart devices to have proximity access to cloud functionalities at the network edge, along the cloud-to-things continuum. From the algorithmic perspective, key innovations target online approaches adaptive to different degrees of nonstationarity in IoT dynamics, and their scalable model-free implementation under limited feedback that motivates blind or bandit approaches. The proposed framework aspires to offer a stepping stone that leads to systematic designs and analysis of task-specific learning and management schemes for IoT, along with a host of new research directions to build on.
SYFeb 6, 2019
Robust and Scalable Power System State Estimation via Composite OptimizationGang Wang, Georgios B. Giannakis, Jie Chen
In today's cyber-enabled smart grids, high penetration of uncertain renewables, purposeful manipulation of meter readings, and the need for wide-area situational awareness, call for fast, accurate, and robust power system state estimation. The least-absolute-value (LAV) estimator is known for its robustness relative to the weighted least-squares (WLS) one. However, due to nonconvexity and nonsmoothness, existing LAV solvers based on linear programming are typically slow, hence inadequate for real-time system monitoring. This paper develops two novel algorithms for efficient LAV estimation, which draw from recent advances in composite optimization. The first is a deterministic linear proximal scheme that handles a sequence of convex quadratic problems, each efficiently solvable either via off-the-shelf algorithms or through the alternating direction method of multipliers. Leveraging the sparse connectivity inherent to power networks, the second scheme is stochastic, and updates only \emph{a few} entries of the complex voltage state vector per iteration. In particular, when voltage magnitude and (re)active power flow measurements are used only, this number reduces to one or two, \emph{regardless of} the number of buses in the network. This computational complexity evidently scales well to large-size power systems. Furthermore, by carefully \emph{mini-batching} the voltage and power flow measurements, accelerated implementation of the stochastic iterations becomes possible. The developed algorithms are numerically evaluated using a variety of benchmark power networks. Simulated tests corroborate that improved robustness can be attained at comparable or markedly reduced computation times for medium- or large-size networks relative to the "workhorse" WLS-based Gauss-Newton iterations.
SYFeb 1, 2016
Ergodic Energy Management Leveraging Resource Variability in Distribution GridsGang Wang, Vassilis Kekatos, Antonio J. Conejo et al.
Contemporary electricity distribution systems are being challenged by the variability of renewable energy sources. Slow response times and long energy management periods cannot efficiently integrate intermittent renewable generation and demand. Yet stochasticity can be judiciously coupled with system flexibilities to enhance grid operation efficiency. Voltage magnitudes for instance can transiently exceed regulation limits, while smart inverters can be overloaded over short time intervals. To implement such a mode of operation, an ergodic energy management framework is developed here. Considering a distribution grid with distributed energy sources and a feed-in tariff program, active power curtailment and reactive power compensation are formulated as a stochastic optimization problem. Tighter operational constraints are enforced in an average sense, while looser margins are enforced to be satisfied at all times. Stochastic dual subgradient solvers are developed based on exact and approximate grid models of varying complexity. Numerical tests on a real-world 56-bus distribution grid and the IEEE 123-bus test feeder relying on both grid models corroborate the advantages of the novel schemes over their deterministic alternatives.
MLMay 27, 2022
Surrogate modeling for Bayesian optimization beyond a single Gaussian processQin Lu, Konstantinos D. Polyzos, Bingcong Li et al.
Bayesian optimization (BO) has well-documented merits for optimizing black-box functions with an expensive evaluation cost. Such functions emerge in applications as diverse as hyperparameter tuning, drug discovery, and robotics. BO hinges on a Bayesian surrogate model to sequentially select query points so as to balance exploration with exploitation of the search space. Most existing works rely on a single Gaussian process (GP) based surrogate model, where the kernel function form is typically preselected using domain knowledge. To bypass such a design process, this paper leverages an ensemble (E) of GPs to adaptively select the surrogate model fit on-the-fly, yielding a GP mixture posterior with enhanced expressiveness for the sought function. Acquisition of the next evaluation input using this EGP-based function posterior is then enabled by Thompson sampling (TS) that requires no additional design parameters. To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model. The novel EGP-TS readily accommodates parallel operation. To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret for both sequential and parallel settings. Tests on synthetic functions and real-world applications showcase the merits of the proposed method.
OCJan 31, 2013
Load curve data cleansing and imputation via sparsity and low rankGonzalo Mateos, Georgios B. Giannakis
The smart grid vision is to build an intelligent power network with an unprecedented level of situational awareness and controllability over its services and infrastructure. This paper advocates statistical inference methods to robustify power monitoring tasks against the outlier effects owing to faulty readings and malicious attacks, as well as against missing data due to privacy concerns and communication errors. In this context, a novel load cleansing and imputation scheme is developed leveraging the low intrinsic-dimensionality of spatiotemporal load profiles and the sparse nature of "bad data.'' A robust estimator based on principal components pursuit (PCP) is adopted, which effects a twofold sparsity-promoting regularization through an $\ell_1$-norm of the outliers, and the nuclear norm of the nominal load profiles. Upon recasting the non-separable nuclear norm into a form amenable to decentralized optimization, a distributed (D-) PCP algorithm is developed to carry out the imputation and cleansing tasks using networked devices comprising the so-termed advanced metering infrastructure. If D-PCP converges and a qualification inequality is satisfied, the novel distributed estimator provably attains the performance of its centralized PCP counterpart, which has access to all networkwide data. Computer simulations and tests with real load curve data corroborate the convergence and effectiveness of the novel D-PCP algorithm.
SYApr 19, 2011
Estimating the State of AC Power Systems using Semidefinite ProgrammingHao Zhu, Georgios B. Giannakis
This paper has been withdrawn by the authors
LGSep 27, 2023
Enhancing Sharpness-Aware Optimization Through Variance SuppressionBingcong Li, Georgios B. Giannakis
Sharpness-aware minimization (SAM) has well documented merits in enhancing generalization of deep neural networks, even without sizable data augmentation. Embracing the geometry of the loss function, where neighborhoods of 'flat minima' heighten generalization ability, SAM seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing parameters within the neighborhood. Although critical to account for sharpness of the loss function, such an 'over-friendly adversary' can curtail the outmost level of generalization. The novel approach of this contribution fosters stabilization of adversaries through variance suppression (VaSSO) to avoid such friendliness. VaSSO's provable stability safeguards its numerical improvement over SAM in model-agnostic tasks, including image classification and machine translation. In addition, experiments confirm that VaSSO endows SAM with robustness against high levels of label noise.
SYMay 11, 2017
Power System State Estimation via Feasible Point Pursuit: Algorithms and Cramer-Rao BoundGang Wang, Ahmed S. Zamzam, Georgios B. Giannakis et al.
Accurately monitoring the system's operating point is central to the reliable and economic operation of an electric power grid. Power system state estimation (PSSE) aims to obtain complete voltage magnitude and angle information at each bus given a number of system variables at selected buses and lines. Power flow analysis is a special case of PSSE, and amounts to solving a set of noise-free power flow equations. Physical laws dictate quadratic relationships between available quantities and unknown voltages, rendering general instances of power flow and PSSE nonconvex and NP-hard. Past approaches are largely based on gradient-type iterative procedures or semidefinite relaxation (SDR). Due to nonconvexity, the solution obtained via gradient-type schemes depends on initialization, while SDR methods do not perform as desired in challenging scenarios. This paper puts forth novel \emph{feasible point pursuit} (FPP)-based solvers for power flow and PSSE, which iteratively seek feasible solutions for a nonconvex quadratically constrained quadratic programming (QCQP) reformulation of the weighted least-squares (WLS) problem. Relative to the prior art, the developed solvers offer superior performance at the cost of higher complexity. Furthermore, they converge to a stationary point of the WLS problem. As a baseline for comparing different estimators, the Cram{\' e}r-Rao lower bound (CRLB) is derived for the fundamental PSSE problem in this paper. Judicious numerical tests on several IEEE benchmark systems showcase markedly improved performance of our FPP-based solvers for both power flow and PSSE tasks over popular WLS-based Gauss-Newton iterations and SDR approaches.
OCJan 31, 2014
Sparsity-leveraging Reconfiguration of Smart Distribution SystemsEmiliano Dall'Anese, Georgios B. Giannakis
A system reconfiguration problem is considered for three-phase power distribution networks featuring distributed generation. In lieu of binary line selection variables, the notion of group sparsity is advocated to re-formulate the nonconvex distribution system reconfiguration (DSR) problem into a convex one. Using the duality theory, it is shown that the line selection task boils down to a shrinkage and thresholding operation on the line currents. Further, numerical tests illustrate the ability of the proposed scheme to identify meshed, weakly-meshed, or even radial configurations by adjusting a sparsity-tuning parameter in the DSR cost. Constraints on the voltages are investigated, and incorporated in the novel DSR problem to effect voltage regulation.
APJun 21, 2012
Robust Power System State Estimation for the Nonlinear AC Flow ModelHao Zhu, Georgios B. Giannakis
An important monitoring task for power systems is accurate estimation of the system operation state. Under the nonlinear AC power flow model, the state estimation (SE) problem is inherently nonconvex giving rise to many local optima. In addition to nonconvexity, SE is challenged by data integrity and cyber-security issues. Unfortunately, existing robust (R-) SE schemes employed routinely in practice rely on iterative solvers, which are sensitive to initialization and cannot ensure global optimality. A novel R-SE approach is formulated here by capitalizing on the sparsity of an overcomplete outlier vector model. Observability and identifiability issues of this model are investigated, and neat links are established between R-SE and error control coding. The \emph{convex} semidefinite relaxation (SDR) technique is further pursued to render the nonconvex R-SE problem efficiently solvable. The resultant algorithm markedly outperforms existing iterative alternatives, as corroborated through numerical tests on the standard IEEE 30-bus system.
SYApr 19, 2018
Multi-Timescale Online Optimization of Network Function Virtualization for Service ChainingXiaojing Chen, Wei Ni, Tianyi Chen et al.
Network Function Virtualization (NFV) can cost-efficiently provide network services by running different virtual network functions (VNFs) at different virtual machines (VMs) in a correct order. This can result in strong couplings between the decisions of the VMs on the placement and operations of VNFs. This paper presents a new fully decentralized online approach for optimal placement and operations of VNFs. Building on a new stochastic dual gradient method, our approach decouples the real-time decisions of VMs, asymptotically minimizes the time-average cost of NFV, and stabilizes the backlogs of network services with a cost-backlog tradeoff of $[ε,1/ε]$, for any $ε> 0$. Our approach can be relaxed into multiple timescales to have VNFs (re)placed at a larger timescale and hence alleviate service interruptions. While proved to preserve the asymptotic optimality, the larger timescale can slow down the optimal placement of VNFs. A learn-and-adapt strategy is further designed to speed the placement up with an improved tradeoff $[ε,\log^2(ε)/{\sqrtε}]$. Numerical results show that the proposed method is able to reduce the time-average cost of NFV by 30\% and reduce the queue length (or delay) by 83\%, as compared to existing benchmarks.
SYOct 31, 2017
Learn-and-Adapt Stochastic Dual Gradients for Network Resource AllocationTianyi Chen, Qing Ling, Georgios B. Giannakis
Network resource allocation shows revived popularity in the era of data deluge and information explosion. Existing stochastic optimization approaches fall short in attaining a desirable cost-delay tradeoff. Recognizing the central role of Lagrange multipliers in network resource allocation, a novel learn-and-adapt stochastic dual gradient (LA-SDG) method is developed in this paper to learn the sample-optimal Lagrange multiplier from historical data, and accordingly adapt the upcoming resource allocation strategy. Remarkably, LA-SDG only requires just an extra sample (gradient) evaluation relative to the celebrated stochastic dual gradient (SDG) method. LA-SDG can be interpreted as a foresighted learning scheme with an eye on the future, or, a modified heavy-ball iteration from an optimization viewpoint. It is established - both theoretically and empirically - that LA-SDG markedly improves the cost-delay tradeoff over state-of-the-art allocation schemes.
SYJan 13, 2018
Decentralized RLS with Data-Adaptive Censoring for Regressions over Large-Scale NetworksZifeng Wang, Zheng Yu, Qing Ling et al.
The deluge of networked data motivates the development of algorithms for computation- and communication-efficient information processing. In this context, three data-adaptive censoring strategies are introduced to considerably reduce the computation and communication overhead of decentralized recursive least-squares (D-RLS) solvers. The first relies on alternating minimization and the stochastic Newton iteration to minimize a network-wide cost, which discards observations with small innovations. In the resultant algorithm, each node performs local data-adaptive censoring to reduce computations, while exchanging its local estimate with neighbors so as to consent on a network-wide solution. The communication cost is further reduced by the second strategy, which prevents a node from transmitting its local estimate to neighbors when the innovation it induces to incoming data is minimal. In the third strategy, not only transmitting, but also receiving estimates from neighbors is prohibited when data-adaptive censoring is in effect. For all strategies, a simple criterion is provided for selecting the threshold of innovation to reach a prescribed average data reduction. The novel censoring-based (C)D-RLS algorithms are proved convergent to the optimal argument in the mean-square deviation sense. Numerical experiments validate the effectiveness of the proposed algorithms in reducing computation and communication overhead.
SYApr 28, 2011
Tracking Target Signal Strengths on a Grid using SparsityShahrokh Farahmand, Georgios B. Giannakis, Geert Leus et al.
Multi-target tracking is mainly challenged by the nonlinearity present in the measurement equation, and the difficulty in fast and accurate data association. To overcome these challenges, the present paper introduces a grid-based model in which the state captures target signal strengths on a known spatial grid (TSSG). This model leads to \emph{linear} state and measurement equations, which bypass data association and can afford state estimation via sparsity-aware Kalman filtering (KF). Leveraging the grid-induced sparsity of the novel model, two types of sparsity-cognizant TSSG-KF trackers are developed: one effects sparsity through $\ell_1$-norm regularization, and the other invokes sparsity as an extra measurement. Iterative extended KF and Gauss-Newton algorithms are developed for reduced-complexity tracking, along with accurate error covariance updates for assessing performance of the resultant sparsity-aware state estimators. Based on TSSG state estimates, more informative target position and track estimates can be obtained in a follow-up step, ensuring that track association and position estimation errors do not propagate back into TSSG state estimates. The novel TSSG trackers do not require knowing the number of targets or their signal strengths, and exhibit considerably lower complexity than the benchmark hidden Markov model filter, especially for a large number of targets. Numerical simulations demonstrate that sparsity-cognizant trackers enjoy improved root mean-square error performance at reduced complexity when compared to their sparsity-agnostic counterparts.
SYMay 16, 2011
Lassoing Line Outages in the Smart Power GridHao Zhu, Georgios B. Giannakis
Fast and accurate unveiling of power line outages is of paramount importance not only for preventing faults that may lead to blackouts, but also for routine monitoring and control tasks of the smart grid, including state estimation and optimal power flow. Existing approaches are either challenged by the \emph{combinatorial complexity} issues involved, and are thus limited to identifying single- and double-line outages; or, they invoke less pragmatic assumptions such as \emph{conditionally independent} phasor angle measurements available across the grid. Using only a subset of voltage phasor angle data, the present paper develops a near real-time algorithm for identifying multiple line outages at the affordable complexity of solving a quadratic program via block coordinate descent iterations. The novel approach relies on reformulating the DC linear power flow model as a \emph{sparse} overcomplete expansion, and leveraging contemporary advances in compressive sampling and variable selection using the least-absolute shrinkage and selection operator (Lasso). Analysis and simulated tests on the standard IEEE 118-bus system confirm the effectiveness of lassoing line changes in the smart power grid.
LGMar 31, 2023
Scalable Bayesian Meta-Learning through Generalized Implicit GradientsYilang Zhang, Bingcong Li, Shijian Gao et al.
Meta-learning owns unique effectiveness and swiftness in tackling emerging tasks with limited data. Its broad applicability is revealed by viewing it as a bi-level optimization problem. The resultant algorithmic viewpoint however, faces scalability issues when the inner-level optimization relies on gradient-based iterations. Implicit differentiation has been considered to alleviate this challenge, but it is restricted to an isotropic Gaussian prior, and only favors deterministic meta-learning approaches. This work markedly mitigates the scalability bottleneck by cross-fertilizing the benefits of implicit differentiation to probabilistic Bayesian meta-learning. The novel implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors, but also quantifies the associated uncertainty. Furthermore, the ultimate complexity is well controlled regardless of the inner-level optimization trajectory. Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one. Extensive numerical tests are also carried out to empirically validate the performance of the proposed method.
LGJun 10, 2022
Weighted Ensembles for Active Learning with AdaptivityKonstantinos D. Polyzos, Qin Lu, Georgios B. Giannakis
Labeled data can be expensive to acquire in several application domains, including medical imaging, robotics, and computer vision. To efficiently train machine learning models under such high labeling costs, active learning (AL) judiciously selects the most informative data instances to label on-the-fly. This active sampling process can benefit from a statistical function model, that is typically captured by a Gaussian process (GP). While most GP-based AL approaches rely on a single kernel function, the present contribution advocates an ensemble of GP models with weights adapted to the labeled data collected incrementally. Building on this novel EGP model, a suite of acquisition functions emerges based on the uncertainty and disagreement rules. An adaptively weighted ensemble of EGP-based acquisition functions is also introduced to further robustify performance. Extensive tests on synthetic and real datasets showcase the merits of the proposed EGP-based approaches with respect to the single GP-based AL alternatives.
SPJul 9, 2024
Learning From Crowdsourced Noisy Labels: A Signal Processing PerspectiveShahana Ibrahim, Panagiotis A. Traganitis, Xiao Fu et al.
One of the primary catalysts fueling advances in artificial intelligence (AI) and machine learning (ML) is the availability of massive, curated datasets. A commonly used technique to curate such massive datasets is crowdsourcing, where data are dispatched to multiple annotators. The annotator-produced labels are then fused to serve downstream learning and inference tasks. This annotation process often creates noisy labels due to various reasons, such as the limited expertise, or unreliability of annotators, among others. Therefore, a core objective in crowdsourcing is to develop methods that effectively mitigate the negative impact of such label noise on learning tasks. This feature article introduces advances in learning from noisy crowdsourced labels. The focus is on key crowdsourcing models and their methodological treatments, from classical statistical models to recent deep learning-based approaches, emphasizing analytical insights and algorithmic developments. In particular, this article reviews the connections between signal processing (SP) theory and methods, such as identifiability of tensor and nonnegative matrix factorization, and novel, principled solutions of longstanding challenges in crowdsourcing -- showing how SP perspectives drive the advancements of this field. Furthermore, this article touches upon emerging topics that are critical for developing cutting-edge AI/ML systems, such as crowdsourcing in reinforcement learning with human feedback (RLHF) and direct preference optimization (DPO) that are key techniques for fine-tuning large language models (LLMs).
ROSep 29, 2023
3D Reconstruction in Noisy Agricultural Environments: A Bayesian Optimization Perspective for View PlanningAthanasios Bacharis, Konstantinos D. Polyzos, Henry J. Nelson et al.
3D reconstruction is a fundamental task in robotics that gained attention due to its major impact in a wide variety of practical settings, including agriculture, underwater, and urban environments. This task can be carried out via view planning (VP), which aims to optimally place a certain number of cameras in positions that maximize the visual information, improving the resulting 3D reconstruction. Nonetheless, in most real-world settings, existing environmental noise can significantly affect the performance of 3D reconstruction. To that end, this work advocates a novel geometric-based reconstruction quality function for VP, that accounts for the existing noise of the environment, without requiring its closed-form expression. With no analytic expression of the objective function, this work puts forth an adaptive Bayesian optimization algorithm for accurate 3D reconstruction in the presence of noise. Numerical tests on noisy agricultural environments showcase the merits of the proposed approach for 3D reconstruction with even a small number of available cameras.
OCAug 13, 2023
Conic Descent Redux for Memory-Efficient OptimizationBingcong Li, Georgios B. Giannakis
Conic programming has well-documented merits in a gamut of signal processing and machine learning tasks. This contribution revisits a recently developed first-order conic descent (CD) solver, and advances it in three aspects: intuition, theory, and algorithmic implementation. It is found that CD can afford an intuitive geometric derivation that originates from the dual problem. This opens the door to novel algorithmic designs, with a momentum variant of CD, momentum conic descent (MOCO) exemplified. Diving deeper into the dual behavior CD and MOCO reveals: i) an analytically justified stopping criterion; and, ii) the potential to design preconditioners to speed up dual convergence. Lastly, to scale semidefinite programming (SDP) especially for low-rank solutions, a memory efficient MOCO variant is developed and numerically validated.
LGFeb 9
ANCRe: Adaptive Neural Connection Reassignment for Efficient Depth ScalingYilang Zhang, Bingcong Li, Niao He et al.
Scaling network depth has been a central driver behind the success of modern foundation models, yet recent investigations suggest that deep layers are often underutilized. This paper revisits the default mechanism for deepening neural networks, namely residual connections, from an optimization perspective. Rigorous analysis proves that the layout of residual connections can fundamentally shape convergence behavior, and even induces an exponential gap in convergence rates. Prompted by this insight, we introduce adaptive neural connection reassignment (ANCRe), a principled and lightweight framework that parameterizes and learns residual connectivities from the data. ANCRe adaptively reassigns residual connections with negligible computational and memory overhead ($<1\%$), while enabling more effective utilization of network depth. Extensive numerical tests across pre-training of large language models, diffusion models, and deep ResNets demonstrate consistently accelerated convergence, boosted performance, and enhanced depth efficiency over conventional residual connections.
19.5LGApr 14
Binomial Gradient-Based Meta-Learning for Enhanced Meta-Gradient EstimationYilang Zhang, Abraham Jaeger Mountain, Bingcong Li et al.
Meta-learning offers a principled framework leveraging \emph{task-invariant} priors from related tasks, with which \emph{task-specific} models can be fine-tuned on downstream tasks, even with limited data records. Gradient-based meta-learning (GBML) relies on gradient descent (GD) to adapt the prior to a new task. Albeit effective, these methods incur high computational overhead that scales linearly with the number of GD steps. To enhance efficiency and scalability, existing methods approximate the gradient of prior parameters (meta-gradient) via truncated backpropagation, yet suffer large approximation errors. Targeting accurate approximation, this work puts forth binomial GBML (BinomGBML), which relies on a truncated binomial expansion for meta-gradient estimation. This novel expansion endows more information in the meta-gradient estimation via efficient parallel computation. As a running paradigm applied to model-agnostic meta-learning (MAML), the resultant BinomMAML provably enjoys error bounds that not only improve upon existing approaches, but also decay super-exponentially under mild conditions. Numerical tests corroborate the theoretical analysis and showcase boosted performance with slightly increased computational overhead.
70.3LGApr 23
Low-Rank Adaptation Redux for Large ModelsBingcong Li, Yilang Zhang, Georgios B. Giannakis
Low-rank adaptation (LoRA) has emerged as the de facto standard for parameter-efficient fine-tuning (PEFT) of foundation models, enabling the adaptation of billion-parameter networks with minimal computational and memory overhead. Despite its empirical success and rapid proliferation of variants, it remains elusive which architectural choices, optimization techniques, and deployment constraints should guide practical method selection. This overview revisits LoRA through the lens of signal processing (SP), bridging modern adapter designs with classical low-rank modeling tools and inverse problems, as well as highlighting how SP principles can inform principled advances of fine-tuning approaches. Rather than providing a comprehensive enumeration and empirical comparisons of LoRA variants, emphasis is placed on the technical mechanisms underpinning these approaches to justify their effectiveness. These advances are categorized into three complementary axes: architectural design, efficient optimization, and pertinent applications. The first axis builds on singular value decomposition (SVD)-based factorization, rank-augmentation constructions, and cross-layer tensorization, while the second axis deals with initialization, alternating solvers, gauge-invariant optimization, and parameterization-aware methods. Beyond fine-tuning, emerging applications of LoRA are accounted across the entire lifecycle of large models, ranging from pre- and post-training to serving/deployment. Finally, open research directions are outlined at the confluence of SP and deep learning to catalyze a bidirectional frontier: classical SP tools provide a principled vocabulary for designing principled PEFT methods, while the unique challenges facing modern deep learning, especially the overwhelming scale and prohibitive overhead, also offer new research lines benefiting the SP community in return.
LGMay 24, 2025
RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large ModelsYilang Zhang, Bingcong Li, Georgios B. Giannakis
Low-Rank Adaptation (LoRA) lowers the computational and memory overhead of fine-tuning large models by updating a low-dimensional subspace of the pre-trained weight matrix. Albeit efficient, LoRA exhibits suboptimal convergence and noticeable performance degradation, due to inconsistent and imbalanced weight updates induced by its nonunique low-rank factorizations. To overcome these limitations, this article identifies the optimal low-rank factorization per step that minimizes an upper bound on the loss. The resultant refactored low-rank adaptation (RefLoRA) method promotes a flatter loss landscape, along with consistent and balanced weight updates, thus speeding up stable convergence. Extensive experiments evaluate RefLoRA on natural language understanding, and commonsense reasoning tasks with popular large language models including DeBERTaV3, LLaMA-7B, LLaMA2-7B and LLaMA3-8B. The numerical tests corroborate that RefLoRA converges faster, outperforms various benchmarks, and enjoys negligible computational overhead compared to state-of-the-art LoRA variants.
LGSep 2, 2025
VASSO: Variance Suppression for Sharpness-Aware MinimizationBingcong Li, Yilang Zhang, Georgios B. Giannakis
Sharpness-aware minimization (SAM) has well-documented merits in enhancing generalization of deep neural network models. Accounting for sharpness in the loss function geometry, where neighborhoods of `flat minima' heighten generalization ability, SAM seeks `flat valleys' by minimizing the maximum loss provoked by an adversarial perturbation within the neighborhood. Although critical to account for sharpness of the loss function, in practice SAM suffers from `over-friendly adversaries,' which can curtail the outmost level of generalization. To avoid such `friendliness,' the present contribution fosters stabilization of adversaries through variance suppression (VASSO). VASSO offers a general approach to provably stabilize adversaries. In particular, when integrating VASSO with SAM, improved generalizability is numerically validated on extensive vision and language tasks. Once applied on top of a computationally efficient SAM variant, VASSO offers a desirable generalization-computation tradeoff.
LGDec 20, 2023
Meta-Learning with Versatile Loss Geometries for Fast Adaptation Using Mirror DescentYilang Zhang, Bingcong Li, Georgios B. Giannakis
Utilizing task-invariant prior knowledge extracted from related tasks, meta-learning is a principled framework that empowers learning a new task especially when data records are limited. A fundamental challenge in meta-learning is how to quickly "adapt" the extracted prior in order to train a task-specific model within a few optimization steps. Existing approaches deal with this challenge using a preconditioner that enhances convergence of the per-task training process. Though effective in representing locally a quadratic training loss, these simple linear preconditioners can hardly capture complex loss geometries. The present contribution addresses this limitation by learning a nonlinear mirror map, which induces a versatile distance metric to enable capturing and optimizing a wide range of loss geometries, hence facilitating the per-task training. Numerical tests on few-shot learning datasets demonstrate the superior expressiveness and convergence of the advocated approach.
LGOct 27, 2025
ScaLoRA: Optimally Scaled Low-Rank Adaptation for Efficient High-Rank Fine-TuningYilang Zhang, Xiaodong Yang, Yiwei Cai et al.
As large language models (LLMs) continue to scale in size, the computational overhead has become a major bottleneck for task-specific fine-tuning. While low-rank adaptation (LoRA) effectively curtails this cost by confining the weight updates to a low-dimensional subspace, such a restriction can hinder effectiveness and slow convergence. This contribution deals with these limitations by accumulating progressively a high-rank weight update from consecutive low-rank increments. Specifically, the per update optimal low-rank matrix is identified to minimize the loss function and closely approximate full fine-tuning. To endow efficient and seamless optimization without restarting, this optimal choice is formed by appropriately scaling the columns of the original low-rank matrix. Rigorous performance guarantees reveal that the optimal scaling can be found analytically. Extensive numerical tests with popular LLMs scaling up to 12 billion parameters demonstrate a consistent performance gain and fast convergence relative to state-of-the-art LoRA variants on diverse tasks including natural language understanding, commonsense reasoning, and mathematical problem solving.
LGOct 7, 2025
Conformalized Gaussian processes for online uncertainty quantification over graphsJinwen Xu, Qin Lu, Georgios B. Giannakis
Uncertainty quantification (UQ) over graphs arises in a number of safety-critical applications in network science. The Gaussian process (GP), as a classical Bayesian framework for UQ, has been developed to handle graph-structured data by devising topology-aware kernel functions. However, such GP-based approaches are limited not only by the prohibitive computational complexity, but also the strict modeling assumptions that might yield poor coverage, especially with labels arriving on the fly. To effect scalability, we devise a novel graph-aware parametric GP model by leveraging the random feature (RF)-based kernel approximation, which is amenable to efficient recursive Bayesian model updates. To further allow for adaptivity, an ensemble of graph-aware RF-based scalable GPs have been leveraged, with per-GP weight adapted to data arriving incrementally. To ensure valid coverage with robustness to model mis-specification, we wed the GP-based set predictors with the online conformal prediction framework, which post-processes the prediction sets using adaptive thresholds. Experimental results the proposed method yields improved coverage and efficient prediction sets over existing baselines by adaptively ensembling the GP models and setting the key threshold parameters in CP.
ROSep 28, 2025
BOSfM: A View Planning Framework for Optimal 3D Reconstruction of Agricultural ScenesAthanasios Bacharis, Konstantinos D. Polyzos, Georgios B. Giannakis et al.
Active vision (AV) has been in the spotlight of robotics research due to its emergence in numerous applications including agricultural tasks such as precision crop monitoring and autonomous harvesting to list a few. A major AV problem that gained popularity is the 3D reconstruction of targeted environments using 2D images from diverse viewpoints. While collecting and processing a large number of arbitrarily captured 2D images can be arduous in many practical scenarios, a more efficient solution involves optimizing the placement of available cameras in 3D space to capture fewer, yet more informative, images that provide sufficient visual information for effective reconstruction of the environment of interest. This process termed as view planning (VP), can be markedly challenged (i) by noise emerging in the location of the cameras and/or in the extracted images, and (ii) by the need to generalize well in other unknown similar agricultural environments without need for re-optimizing or re-training. To cope with these challenges, the present work presents a novel VP framework that considers a reconstruction quality-based optimization formulation that relies on the notion of `structure-from-motion' to reconstruct the 3D structure of the sought environment from the selected 2D images. With no analytic expression of the optimization function and with costly function evaluations, a Bayesian optimization approach is proposed to efficiently carry out the VP process using only a few function evaluations, while accounting for different noise cases. Numerical tests on both simulated and real agricultural settings signify the benefits of the advocated VP approach in efficiently estimating the optimal camera placement to accurately reconstruct 3D environments of interest, and generalize well on similar unknown environments.
SPSep 10, 2025
Deploying AI for Signal Processing education: Selected challenges and intriguing opportunitiesJarvis Haupt, Qin Lu, Yanning Shen et al.
Powerful artificial intelligence (AI) tools that have emerged in recent years -- including large language models, automated coding assistants, and advanced image and speech generation technologies -- are the result of monumental human achievements. These breakthroughs reflect mastery across multiple technical disciplines and the resolution of significant technological challenges. However, some of the most profound challenges may still lie ahead. These challenges are not purely technical but pertain to the fair and responsible use of AI in ways that genuinely improve the global human condition. This article explores one promising application aligned with that vision: the use of AI tools to facilitate and enhance education, with a specific focus on signal processing (SP). It presents two interrelated perspectives: identifying and addressing technical limitations, and applying AI tools in practice to improve educational experiences. Primers are provided on several core technical issues that arise when using AI in educational settings, including how to ensure fairness and inclusivity, handle hallucinated outputs, and achieve efficient use of resources. These and other considerations -- such as transparency, explainability, and trustworthiness -- are illustrated through the development of an immersive, structured, and reliable "smart textbook." The article serves as a resource for researchers and educators seeking to advance AI's role in engineering education.
LGSep 2, 2025
Learnable Loss Geometries with Mirror Descent for Scalable and Convergent Meta-LearningYilang Zhang, Bingcong Li, Georgios B. Giannakis
Utilizing task-invariant knowledge acquired from related tasks as prior information, meta-learning offers a principled approach to learning a new task with limited data records. Sample-efficient adaptation of this prior information is a major challenge facing meta-learning, and plays an important role because it facilitates training the sought task-specific model with just a few optimization steps. Past works deal with this challenge through preconditioning that speeds up convergence of the per-task training. Though effective in representing locally quadratic loss curvatures, simple linear preconditioning can be hardly potent with complex loss geometries. Instead of relying on a quadratic distance metric, the present contribution copes with complex loss metrics by learning a versatile distance-generating function, which induces a nonlinear mirror map to effectively capture and optimize a wide range of loss geometries. With suitable parameterization, this generating function is effected by an expressive neural network that is provably a valid distance. Analytical results establish convergence of not only the proposed method, but also all meta-learning approaches based on preconditioning. To attain gradient norm less than $ε$, the convergence rate of $\mathcal{O}(ε^{-2})$ is on par with standard gradient-based meta-learning methods. Numerical tests on few-shot learning datasets demonstrate the superior empirical performance of the novel algorithm, as well as its rapid per-task convergence, which markedly reduces the number of adaptation steps, hence also accommodating large-scale meta-learning models.
LGJan 11, 2025
Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning AlgorithmYilang Zhang, Bingcong Li, Georgios B. Giannakis
Targeting solutions over `flat' regions of the loss landscape, sharpness-aware minimization (SAM) has emerged as a powerful tool to improve generalizability of deep neural network based learning. While several SAM variants have been developed to this end, a unifying approach that also guides principled algorithm design has been elusive. This contribution leverages preconditioning (pre) to unify SAM variants and provide not only unifying convergence analysis, but also valuable insights. Building upon preSAM, a novel algorithm termed infoSAM is introduced to address the so-called adversarial model degradation issue in SAM by adjusting gradients depending on noise estimates. Extensive numerical tests demonstrate the superiority of infoSAM across various benchmarks.
LGDec 13, 2023
Contractive error feedback for gradient compressionBingcong Li, Shuai Zheng, Parameswaran Raman et al.
On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited storage. In such settings, communication efficient optimization methods are attractive alternatives, however they still struggle with memory issues. To tackle these challenges, we propose an communication efficient method called contractive error feedback (ConEF). As opposed to SGD with error-feedback (EFSGD) that inefficiently manages memory, ConEF obtains the sweet spot of convergence and memory usage, and achieves communication efficiency by leveraging biased and all-reducable gradient compression. We empirically validate ConEF on various learning tasks that include image classification, language modeling, and machine translation and observe that ConEF saves 80\% - 90\% of the extra memory in EFSGD with almost no loss on test performance, while also achieving 1.3x - 5x speedup of SGD. Through our work, we also demonstrate the feasibility and convergence of ConEF to clear up the theoretical barrier of integrating ConEF to popular memory efficient frameworks such as ZeRO-3.
MLDec 1, 2021
Robust and Adaptive Temporal-Difference Learning Using An Ensemble of Gaussian ProcessesQin Lu, Georgios B. Giannakis
Value function approximation is a crucial module for policy evaluation in reinforcement learning when the state space is large or continuous. The present paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning, where a Gaussian process (GP) prior is presumed on the sought value function, and instantaneous rewards are probabilistically generated based on value function evaluations at two consecutive states. Capitalizing on a random feature-based approximant of the GP prior, an online scalable (OS) approach, termed {OS-GPTD}, is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs. To benchmark the performance of OS-GPTD even in an adversarial setting, where the modeling assumptions are violated, complementary worst-case analyses are performed by upper-bounding the cumulative Bellman error as well as the long-term reward prediction error, relative to their counterparts from a fixed value function estimator with the entire state-reward trajectory in hindsight. Moreover, to alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme, termed OS-EGPTD, that can jointly infer the value function, and select interactively the EGP kernel on-the-fly. Finally, performances of the novel OS-(E)GPTD schemes are evaluated on two benchmark problems.
LGOct 20, 2021
Distributionally Robust Semi-Supervised Learning Over GraphsAlireza Sadeghi, Meng Ma, Bingcong Li et al.
Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications. To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently. By succinctly encoding local graph structures and features of nodes, state-of-the-art GNNs can scale linearly with the size of graph. Despite their success in practice, most of existing methods are unable to handle graphs with uncertain nodal attributes. Specifically whenever mismatches between training and testing data distribution exists, these models fail in practice. Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements. In this context, a distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations. The data distribution is considered unknown, but lies within a Wasserstein ball centered around empirical data distribution. A robust model is obtained by minimizing the worst expected loss over this ball. However, solving the emerging functional optimization problem is challenging, if not impossible. Advocating a strong duality condition, we develop a principled method that renders the problem tractable and efficiently solvable. Experiments assess the performance of the proposed method.
MLOct 13, 2021
Incremental Ensemble Gaussian ProcessesQin Lu, Georgios V. Karanikolas, Georgios B. Giannakis
Belonging to the family of Bayesian nonparametrics, Gaussian process (GP) based approaches have well-documented merits not only in learning over a rich class of nonlinear functions, but also in quantifying the associated uncertainty. However, most GP methods rely on a single preselected kernel function, which may fall short in characterizing data samples that arrive sequentially in time-critical applications. To enable {\it online} kernel adaptation, the present work advocates an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an {\it ensemble} of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary. With each GP expert leveraging the random feature-based approximation to perform online prediction and model update with {\it scalability}, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions. Further, the novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner. To benchmark the performance of IE-GP and its dynamic variant in the adversarial setting where the modeling assumptions are violated, rigorous performance analysis has been conducted via the notion of regret, as the norm in online convex optimization. Last but not the least, online unsupervised learning for dimensionality reduction is explored under the novel IE-GP framework. Synthetic and real data tests demonstrate the effectiveness of the proposed schemes.
OCOct 8, 2021
Heavy Ball Momentum for Conditional GradientBingcong Li, Alireza Sadeghi, Georgios B. Giannakis
Conditional gradient, aka Frank Wolfe (FW) algorithms, have well-documented merits in machine learning and signal processing applications. Unlike projection-based methods, momentum cannot improve the convergence rate of FW, in general. This limitation motivates the present work, which deals with heavy ball momentum, and its impact to FW. Specifically, it is established that heavy ball offers a unifying perspective on the primal-dual (PD) convergence, and enjoys a tighter per iteration PD error rate, for multiple choices of step sizes, where PD error can serve as the stopping criterion in practice. In addition, it is asserted that restart, a scheme typically employed jointly with Nesterov's momentum, can further tighten this PD error bound. Numerical results demonstrate the usefulness of heavy ball momentum in FW iterations.
LGOct 7, 2021
Detecting adversaries in CrowdsourcingPanagiotis A. Traganitis, Georgios B. Giannakis
Despite its successes in various machine learning and data science tasks, crowdsourcing can be susceptible to attacks from dedicated adversaries. This work investigates the effects of adversaries on crowdsourced classification, under the popular Dawid and Skene model. The adversaries are allowed to deviate arbitrarily from the considered crowdsourcing model, and may potentially cooperate. To address this scenario, we develop an approach that leverages the structure of second-order moments of annotator responses, to identify large numbers of adversaries, and mitigate their impact on the crowdsourcing task. The potential of the proposed approach is empirically demonstrated on synthetic and real crowdsourcing datasets.
SIApr 17, 2021
Unveiling Anomalous Edges and Nominal Connectivity of Attributed NetworksKonstantinos D. Polyzos, Costas Mavromatis, Vassilis N. Ioannidis et al.
Uncovering anomalies in attributed networks has recently gained popularity due to its importance in unveiling outliers and flagging adversarial behavior in a gamut of data and network science applications including {the Internet of Things (IoT)}, finance, security, to list a few. The present work deals with uncovering anomalous edges in attributed graphs using two distinct formulations with complementary strengths, which can be easily distributed, and hence efficient. The first relies on decomposing the graph data matrix into low rank plus sparse components to markedly improve performance. The second broadens the scope of the first by performing robust recovery of the unperturbed graph, which enhances the anomaly identification performance. The novel methods not only capture anomalous edges linking nodes of different communities, but also spurious connections between any two nodes with different features. Experiments conducted on real and synthetic data corroborate the effectiveness of both methods in the anomaly identification task.
OCMar 27, 2021
Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural NetworksManish K. Singh, Vassilis Kekatos, Georgios B. Giannakis
To shift the computational burden from real-time to offline in delay-critical power systems applications, recent works entertain the idea of using a deep neural network (DNN) to predict the solutions of the AC optimal power flow (AC-OPF) once presented load demands. As network topologies may change, training this DNN in a sample-efficient manner becomes a necessity. To improve data efficiency, this work utilizes the fact OPF data are not simple training labels, but constitute the solutions of a parametric optimization problem. We thus advocate training a sensitivity-informed DNN (SI-DNN) to match not only the OPF optimizers, but also their partial derivatives with respect to the OPF parameters (loads). It is shown that the required Jacobian matrices do exist under mild conditions, and can be readily computed from the related primal/dual solutions. The proposed SI-DNN is compatible with a broad range of OPF solvers, including a non-convex quadratically constrained quadratic program (QCQP), its semidefinite program (SDP) relaxation, and MATPOWER; while SI-DNN can be seamlessly integrated in other learning-to-OPF schemes. Numerical tests on three benchmark power systems corroborate the advanced generalization and constraint satisfaction capabilities for the OPF solutions predicted by an SI-DNN over a conventionally trained DNN, especially in low-data setups.
LGDec 20, 2020
Bayesian Crowdsourcing with ConstraintsPanagiotis A. Traganitis, Georgios B. Giannakis
Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks, by leveraging crowds of human annotators. When additional information is available about the data, semi-supervised crowdsourcing approaches that enhance the aggregation of labels from human annotators are well motivated. This work deals with semi-supervised crowdsourced classification, under two regimes of semi-supervision: a) label constraints, that provide ground-truth labels for a subset of data; and b) potentially easier to obtain instance-level constraints, that indicate relationships between pairs of data. Bayesian algorithms based on variational inference are developed for each regime, and their quantifiably improved performance, compared to unsupervised crowdsourcing, is analytically and empirically validated on several crowdsourcing datasets.
LGDec 10, 2020
Adversarial Linear Contextual Bandits with Graph-Structured Side ObservationsLingda Wang, Bingcong Li, Huozhi Zhou et al.
This paper studies the adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverage two categories of the most common side information: \emph{contexts} and \emph{side observations}. In this setting, a learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector. The agent not only incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures, which are encoded as a series of feedback graphs. This setting models a variety of applications in social networks, where both contexts and graph-structured side observations are available. Two efficient algorithms are developed based on \texttt{EXP3}. Under mild conditions, our analysis shows that for undirected feedback graphs the first algorithm, \texttt{EXP3-LGC-U}, achieves the regret of order $\mathcal{O}(\sqrt{(K+α(G)d)T\log{K}})$ over the time horizon $T$, where $α(G)$ is the average \emph{independence number} of the feedback graphs. A slightly weaker result is presented for the directed graph setting as well. The second algorithm, \texttt{EXP3-LGC-IX}, is developed for a special class of problems, for which the regret is reduced to $\mathcal{O}(\sqrt{α(G)dT\log{K}\log(KT)})$ for both directed as well as undirected feedback graphs. Numerical tests corroborate the efficiency of proposed algorithms.
OCDec 9, 2020
Enhancing Parameter-Free Frank Wolfe with an Extra SubproblemBingcong Li, Lingda Wang, Georgios B. Giannakis et al.
Aiming at convex optimization under structural constraints, this work introduces and analyzes a variant of the Frank Wolfe (FW) algorithm termed ExtraFW. The distinct feature of ExtraFW is the pair of gradients leveraged per iteration, thanks to which the decision variable is updated in a prediction-correction (PC) format. Relying on no problem dependent parameters in the step sizes, the convergence rate of ExtraFW for general convex problems is shown to be ${\cal O}(\frac{1}{k})$, which is optimal in the sense of matching the lower bound on the number of solved FW subproblems. However, the merit of ExtraFW is its faster rate ${\cal O}\big(\frac{1}{k^2} \big)$ on a class of machine learning problems. Compared with other parameter-free FW variants that have faster rates on the same problems, ExtraFW has improved rates and fine-grained analysis thanks to its PC update. Numerical tests on binary classification with different sparsity-promoting constraints demonstrate that the empirical performance of ExtraFW is significantly better than FW, and even faster than Nesterov's accelerated gradient on certain datasets. For matrix completion, ExtraFW enjoys smaller optimality gap, and lower rank than FW.
LGJul 7, 2020
Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial DataAlireza Sadeghi, Gang Wang, Meng Ma et al.
Data used to train machine learning models can be adversarial--maliciously constructed by adversaries to fool the model. Challenge also arises by privacy, confidentiality, or due to legal constraints when data are geographically gathered and stored across multiple learners, some of which may hold even an "anonymized" or unreliable dataset. In this context, the distributionally robust optimization framework is considered for training a parametric model, both in centralized and federated learning settings. The objective is to endow the trained model with robustness against adversarially manipulated input data, or, distributional uncertainties, such as mismatches between training and testing data distributions, or among datasets stored at different workers. To this aim, the data distribution is assumed unknown, and lies within a Wasserstein ball centered around the empirical data distribution. This robust learning task entails an infinite-dimensional optimization problem, which is challenging. Leveraging a strong duality result, a surrogate is obtained, for which three stochastic primal-dual algorithms are developed: i) stochastic proximal gradient descent with an $ε$-accurate oracle, which invokes an oracle to solve the convex sub-problems; ii) stochastic proximal gradient descent-ascent, which approximates the solution of the convex sub-problems via a single gradient ascent step; and, iii) a distributionally robust federated learning algorithm, which solves the sub-problems locally at different workers where data are stored. Compared to the empirical risk minimization and federated learning methods, the proposed algorithms offer robustness with little computation overhead. Numerical tests using image datasets showcase the merits of the proposed algorithms under several existing adversarial attacks and distributional uncertainties.
OCJun 19, 2020
How Does Momentum Help Frank Wolfe?Bingcong Li, Mario Coutino, Georgios B. Giannakis et al.
We unveil the connections between Frank Wolfe (FW) type algorithms and the momentum in Accelerated Gradient Methods (AGM). On the negative side, these connections illustrate why momentum is unlikely to be effective for FW type algorithms. The encouraging message behind this link, on the other hand, is that momentum is useful for FW on a class of problems. In particular, we prove that a momentum variant of FW, that we term accelerated Frank Wolfe (AFW), converges with a faster rate $\tilde{\cal O}(\frac{1}{k^2})$ on certain constraint sets despite the same ${\cal O}(\frac{1}{k})$ rate as FW on general cases. Given the possible acceleration of AFW at almost no extra cost, it is thus a competitive alternative to FW. Numerical experiments on benchmarked machine learning tasks further validate our theoretical findings.