15.8CRMay 30
Cyber Security of Sensor Systems for State Sequence Estimation: A Machine Learning ApproachXubin Fang, Rick S. Blum, Ramesh Bharadwaj et al.
Due to possible devastating consequences, counteracting sensor data attacks is an extremely impor- tant topic, which has not seen sufficient study. To the best of our knowledge, this paper develops the first meth- ods that accurately identify/eliminate only the problem- atic attacked sensor data presented to a sequence es- timation/regression algorithm under any attack from our attack model. The approach does not assume a known form for the statistical model of the sensor data, allow- ing data-driven and machine learning sequence estima- tion/regression algorithms to be protected. A simple pro- tection approach for attackers not endowed with knowledge of the details of our protection approach is first developed, followed by additional processing for attacks based on pro- tection system knowledge. Experimental results show that the simple approach achieves performance indistinguish- able from that for an approach which knows which sensors are attacked. For cases where the attacker has knowledge of the protection approach, experimental results indicate the additional processing can be configured so that the worst-case degradation under the additional processing and a large number of sensors attacked can be made signif- icantly smaller than the worst-case degradation of the sim- ple approach, and close to an approach which knows which sensors are attacked, with just a slight degradation under no attacks. Mathematical descriptions of the worst-case attacks are used to demonstrate the additional processing will provide similar advantages for cases for which we do not have numerical results. All the data-driven/machine learning processing used in our approaches employ only unattacked training data.
SYFeb 17, 2019
Sensor Placement for Outage Identifiability in Power Distribution NetworksAnanth Narayan Samudrala, M. Hadi Amini, Soummya Kar et al.
Accurate topology information is critical for effective operation of power distribution networks. Line outages change the operational topology of a distribution network. Hence, outage detection is an important task. Power distribution networks are operated as radial trees and are recently adopting the integration of advanced sensors to monitor the network in real time. In this paper, a dynamic-programming-based minimum cost sensor placement solution is proposed for outage identifiability. We propose a novel formulation of the sensor placement as a cost optimization problem involving binary placement decisions, and then provide an algorithm based on dynamic programming to solve it in polynomial time. The advantage of the proposed placement strategy is that it incorporates various types of sensors, is independent of time varying load statistics, has a polynomial execution time and is cost effective. Numerical results illustrating the proposed sensor placement solution are presented for multiple feeder models including standard IEEE test feeders.
21.6MLMay 30
Statistical Analysis of using the Shapley Value for Sensor Anomaly Localization with Accurate ClassifiersXubin Fang, Rick S. Blum
Recent publications have suggested using the Shap- ley value for sensor anomaly/attack localization. We study the performance of such an approach by using mathematically de- fined optimum binary classifiers in the Shapley value calculation. To judge localization performance, we study the ability of the Shapley value of a given sensor observation to determine if that observation is anomalous. First, we prove that for cases with independent sensor observations, an optimized anomaly test using the Shapley value is equivalent to an optimized lower-complexity anomaly test using a single term in the Shapley value calculation, yielding the exact same probability of error. For some popular dependent observation cases involving two sensors, including correlated bivariate Gaussian/Laplacian probability density functions and constant/Gaussian at- tacks/anomalies, we prove that these two tests are fundamentally different, yielding different decision regions and error probabil- ities. Further, we prove that the Shapley value test is sometimes strictly inferior to the other (single term in Shapley calculation) test in certain statistically dependent bivariate Gaussian scenarios with large correlation magnitude and additive attacks/anomalies, while it is strictly superior in others, depending on the sign of the correlation. One can combine these two approaches to obtain a strictly better approach in these cases. These results, which provide the first theoretical statistical analysis of Shapley-based localization, seem very interesting based on the wide acceptance of the Shapley value by many researchers and should encourage further research on this topic. Numerical results are provided which illustrate our findings.
LGSep 24, 2022
Communication-Efficient {Federated} Learning Using Censored Heavy Ball DescentYicheng Chen, Rick S. Blum, Brian M. Sadler
Distributed machine learning enables scalability and computational offloading, but requires significant levels of communication. Consequently, communication efficiency in distributed learning settings is an important consideration, especially when the communications are wireless and battery-driven devices are employed. In this paper we develop a censoring-based heavy ball (CHB) method for distributed learning in a server-worker architecture. Each worker self-censors unless its local gradient is sufficiently different from the previously transmitted one. The significant practical advantages of the HB method for learning problems are well known, but the question of reducing communications has not been addressed. CHB takes advantage of the HB smoothing to eliminate reporting small changes, and provably achieves a linear convergence rate equivalent to that of the classical HB method for smooth and strongly convex objective functions. The convergence guarantee of CHB is theoretically justified for both convex and nonconvex cases. In addition we prove that, under some conditions, at least half of all communications can be eliminated without any impact on convergence rate. Extensive numerical results validate the communication efficiency of CHB on both synthetic and real datasets, for convex, nonconvex, and nondifferentiable cases. Given a target accuracy, CHB can significantly reduce the number of communications compared to existing algorithms, achieving the same accuracy without slowing down the optimization process.
46.9SYMay 13
Receding Horizon Multi-Agent Deceptive Path PlannerXubin Fang, Brian M. Sadler, Rick S. Blum
Deceptive path planning enables autonomous agents to obscure their true goals from observers by deviating from an expected optimal path. Prior work largely solves full-horizon, end-to-end optimization for single agents, which is expensive to recompute online and difficult to scale or adapt en route. We propose a unified framework for deceptive path planning using a Boltzmann distribution, computing over short-horizon candidate trajectories within a receding-horizon loop. By param- By iterating a user-defined cost that captures deception, resources, and smoothness, and optionally includes coupling terms between agents, the framework yields stochastic policies that balance the tradeoff between optimal paths and deceptive deviation. Policies are updated locally and do not require training. The level of deception and adherence to constraints can be dynamically tuned, enabling online adaptation to changes in goals and constraints such as obstacles. This step-by-step tuning opens the door to new forms of dynamic deception. Simulation studies demonstrate the flexibility of our approach, maintaining deception while adapting to environmental and constraint updates, avoiding the recomputation required by full-horizon methods, and supporting intuitive tuning via a small set of parameters
LGFeb 13
Power Interpretable Causal ODE Networks: A Unified Model for Explainable Anomaly Detection and Root Cause Analysis in Power SystemsYue Sun, Likai Wang, Rick S. Blum et al.
Anomaly detection and root cause analysis (RCA) are critical for ensuring the safety and resilience of cyber-physical systems such as power grids. However, existing machine learning models for time series anomaly detection often operate as black boxes, offering only binary outputs without any explanation, such as identifying anomaly type and origin. To address this challenge, we propose Power Interpretable Causality Ordinary Differential Equation (PICODE) Networks, a unified, causality-informed architecture that jointly performs anomaly detection along with the explanation why it is detected as an anomaly, including root cause localization, anomaly type classification, and anomaly shape characterization. Experimental results in power systems demonstrate that PICODE achieves competitive detection performance while offering improved interpretability and reduced reliance on labeled data or external causal graphs. We provide theoretical results demonstrating the alignment between the shape of anomaly functions and the changes in the weights of the extracted causal graphs.
LGApr 1, 2024
Incorporating Domain Differential Equations into Graph Convolutional Networks to Lower Generalization DiscrepancyYue Sun, Chao Chen, Yuesheng Xu et al.
Ensuring both accuracy and robustness in time series prediction is critical to many applications, ranging from urban planning to pandemic management. With sufficient training data where all spatiotemporal patterns are well-represented, existing deep-learning models can make reasonably accurate predictions. However, existing methods fail when the training data are drawn from different circumstances (e.g., traffic patterns on regular days) compared to test data (e.g., traffic patterns after a natural disaster). Such challenges are usually classified under domain generalization. In this work, we show that one way to address this challenge in the context of spatiotemporal prediction is by incorporating domain differential equations into Graph Convolutional Networks (GCNs). We theoretically derive conditions where GCNs incorporating such domain differential equations are robust to mismatched training and testing data compared to baseline domain agnostic models. To support our theory, we propose two domain-differential-equation-informed networks called Reaction-Diffusion Graph Convolutional Network (RDGCN), which incorporates differential equations for traffic speed evolution, and Susceptible-Infectious-Recovered Graph Convolutional Network (SIRGCN), which incorporates a disease propagation model. Both RDGCN and SIRGCN are based on reliable and interpretable domain differential equations that allow the models to generalize to unseen patterns. We experimentally show that RDGCN and SIRGCN are more robust with mismatched testing data than the state-of-the-art deep learning methods.
LGFeb 17, 2025
Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical SystemsYue Sun, Rick S. Blum, Parv Venkitasubramaniam
Dynamical systems, prevalent in various scientific and engineering domains, are susceptible to anomalies that can significantly impact their performance and reliability. This paper addresses the critical challenges of anomaly detection, root cause localization, and anomaly type classification in dynamical systems governed by ordinary differential equations (ODEs). We define two categories of anomalies: cyber anomalies, which propagate through interconnected variables, and measurement anomalies, which remain localized to individual variables. To address these challenges, we propose the Interpretable Causality Ordinary Differential Equation (ICODE) Networks, a model-intrinsic explainable learning framework. ICODE leverages Neural ODEs for anomaly detection while employing causality inference through an explanation channel to perform root cause analysis (RCA), elucidating why specific time periods are flagged as anomalous. ICODE is designed to simultaneously perform anomaly detection, RCA, and anomaly type classification within a single, interpretable framework. Our approach is grounded in the hypothesis that anomalies alter the underlying ODEs of the system, manifesting as changes in causal relationships between variables. We provide a theoretical analysis of how perturbations in learned model parameters can be utilized to identify anomalies and their root causes in time series data. Comprehensive experimental evaluations demonstrate the efficacy of ICODE across various dynamical systems, showcasing its ability to accurately detect anomalies, classify their types, and pinpoint their origins.
LGJul 28, 2025
On Using the Shapley Value for Anomaly Localization: A Statistical InvestigationRick S. Blum, Franziska Freytag
Recent publications have suggested using the Shapley value for anomaly localization for sensor data systems. Using a reasonable mathematical anomaly model for full control, experiments indicate that using a single fixed term in the Shapley value calculation achieves a lower complexity anomaly localization test, with the same probability of error, as a test using the Shapley value for all cases tested. A proof demonstrates these conclusions must be true for all independent observation cases. For dependent observation cases, no proof is available.
LGFeb 5, 2022
Communication Efficient Federated Learning via Ordered ADMM in a Fully Decentralized SettingYicheng Chen, Rick S. Blum, Brian M. Sadler
The challenge of communication-efficient distributed optimization has attracted attention in recent years. In this paper, a communication efficient algorithm, called ordering-based alternating direction method of multipliers (OADMM) is devised in a general fully decentralized network setting where a worker can only exchange messages with neighbors. Compared to the classical ADMM, a key feature of OADMM is that transmissions are ordered among workers at each iteration such that a worker with the most informative data broadcasts its local variable to neighbors first, and neighbors who have not transmitted yet can update their local variables based on that received transmission. In OADMM, we prohibit workers from transmitting if their current local variables are not sufficiently different from their previously transmitted value. A variant of OADMM, called SOADMM, is proposed where transmissions are ordered but transmissions are never stopped for each node at each iteration. Numerical results demonstrate that given a targeted accuracy, OADMM can significantly reduce the number of communications compared to existing algorithms including ADMM. We also show numerically that SOADMM can accelerate convergence, resulting in communication savings compared to the classical ADMM.
LGFeb 5, 2022
Distributed Learning With Sparsified Gradient DifferencesYicheng Chen, Rick S. Blum, Martin Takac et al.
A very large number of communications are typically required to solve distributed learning tasks, and this critically limits scalability and convergence speed in wireless communications applications. In this paper, we devise a Gradient Descent method with Sparsification and Error Correction (GD-SEC) to improve the communications efficiency in a general worker-server architecture. Motivated by a variety of wireless communications learning scenarios, GD-SEC reduces the number of bits per communication from worker to server with no degradation in the order of the convergence rate. This enables larger-scale model learning without sacrificing convergence or accuracy. At each iteration of GD-SEC, instead of directly transmitting the entire gradient vector, each worker computes the difference between its current gradient and a linear combination of its previously transmitted gradients, and then transmits the sparsified gradient difference to the server. A key feature of GD-SEC is that any given component of the gradient difference vector will not be transmitted if its magnitude is not sufficiently large. An error correction technique is used at each worker to compensate for the error resulting from sparsification. We prove that GD-SEC is guaranteed to converge for strongly convex, convex, and nonconvex optimization problems with the same order of convergence rate as GD. Furthermore, if the objective function is strongly convex, GD-SEC has a fast linear convergence rate. Numerical results not only validate the convergence rate of GD-SEC but also explore the communication bit savings it provides. Given a target accuracy, GD-SEC can significantly reduce the communications load compared to the best existing algorithms without slowing down the optimization process.
LGJun 5, 2021
Training Robust Graph Neural Networks with Topology Adaptive Edge DroppingZhan Gao, Subhrajit Bhattacharya, Leiming Zhang et al.
Graph neural networks (GNNs) are processing architectures that exploit graph structural information to model representations from network data. Despite their success, GNNs suffer from sub-optimal generalization performance given limited training data, referred to as over-fitting. This paper proposes Topology Adaptive Edge Dropping (TADropEdge) method as an adaptive data augmentation technique to improve generalization performance and learn robust GNN models. We start by explicitly analyzing how random edge dropping increases the data diversity during training, while indicating i.i.d. edge dropping does not account for graph structural information and could result in noisy augmented data degrading performance. To overcome this issue, we consider graph connectivity as the key property that captures graph topology. TADropEdge incorporates this factor into random edge dropping such that the edge-dropped subgraphs maintain similar topology as the underlying graph, yielding more satisfactory data augmentation. In particular, TADropEdge first leverages the graph spectrum to assign proper weights to graph edges, which represent their criticality for establishing the graph connectivity. It then normalizes the edge weights and drops graph edges adaptively based on their normalized weights. Besides improving generalization performance, TADropEdge reduces variance for efficient training and can be applied as a generic method modular to different GNN models. Intensive experiments on real-life and synthetic datasets corroborate theory and verify the effectiveness of the proposed method.
CVFeb 12, 2017
Sparse Representation based Multi-sensor Image Fusion: A ReviewQiang Zhang, Yi Liu, Rick S. Blum et al.
As a result of several successful applications in computer vision and image processing, sparse representation (SR) has attracted significant attention in multi-sensor image fusion. Unlike the traditional multiscale transforms (MSTs) that presume the basis functions, SR learns an over-complete dictionary from a set of training images for image fusion, and it achieves more stable and meaningful representations of the source images. By doing so, the SR-based fusion methods generally outperform the traditional MST-based image fusion methods in both subjective and objective tests. In addition, they are less susceptible to mis-registration among the source images, thus facilitating the practical applications. This survey paper proposes a systematic review of the SR-based multi-sensor image fusion literature, highlighting the pros and cons of each category of approaches. Specifically, we start by performing a theoretical investigation of the entire system from three key algorithmic aspects, (1) sparse representation models; (2) dictionary learning methods; and (3) activity levels and fusion rules. Subsequently, we show how the existing works address these scientific problems and design the appropriate fusion rules for each application, such as multi-focus image fusion and multi-modality (e.g., infrared and visible) image fusion. At last, we carry out some experiments to evaluate the impact of these three algorithmic components on the fusion performance when dealing with different applications. This article is expected to serve as a tutorial and source of reference for researchers preparing to enter the field or who desire to employ the sparse representation theory in other fields.
ITMay 24, 2016
Functional Forms of Optimum Spoofing Attacks for Vector Parameter Estimation in Quantized Sensor NetworksJiangfan Zhang, Rick S. Blum, Lance Kaplan et al.
Estimation of an unknown deterministic vector from quantized sensor data is considered in the presence of spoofing attacks which alter the data presented to several sensors. Contrary to previous work, a generalized attack model is employed which manipulates the data using transformations with arbitrary functional forms determined by some attack parameters whose values are unknown to the attacked system. For the first time, necessary and sufficient conditions are provided under which the transformations provide a guaranteed attack performance in terms of Cramer-Rao Bound (CRB) regardless of the processing the estimation system employs, thus defining a highly desirable attack. Interestingly, these conditions imply that, for any such attack when the attacked sensors can be perfectly identified by the estimation system, either the Fisher Information Matrix (FIM) for jointly estimating the desired and attack parameters is singular or that the attacked system is unable to improve the CRB for the desired vector parameter through this joint estimation even though the joint FIM is nonsingular. It is shown that it is always possible to construct such a highly desirable attack by properly employing a sufficiently large dimension attack vector parameter relative to the number of quantization levels employed, which was not observed previously. To illustrate the theory in a concrete way, we also provide some numerical results which corroborate that under the highly desirable attack, attacked data is not useful in reducing the CRB.
SYMay 17, 2015
An Electrical Structure-Based Approach to PMU Placement in the Electric Power GridK. G. Nagananda, Shalinee Kishore, Rick S. Blum
The phasor measurement unit (PMU) placement problem is revisited by taking into account a stronger characterization of the electrical connectedness between various buses in the grid. To facilitate this study, the placement problem is approached from the perspective of the \emph{electrical structure} which, unlike previous work on PMU placement, accounts for the sensitivity between power injections and nodal phase angle differences between various buses in the power network. The problem is formulated as a binary integer program with the objective to minimize the number of PMUs for complete network observability in the absence of zero injection measurements. The implication of the proposed approach on static state estimation and fault detection algorithms incorporating PMU measurements is analyzed. Results show a significant improvement in the performance of estimation and detection schemes by employing the electrical structure-based PMU placement compared to its topological counterpart. In light of recent advances in the electrical structure of the grid, our study provides a more realistic perspective of PMU placement in the electric power grid.