Oliver Kosut

h-index31

20papers

510citations

Novelty47%

AI Score48

Ranked #50,846 of 201,326 authors (top 25%)#210 in SY (top 21%)

20 Papers

LGJun 3

DP-MacAdam: Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum

Naima Tasnim, Lalitha Sankar, Oliver Kosut

Differentially private stochastic gradient descent (DP-SGD) has become the standard framework for privacy-preserving machine learning, yet its reliance on a fixed gradient clipping threshold to limit sensitivity remains a significant practical limitation. Adaptive clipping algorithms such as AdaClip shift and scale the gradient prior to clipping and adding noise so that the clipped gradient yields a more informative descent direction. The shift and scaling parameters are selected adaptively based on the empirical mean and variance. However, in existing adaptive clipping algorithms, these empirical estimates have not been also used for momentum to accelerate training itself. On the other hand, DP-Adam is an algorithm that exploits Adam-like momentum updates based on the gradient mean and variance to accelerate training, but does not exploit these estimates for adaptive clipping. In this work, we propose Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum (DP-MacAdam), a novel algorithm that combines these two approaches so as to use the same mean and variance estimates for both clipping and momentum. We perform an analysis showing that DP-MacAdam estimates the gradient variances in a bias-free manner. In addition, we empirically evaluate the privacy and accuracy of DP-MacAdam, demonstrating that it achieves improved model utility compared to DP-SGD, AdaClip, and DP-Adam baselines, without requiring manual tuning of the clipping threshold.

ITMay 10

Information-Theoretic Privacy with General Distortion Constraints

Kousha Kalantari, Oliver Kosut, Lalitha Sankar

The privacy-utility tradeoff problem is formulated as determining the privacy mechanism (random mapping) that minimizes the mutual information (a metric for privacy leakage) between the private features of the original dataset and a released version. The minimization is studied with two types of constraints on the distortion between the public features and the released version of the dataset: (i) subject to a constraint on the expected value of a cost function $f$ applied to the distortion, and (ii) subject to bounding the complementary CDF of the distortion by a non-increasing function $g$. The first scenario captures various practical cost functions for distorted released data, while the second scenario covers large deviation constraints on utility. The asymptotic optimal leakage is derived in both scenarios. For the distortion cost constraint, it is shown that for convex cost functions there is no asymptotic loss in using stationary memoryless mechanisms. For the complementary CDF bound on distortion, the asymptotic leakage is derived for general mechanisms and shown to be the integral of the single letter leakage function with respect to the Lebesgue -- Stieltjes measure defined based on the refined bound on distortion. However, it is shown that memoryless mechanisms are generally suboptimal in both cases.

SYMay 1, 2018

Can Attackers with Limited Information Exploit Historical Data to Mount Successful False Data Injection Attacks on Power Systems?

Jiazi Zhang, Zhigang Chu, Lalitha Sankar et al.

This paper studies physical consequences of unobservable false data injection (FDI) attacks designed only with information inside a sub-network of the power system. The goal of this attack is to overload a chosen target line without being detected via measurements. To overcome the limited information, a multiple linear regression model is developed to learn the relationship between the external network and the attack sub-network from historical data. The worst possible consequences of such FDI attacks are evaluated by solving a bi-level optimization problem wherein the first level models the limited attack resources, while the second level formulates the system response to such attacks via DC optimal power flow (OPF). The attack model with limited information is reflected in the DC OPF formulation that only takes into account the system information for the attack sub-network. The vulnerability of this attack model is illustrated on the IEEE 24-bus RTS and IEEE 118-bus systems.

SYNov 1, 2020

Vulnerability Assessment of Large-scale Power Systems to False Data Injection Attacks

Zhigang Chu, Jiazi Zhang, Oliver Kosut et al.

This paper studies the vulnerability of large-scale power systems to false data injection (FDI) attacks through their physical consequences. Prior work has shown that an attacker-defender bi-level linear program (ADBLP) can be used to determine the worst-case consequences of FDI attacks aiming to maximize the physical power flow on a target line. This ADBLP can be transformed into a single-level mixed-integer linear program, but it is hard to solve on large power systems due to numerical difficulties. In this paper, four computationally efficient algorithms are presented to solve the attack optimization problem on large power systems. These algorithms are applied on the IEEE 118-bus system and the Polish system with 2383 buses to conduct vulnerability assessments, and they provide feasible attacks that cause line overflows, as well as upper bounds on the maximal power flow resulting from any attack.

SYMay 4, 2017

False Data Injection Attacks on Phasor Measurements That Bypass Low-rank Decomposition

Jiazi Zhang, Zhigang Chu, Lalitha Sankar et al.

This paper studies the vulnerability of phasor measurement units (PMUs) to false data injection (FDI) attacks. Prior work demonstrated that unobservable FDI attacks that can bypass traditional bad data detectors based on measurement residuals can be identified by detector based on low-rank decomposition (LD). In this work, a class of more sophisticated FDI attacks that captures the temporal correlation of PMU data is introduced. Such attacks are designed with a convex optimization problem and can always bypass the LD detector. The vulnerability of this attack model is illustrated on both the IEEE 24-bus RTS and the IEEE 118-bus systems.

SYMay 20, 2016

Evaluating Power System Vulnerability to False Data Injection Attacks via Scalable Optimization

Zhigang Chu, Jiazi Zhang, Oliver Kosut et al.

Physical consequences to power systems of false data injection cyber-attacks are considered. Prior work has shown that the worst-case consequences of such an attack can be determined using a bi-level optimization problem, wherein an attack is chosen to maximize the physical power flow on a target line subsequent to re-dispatch. This problem can be solved as a mixed-integer linear program, but it is difficult to scale to large systems due to numerical challenges. Three new computationally efficient algorithms to solve this problem are presented. These algorithms provide lower and upper bounds on the system vulnerability measured as the maximum power flow subsequent to an attack. Using these techniques, vulnerability assessments are conducted for IEEE 118-bus system and Polish system with 2383 buses.

CRJun 25, 2022

Cactus Mechanisms: Optimal Differential Privacy Mechanisms in the Large-Composition Regime

Wael Alghamdi, Shahab Asoodeh, Flavio P. Calmon et al.

Most differential privacy mechanisms are applied (i.e., composed) numerous times on sensitive data. We study the design of optimal differential privacy mechanisms in the limit of a large number of compositions. As a consequence of the law of large numbers, in this regime the best privacy mechanism is the one that minimizes the Kullback-Leibler divergence between the conditional output distributions of the mechanism given two different inputs. We formulate an optimization problem to minimize this divergence subject to a cost constraint on the noise. We first prove that additive mechanisms are optimal. Since the optimization problem is infinite dimensional, it cannot be solved directly; nevertheless, we quantize the problem to derive near-optimal additive mechanisms that we call "cactus mechanisms" due to their shape. We show that our quantization approach can be arbitrarily close to an optimal mechanism. Surprisingly, for quadratic cost, the Gaussian mechanism is strictly sub-optimal compared to this cactus mechanism. Finally, we provide numerical results which indicate that cactus mechanism outperforms the Gaussian mechanism for a finite number of compositions.

CRAug 20, 2022

The Saddle-Point Accountant for Differential Privacy

Wael Alghamdi, Shahab Asoodeh, Flavio P. Calmon et al.

We introduce a new differential privacy (DP) accountant called the saddle-point accountant (SPA). SPA approximates privacy guarantees for the composition of DP mechanisms in an accurate and fast manner. Our approach is inspired by the saddle-point method -- a ubiquitous numerical technique in statistics. We prove rigorous performance guarantees by deriving upper and lower bounds for the approximation error offered by SPA. The crux of SPA is a combination of large-deviation methods with central limit theorems, which we derive via exponentially tilting the privacy loss random variables corresponding to the DP mechanisms. One key advantage of SPA is that it runs in constant time for the $n$-fold composition of a privacy mechanism. Numerical experiments demonstrate that SPA achieves comparable accuracy to state-of-the-art accounting methods with a faster runtime.

LGSep 18, 2023

A Semi-Supervised Approach for Power System Event Identification

Nima Taghipourbazargani, Lalitha Sankar, Oliver Kosut

Event identification is increasingly recognized as crucial for enhancing the reliability, security, and stability of the electric power system. With the growing deployment of Phasor Measurement Units (PMUs) and advancements in data science, there are promising opportunities to explore data-driven event identification via machine learning classification techniques. However, obtaining accurately-labeled eventful PMU data samples remains challenging due to its labor-intensive nature and uncertainty about the event type (class) in real-time. Thus, it is natural to use semi-supervised learning techniques, which make use of both labeled and unlabeled samples. %We propose a novel semi-supervised framework to assess the effectiveness of incorporating unlabeled eventful samples to enhance existing event identification methodologies. We evaluate three categories of classical semi-supervised approaches: (i) self-training, (ii) transductive support vector machines (TSVM), and (iii) graph-based label spreading (LS) method. Our approach characterizes events using physically interpretable features extracted from modal analysis of synthetic eventful PMU data. In particular, we focus on the identification of four event classes whose identification is crucial for grid operations. We have developed and publicly shared a comprehensive Event Identification package which consists of three aspects: data generation, feature extraction, and event identification with limited labels using semi-supervised methodologies. Using this package, we generate and evaluate eventful PMU data for the South Carolina synthetic network. Our evaluation consistently demonstrates that graph-based LS outperforms the other two semi-supervised methods that we consider, and can noticeably improve event identification performance relative to the setting with only a small number of labeled samples.

LGFeb 6

ArcMark: Multi-bit LLM Watermark via Optimal Transport

Atefeh Gilani, Carol Xuan Long, Sajani Vithana et al.

Watermarking is an important tool for promoting the responsible use of language models (LMs). Existing watermarks insert a signal into generated tokens that either flags LM-generated text (zero-bit watermarking) or encodes more complex messages (multi-bit watermarking). Though a number of recent multi-bit watermarks insert several bits into text without perturbing average next-token predictions, they largely extend design principles from the zero-bit setting, such as encoding a single bit per token. Notably, the information-theoretic capacity of multi-bit watermarking -- the maximum number of bits per token that can be inserted and detected without changing average next-token predictions -- has remained unknown. We address this gap by deriving the first capacity characterization of multi-bit watermarks. Our results inform the design of ArcMark: a new watermark construction based on coding-theoretic principles that, under certain assumptions, achieves the capacity of the multi-bit watermark channel. In practice, ArcMark outperforms competing multi-bit watermarks in terms of bit rate per token and detection accuracy. Our work demonstrates that LM watermarking is fundamentally a channel coding problem, paving the way for principled coding-theoretic approaches to watermark design.

MLNov 10, 2022

Robust Model Selection of Gaussian Graphical Models

Abrar Zahin, Rajasekhar Anguluri, Lalitha Sankar et al.

In Gaussian graphical model selection, noise-corrupted samples present significant challenges. It is known that even minimal amounts of noise can obscure the underlying structure, leading to fundamental identifiability issues. A recent line of work addressing this "robust model selection" problem narrows its focus to tree-structured graphical models. Even within this specific class of models, exact structure recovery is shown to be impossible. However, several algorithms have been developed that are known to provably recover the underlying tree-structure up to an (unavoidable) equivalence class. In this paper, we extend these results beyond tree-structured graphs. We first characterize the equivalence class up to which general graphs can be recovered in the presence of noise. Despite the inherent ambiguity (which we prove is unavoidable), the structure that can be recovered reveals local clustering information and global connectivity patterns in the underlying model. Such information is useful in a range of real-world problems, including power grids, social networks, protein-protein interactions, and neural structures. We then propose an algorithm which provably recovers the underlying graph up to the identified ambiguity. We further provide finite sample guarantees in the high-dimensional regime for our algorithm and validate our results through numerical simulations.

ITApr 20, 2025

Reveal-or-Obscure: A Differentially Private Sampling Algorithm for Discrete Distributions

Naima Tasnim, Atefeh Gilani, Lalitha Sankar et al.

We introduce a differentially private (DP) algorithm called reveal-or-obscure (ROO) to generate a single representative sample from a dataset of $n$ observations drawn i.i.d. from an unknown discrete distribution $P$. Unlike methods that add explicit noise to the estimated empirical distribution, ROO achieves $ε$-differential privacy by randomly choosing whether to "reveal" or "obscure" the empirical distribution. While ROO is structurally identical to Algorithm 1 proposed by Cheu and Nayak (arXiv:2412.10512), we prove a strictly better bound on the sampling complexity than that established in Theorem 12 of (arXiv:2412.10512). To further improve the privacy-utility trade-off, we propose a novel generalized sampling algorithm called Data-Specific ROO (DS-ROO), where the probability of obscuring the empirical distribution of the dataset is chosen adaptively. We prove that DS-ROO satisfies $ε$-DP, and provide empirical evidence that DS-ROO can achieve better utility under the same privacy budget of vanilla ROO.

SYFeb 19, 2024

An Adversarial Approach to Evaluating the Robustness of Event Identification Models

Obai Bahwal, Oliver Kosut, Lalitha Sankar

Intelligent machine learning approaches are finding active use for event detection and identification that allow real-time situational awareness. Yet, such machine learning algorithms have been shown to be susceptible to adversarial attacks on the incoming telemetry data. This paper considers a physics-based modal decomposition method to extract features for event classification and focuses on interpretable classifiers including logistic regression and gradient boosting to distinguish two types of events: load loss and generation loss. The resulting classifiers are then tested against an adversarial algorithm to evaluate their robustness. The adversarial attack is tested in two settings: the white box setting, wherein the attacker knows exactly the classification model; and the gray box setting, wherein the attacker has access to historical data from the same network as was used to train the classifier, but does not know the classification model. Thorough experiments on the synthetic South Carolina 500-bus system highlight that a relatively simpler model such as logistic regression is more susceptible to adversarial attacks than gradient boosting.

LGMay 12, 2024

VALID: a Validated Algorithm for Learning in Decentralized Networks with Possible Adversarial Presence

Mayank Bakshi, Sara Ghasvarianjahromi, Yauhen Yakimenka et al.

We introduce the paradigm of validated decentralized learning for undirected networks with heterogeneous data and possible adversarial infiltration. We require (a) convergence to a global empirical loss minimizer when adversaries are absent, and (b) either detection of adversarial presence of convergence to an admissible consensus irrespective of the adversarial configuration. To this end, we propose the VALID protocol which, to the best of our knowledge, is the first to achieve a validated learning guarantee. Moreover, VALID offers an O(1/T) convergence rate (under pertinent regularity assumptions), and computational and communication complexities comparable to non-adversarial distributed stochastic gradient descent. Remarkably, VALID retains optimal performance metrics in adversary-free environments, sidestepping the robustness penalties observed in prior byzantine-robust methods. A distinctive aspect of our study is a heterogeneity metric based on the norms of individual agents' gradients computed at the global empirical loss minimizer. This not only provides a natural statistic for detecting significant byzantine disruptions but also allows us to prove the optimality of VALID in wide generality. Lastly, our numerical results reveal that, in the absence of adversaries, VALID converges faster than state-of-the-art byzantine robust algorithms, while when adversaries are present, VALID terminates with each honest either converging to an admissible consensus of declaring adversarial presence in the network.

SYFeb 14, 2022

A Machine Learning Framework for Event Identification via Modal Analysis of PMU Data

Nima T. Bazargani, Gautam Dasarathy, Lalitha Sankar et al.

Power systems are prone to a variety of events (e.g. line trips and generation loss) and real-time identification of such events is crucial in terms of situational awareness, reliability, and security. Using measurements from multiple synchrophasors, i.e., phasor measurement units (PMUs), we propose to identify events by extracting features based on modal dynamics. We combine such traditional physics-based feature extraction methods with machine learning to distinguish different event types. Including all measurement channels at each PMU allows exploiting diverse features but also requires learning classification models over a high-dimensional space. To address this issue, various feature selection methods are implemented to choose the best subset of features. Using the obtained subset of features, we investigate the performance of two well-known classification models, namely, logistic regression (LR) and support vector machines (SVM) to identify generation loss and line trip events in two datasets. The first dataset is obtained from simulated generation loss and line trip events in the Texas 2000-bus synthetic grid. The second is a proprietary dataset with labeled events obtained from a large utility in the USA involving measurements from nearly 500 PMUs. Our results indicate that the proposed framework is promising for identifying the two types of events.

ITAug 14, 2020

Three Variants of Differential Privacy: Lossless Conversion and Applications

Shahab Asoodeh, Jiachun Liao, Flavio P. Calmon et al.

We consider three different variants of differential privacy (DP), namely approximate DP, Rényi DP (RDP), and hypothesis test DP. In the first part, we develop a machinery for optimally relating approximate DP to RDP based on the joint range of two $f$-divergences that underlie the approximate DP and RDP. In particular, this enables us to derive the optimal approximate DP parameters of a mechanism that satisfies a given level of RDP. As an application, we apply our result to the moments accountant framework for characterizing privacy guarantees of noisy stochastic gradient descent (SGD). When compared to the state-of-the-art, our bounds may lead to about 100 more stochastic gradient descent iterations for training deep learning models for the same privacy budget. In the second part, we establish a relationship between RDP and hypothesis test DP which allows us to translate the RDP constraint into a tradeoff between type I and type II error probabilities of a certain binary hypothesis test. We then demonstrate that for noisy SGD our result leads to tighter privacy guarantees compared to the recently proposed $f$-DP framework for some range of parameters.

ITJan 16, 2020

A Better Bound Gives a Hundred Rounds: Enhanced Privacy Guarantees via $f$-Divergences

Shahab Asoodeh, Jiachun Liao, Flavio P. Calmon et al.

We derive the optimal differential privacy (DP) parameters of a mechanism that satisfies a given level of Rényi differential privacy (RDP). Our result is based on the joint range of two $f$-divergences that underlie the approximate and the Rényi variations of differential privacy. We apply our result to the moments accountant framework for characterizing privacy guarantees of stochastic gradient descent. When compared to the state-of-the-art, our bounds may lead to about 100 more stochastic gradient descent iterations for training deep learning models for the same privacy budget.

SYMay 6, 2019

Can Predictive Filters Detect Gradually Ramping False Data Injection Attacks Against PMUs?

Zhigang Chu, Andrea Pinceti, Reetam Sen Biswas et al.

Intelligently designed false data injection (FDI) attacks have been shown to be able to bypass the $χ^2$-test based bad data detector (BDD), resulting in physical consequences (such as line overloads) in the power system. In this paper, it is shown that if an attack is suddenly injected into the system, a predictive filter with sufficient accuracy is able to detect it. However, an attacker can gradually increase the magnitude of the attack to avoid detection, and still cause damage to the system.

SYSep 13, 2018

Unobservable False Data Injection Attacks against PMUs: Feasible Conditions and Multiplicative Attacks

Zhigang Chu, Jiazi Zhang, Oliver Kosut et al.

This paper studies false data injection (FDI) attacks against phasor measurement units (PMUs). As compared to the conventional bad data detector (BDD), an enhanced BDD utilizing the effect of zero injection buses is proposed. Feasible conditions under which FDI attacks are unobservable to this enhanced BDD are discussed. In addition, a class of multiplicative FDI attacks that maintain the rank of the PMU measurement matrix is introduced. Simulation results on the IEEE RTS-24-bus system indicate that the these multiplicative unobservable attacks can avoid detection by both the enhanced BDD and a detector based on low-rank decomposition proposed in prior work.

SYJun 11, 2015

Vulnerability Analysis and Consequences of False Data Injection Attack on Power System State Estimation

Jingwen Liang, Lalitha Sankar, Oliver Kosut

An unobservable false data injection (FDI) attack on AC state estimation (SE) is introduced and its consequences on the physical system are studied. With a focus on understanding the physical consequences of FDI attacks, a bi-level optimization problem is introduced whose objective is to maximize the physical line flows subsequent to an FDI attack on DC SE. The maximization is subject to constraints on both attacker resources (size of attack) and attack detection (limiting load shifts) as well as those required by DC optimal power flow (OPF) following SE. The resulting attacks are tested on a more realistic non-linear system model using AC state estimation and ACOPF, and it is shown that, with an appropriately chosen sub-network, the attacker can overload transmission lines with moderate shifts of load.