Sriram Vishwanath

LG
h-index49
46papers
1,174citations
Novelty56%
AI Score59

46 Papers

LGApr 10, 2023
iDML: Incentivized Decentralized Machine Learning

Haoxiang Yu, Hsiao-Yuan Chen, Sangsu Lee et al.

With the rising emergence of decentralized and opportunistic approaches to machine learning, end devices are increasingly tasked with training deep learning models on-devices using crowd-sourced data that they collect themselves. These approaches are desirable from a resource consumption perspective and also from a privacy preservation perspective. When the devices benefit directly from the trained models, the incentives are implicit - contributing devices' resources are incentivized by the availability of the higher-accuracy model that results from collaboration. However, explicit incentive mechanisms must be provided when end-user devices are asked to contribute their resources (e.g., computation, communication, and data) to a task performed primarily for the benefit of others, e.g., training a model for a task that a neighbor device needs but the device owner is uninterested in. In this project, we propose a novel blockchain-based incentive mechanism for completely decentralized and opportunistic learning architectures. We leverage a smart contract not only for providing explicit incentives to end devices to participate in decentralized learning but also to create a fully decentralized mechanism to inspect and reflect on the behavior of the learning architecture.

DSSep 25, 2014
Generalized Opinion Dynamics from Local Optimization Rules

Avhishek Chatterjee, Anand D. Sarwate, Sriram Vishwanath

We study generalizations of the Hegselmann-Krause (HK) model for opinion dynamics, incorporating features and parameters that are natural components of observed social systems. The first generalization is one where the strength of influence depends on the distance of the agents' opinions. Under this setup, we identify conditions under which the opinions converge in finite time, and provide a qualitative characterization of the equilibrium. We interpret the HK model opinion update rule as a quadratic cost-minimization rule. This enables a second generalization: a family of update rules which possess different equilibrium properties. Subsequently, we investigate models in which a external force can behave strategically to modulate/influence user updates. We consider cases where this external force can introduce additional agents and cases where they can modify the cost structures for other agents. We describe and analyze some strategies through which such modulation may be possible in an order-optimal manner. Our simulations demonstrate that generalized dynamics differ qualitatively and quantitatively from traditional HK dynamics.

16.8LGMay 26
Symbolic Regression via Latent Iterative Refinement

Xieting Chu, Sriram Vishwanath, Vijay Ganesh

Symbolic regression (SR) seeks closed-form mathematical expressions that fit observed data. Neural SR methods amortize the search by training an encoder to map observations directly to expressions in a single pass, but this amortized inference leaves a residual amortization gap between its one-shot prediction and the true posterior. We propose Latent Equation Embedding (LEE), a framework that closes this gap through iterative amortized inference in a functionally grounded latent space. LEE learns a shared latent space Z equipped with three components: an encoder f_theta that jointly embeds symbolic tokens and numerical observations into a single latent vector z; an expression decoder g_expr that reconstructs formulas from z; and an evaluation decoder g_eval that predicts function values from z, explicitly grounding the latent space in functional behavior. At inference, LEE performs iterative refinement by re-encoding decoded expressions jointly with observations, progressively improving the latent estimate. LEE uses the encoder itself as a learned inference optimizer: each re-encoding step implicitly computes the mismatch between the candidate and the data. Because g_eval is differentiable in z, we additionally interleave continuous gradient descent with discrete re-encoding, yielding a hybrid iterative and gradient refinement procedure. On SRBench across three noise levels, against 19 baselines spanning genetic programming, symbolic-neural hybrids, and pre-trained Transformers, LEE produces expressions 2--10x simpler than the strongest accuracy-oriented baselines, including Operon, GP-GOMEA, TPSR, RAG-SR, and GenSR, with complexity 8--11 versus 20--90. These results advance the low-complexity region of the accuracy-complexity Pareto frontier and show graceful degradation as noise increases.

CVJun 21, 2022
Few-Max: Few-Shot Domain Adaptation for Unsupervised Contrastive Representation Learning

Ali Lotfi Rezaabad, Sidharth Kumar, Sriram Vishwanath et al.

Contrastive self-supervised learning methods learn to map data points such as images into non-parametric representation space without requiring labels. While highly successful, current methods require a large amount of data in the training phase. In situations where the target training set is limited in size, generalization is known to be poor. Pretraining on a large source data set and fine-tuning on the target samples is prone to overfitting in the few-shot regime, where only a small number of target samples are available. Motivated by this, we propose a domain adaption method for self-supervised contrastive learning, termed Few-Max, to address the issue of adaptation to a target distribution under few-shot learning. To quantify the representation quality, we evaluate Few-Max on a range of source and target datasets, including ImageNet, VisDA, and fastMRI, on which Few-Max consistently outperforms other approaches.

40.5CVMay 6
ViTok-v2: Scaling Native Resolution Auto-Encoders to 5 Billion Parameters

Philippe Hansen-Estruch, Jiahui Chen, Vivek Ramanujan et al.

Vision Transformer (ViT) autoencoders have emerged as compelling tokenizers for images, offering improved reconstruction over convolutional tokenizers. However, existing ViT tokenizers cannot explore this landscape as performance degrades outside training resolutions, and reliance on adversarial losses prevents stable scaling. ViTok (Hansen-Estruch et al., 2025) found that the compression ratio r mediates a reconstruction-generation trade-off where lower r means better reconstructions but harder generations, so improving tokenizer reconstruction is key to more Pareto-optimal tokenizers. We introduce ViTok-v2, which addresses these limitations with native resolution support via NaFlex for generalization across resolutions and aspect ratios, and a novel DINOv3 perceptual loss that replaces both LPIPS and GAN objectives for stable training at any scale. ViTok-v2 is trained on about 2B images and scaled to 5B parameters, the largest image autoencoder to date. ViTok-v2 matches or exceeds state-of-the-art reconstruction at 256p and outperforms all baselines at 512p and above. In joint scaling experiments with flow matching generators, we show that scaling both the autoencoder and the generator advances the Pareto frontier of this trade-off.

CRFeb 13
TensorCommitments: A Lightweight Verifiable Inference for Language Models

Oguzhan Baser, Elahe Sadeghi, Eric Wang et al.

Most large language models (LLMs) run on external clouds: users send a prompt, pay for inference, and must trust that the remote GPU executes the LLM without any adversarial tampering. We critically ask how to achieve verifiable LLM inference, where a prover (the service) must convince a verifier (the client) that an inference was run correctly without rerunning the LLM. Existing cryptographic works are too slow at the LLM scale, while non-cryptographic ones require a strong verifier GPU. We propose TensorCommitments (TCs), a tensor-native proof-of-inference scheme. TC binds the LLM inference to a commitment, an irreversible tag that breaks under tampering, organized in our multivariate Terkle Trees. For LLaMA2, TC adds only 0.97% prover and 0.12% verifier time over inference while improving robustness to tailored LLM attacks by up to 48% over the best prior work requiring a verifier GPU.

CVJun 28, 2025Code
PhonemeFake: Redefining Deepfake Realism with Language-Driven Segmental Manipulation and Adaptive Bilevel Detection

Oguzhan Baser, Ahmet Ege Tanriverdi, Sriram Vishwanath et al.

Deepfake (DF) attacks pose a growing threat as generative models become increasingly advanced. However, our study reveals that existing DF datasets fail to deceive human perception, unlike real DF attacks that influence public discourse. It highlights the need for more realistic DF attack vectors. We introduce PhonemeFake (PF), a DF attack that manipulates critical speech segments using language reasoning, significantly reducing human perception by up to 42% and benchmark accuracies by up to 94%. We release an easy-to-use PF dataset on HuggingFace and open-source bilevel DF segment detection model that adaptively prioritizes compute on manipulated regions. Our extensive experiments across three known DF datasets reveal that our detection model reduces EER by 91% while achieving up to 90% speed-up, with minimal compute overhead and precise localization beyond existing models as a scalable solution.

CVJun 25, 2024Code
Unified Auto-Encoding with Masked Diffusion

Philippe Hansen-Estruch, Sriram Vishwanath, Amy Zhang et al.

At the core of both successful generative and self-supervised representation learning models there is a reconstruction objective that incorporates some form of image corruption. Diffusion models implement this approach through a scheduled Gaussian corruption process, while masked auto-encoder models do so by masking patches of the image. Despite their different approaches, the underlying similarity in their methodologies suggests a promising avenue for an auto-encoder capable of both de-noising tasks. We propose a unified self-supervised objective, dubbed Unified Masked Diffusion (UMD), that combines patch-based and noise-based corruption techniques within a single auto-encoding framework. Specifically, UMD modifies the diffusion transformer (DiT) training process by introducing an additional noise-free, high masking representation step in the diffusion noising schedule, and utilizes a mixed masked and noised image for subsequent timesteps. By integrating features useful for diffusion modeling and for predicting masked patch tokens, UMD achieves strong performance in downstream generative and representation learning tasks, including linear probing and class-conditional generation. This is achieved without the need for heavy data augmentations, multiple views, or additional encoders. Furthermore, UMD improves over the computational efficiency of prior diffusion based methods in total training time. We release our code at https://github.com/philippe-eecs/small-vision.

CLJun 20, 2024Code
OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset

Allen Roush, Yusuf Shabazz, Arvind Balaji et al.

We introduce OpenDebateEvidence, a comprehensive dataset for argument mining and summarization sourced from the American Competitive Debate community. This dataset includes over 3.5 million documents with rich metadata, making it one of the most extensive collections of debate evidence. OpenDebateEvidence captures the complexity of arguments in high school and college debates, providing valuable resources for training and evaluation. Our extensive experiments demonstrate the efficacy of fine-tuning state-of-the-art large language models for argumentative abstractive summarization across various methods, models, and datasets. By providing this comprehensive resource, we aim to advance computational argumentation and support practical applications for debaters, educators, and researchers. OpenDebateEvidence is publicly available to support further research and innovation in computational argumentation. Access it here: https://huggingface.co/datasets/Yusuf5/OpenCaselist

LGSep 28, 2023
Investigating Human-Identifiable Features Hidden in Adversarial Perturbations

Dennis Y. Menn, Tzu-hsun Feng, Sriram Vishwanath et al.

Neural networks perform exceedingly well across various machine learning tasks but are not immune to adversarial perturbations. This vulnerability has implications for real-world applications. While much research has been conducted, the underlying reasons why neural networks fall prey to adversarial attacks are not yet fully understood. Central to our study, which explores up to five attack algorithms across three datasets, is the identification of human-identifiable features in adversarial perturbations. Additionally, we uncover two distinct effects manifesting within human-identifiable features. Specifically, the masking effect is prominent in untargeted attacks, while the generation effect is more common in targeted attacks. Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models. In addition, our findings indicate a notable extent of similarity in perturbations across different attack algorithms when averaged over multiple models. This work also provides insights into phenomena associated with adversarial perturbations, such as transferability and model interpretability. Our study contributes to a deeper understanding of the underlying mechanisms behind adversarial attacks and offers insights for the development of more resilient defense strategies for neural networks.

95.5MLMay 8
Active Multiple-Prediction-Powered Inference

Nicholas Brawand, Nima Leclerc, Anhthy Ngo et al.

Post-deployment monitoring of healthcare AI requires statistically valid, label-efficient methods, but gold-standard labels from clinician chart review are expensive. Prediction-powered inference (PPI) and active statistical inference (ASI) reduce label cost by combining a small labeled sample with abundant model predictions, but both are restricted to a single predictor, a poor fit for modern clinical pipelines that have multiple predictors of differing cost and accuracy available at inference time. We propose Active Multiple-Prediction-Powered Inference (AM-PPI), which routes each instance to a cost-appropriate predictor subset, samples gold-standard labels in proportion to the chosen subset's residual uncertainty, and reweights predictions to minimize estimator variance, all under a single deployment-time budget. AM-PPI generalizes ASI to leverage multiple predictors and extends Multiple-PPI from global per-predictor allocation to per-instance adaptive routing. We derive closed-form Karush-Kuhn-Tucker (KKT) conditions for all three decisions and prove, via biconvexity and strong duality, that the resulting fixed point is a global optimum despite the joint problem being non-jointly-convex. We establish asymptotic normality with valid coverage, minimum-variance unbiasedness within the linear-prediction augmented inverse propensity weighted (AIPW) class, and a closed-form criterion identifying when multiple predictors help. On synthetic data and three healthcare monitoring tasks, AM-PPI produces 10 to 40 percent narrower confidence intervals (CIs) than single-predictor ASI in the budget regime where routing matters, and matches the better baseline elsewhere.

CVJan 16, 2025
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

Philippe Hansen-Estruch, David Yan, Ching-Yao Chung et al.

Visual tokenization via auto-encoding empowers state-of-the-art image and video generative models by compressing pixels into a latent space. Although scaling Transformer-based generators has been central to recent advances, the tokenizer component itself is rarely scaled, leaving open questions about how auto-encoder design choices influence both its objective of reconstruction and downstream generative performance. Our work aims to conduct an exploration of scaling in auto-encoders to fill in this blank. To facilitate this exploration, we replace the typical convolutional backbone with an enhanced Vision Transformer architecture for Tokenization (ViTok). We train ViTok on large-scale image and video datasets far exceeding ImageNet-1K, removing data constraints on tokenizer scaling. We first study how scaling the auto-encoder bottleneck affects both reconstruction and generation -- and find that while it is highly correlated with reconstruction, its relationship with generation is more complex. We next explored the effect of separately scaling the auto-encoders' encoder and decoder on reconstruction and generation performance. Crucially, we find that scaling the encoder yields minimal gains for either reconstruction or generation, while scaling the decoder boosts reconstruction but the benefits for generation are mixed. Building on our exploration, we design ViTok as a lightweight auto-encoder that achieves competitive performance with state-of-the-art auto-encoders on ImageNet-1K and COCO reconstruction tasks (256p and 512p) while outperforming existing auto-encoders on 16-frame 128p video reconstruction for UCF-101, all with 2-5x fewer FLOPs. When integrated with Diffusion Transformers, ViTok demonstrates competitive performance on image generation for ImageNet-1K and sets new state-of-the-art benchmarks for class-conditional video generation on UCF-101.

28.6CVApr 5
Interpreting Video Representations with Spatio-Temporal Sparse Autoencoders

Atahan Dokme, Sriram Vishwanath

We present the first systematic study of Sparse Autoencoders (SAEs) on video representations. Standard SAEs decompose video into interpretable, monosemantic features but destroy temporal coherence: hard TopK selection produces unstable feature assignments across frames, reducing autocorrelation by 36%. We propose spatio-temporal contrastive objectives and Matryoshka hierarchical grouping that recover and even exceed raw temporal coherence. The contrastive loss weight controls a tunable trade-off between reconstruction and temporal coherence. A systematic ablation on two backbones and two datasets shows that different configurations excel at different goals: reconstruction fidelity, temporal coherence, action discrimination, or interpretability. Contrastive SAE features improve action classification by +3.9% over raw features and text-video retrieval by up to 2.8xR@1. A cross-backbone analysis reveals that standard monosemanticity metrics contain a backbone-alignment artifact: both DINOv2 and VideoMAE produce equally monosemantic features under neutral (CLIP) similarity. Causal ablation confirms that contrastive training concentrates predictive signal into a small number of identifiable features.

CLFeb 5, 2024
TexShape: Information Theoretic Sentence Embedding for Language Models

Kaan Kale, Homa Esfahanizadeh, Noel Elias et al.

With the exponential growth in data volume and the emergence of data-intensive applications, particularly in the field of machine learning, concerns related to resource utilization, privacy, and fairness have become paramount. This paper focuses on the textual domain of data and addresses challenges regarding encoding sentences to their optimized representations through the lens of information-theory. In particular, we use empirical estimates of mutual information, using the Donsker-Varadhan definition of Kullback-Leibler divergence. Our approach leverages this estimation to train an information-theoretic sentence embedding, called TexShape, for (task-based) data compression or for filtering out sensitive information, enhancing privacy and fairness. In this study, we employ a benchmark language model for initial text representation, complemented by neural networks for information-theoretic compression and mutual information estimations. Our experiments demonstrate significant advancements in preserving maximal targeted information and minimal sensitive information over adverse compression ratios, in terms of predictive accuracy of downstream models that are trained using the compressed data.

MLJan 27
Minimax Rates for Hyperbolic Hierarchical Learning

Divit Rawal, Sriram Vishwanath

We prove an exponential separation in sample complexity between Euclidean and hyperbolic representations for learning on hierarchical data under standard Lipschitz regularization. For depth-$R$ hierarchies with branching factor $m$, we first establish a geometric obstruction for Euclidean space: any bounded-radius embedding forces volumetric collapse, mapping exponentially many tree-distant points to nearby locations. This necessitates Lipschitz constants scaling as $\exp(Ω(R))$ to realize even simple hierarchical targets, yielding exponential sample complexity under capacity control. We then show this obstruction vanishes in hyperbolic space: constant-distortion hyperbolic embeddings admit $O(1)$-Lipschitz realizability, enabling learning with $n = O(mR \log m)$ samples. A matching $Ω(mR \log m)$ lower bound via Fano's inequality establishes that hyperbolic representations achieve the information-theoretic optimum. We also show a geometry-independent bottleneck: any rank-$k$ prediction space captures only $O(k)$ canonical hierarchical contrasts.

LOOct 17, 2025
ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings

Prithwish Jana, Kaan Kale, Ahmet Ege Tanriverdi et al. · gatech

Translating human-written mathematical theorems and proofs from natural language (NL) into formal languages (FLs) like Lean 4 has long been a significant challenge for AI. Most state-of-the-art methods address this separately, first translating theorems and then generating proofs, creating a fundamental disconnect vis-a-vis true proof auto-formalization. This two-step process and its limitations were evident even in AlphaProof's silver-medal performance at the 2024 IMO, where problem statements needed manual translation before automated proof synthesis. We present ProofBridge, a unified framework for automatically translating entire NL theorems and proofs into Lean 4. At its core is a joint embedding model that aligns NL and FL (NL-FL) theorem-proof pairs in a shared semantic space, enabling cross-modal retrieval of semantically relevant FL examples to guide translation. Our training ensures that NL-FL theorems (and their proofs) are mapped close together in this space if and only if the NL-FL pairs are semantically equivalent. ProofBridge integrates retrieval-augmented fine-tuning with iterative proof repair, leveraging Lean's type checker and semantic equivalence feedback to ensure both syntactic correctness and semantic fidelity. Experiments show substantial improvements in proof auto-formalization over strong baselines (including GPT-5, Gemini-2.5, Kimina-Prover, DeepSeek-Prover), with our retrieval-augmented approach yielding significant gains in semantic correctness (SC, via proving bi-directional equivalence) and type correctness (TC, via type-checking theorem+proof) across pass@k metrics on miniF2F-Test-PF, a dataset we curated. In particular, ProofBridge improves cross-modal retrieval quality by up to 3.28x Recall@1 over all-MiniLM-L6-v2, and achieves +31.14% SC and +1.64% TC (pass@32) compared to the baseline Kimina-Prover-RL-1.7B.

SDJun 28, 2025
WavShape: Information-Theoretic Speech Representation Learning for Fair and Privacy-Aware Audio Processing

Oguzhan Baser, Ahmet Ege Tanriverdi, Kaan Kale et al.

Speech embeddings often retain sensitive attributes such as speaker identity, accent, or demographic information, posing risks in biased model training and privacy leakage. We propose WavShape, an information-theoretic speech representation learning framework that optimizes embeddings for fairness and privacy while preserving task-relevant information. We leverage mutual information (MI) estimation using the Donsker-Varadhan formulation to guide an MI-based encoder that systematically filters sensitive attributes while maintaining speech content essential for downstream tasks. Experimental results on three known datasets show that WavShape reduces MI between embeddings and sensitive attributes by up to 81% while retaining 97% of task-relevant information. By integrating information theory with self-supervised speech models, this work advances the development of fair, privacy-aware, and resource-efficient speech systems.

CLOct 28, 2024
MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression

Noel Elias, Homa Esfahanizadeh, Kaan Kale et al.

Large language models have drastically changed the prospects of AI by introducing technologies for more complex natural language processing. However, current methodologies to train such LLMs require extensive resources including but not limited to large amounts of data, expensive machinery, and lengthy training. To solve this problem, this paper proposes a new tokenization method inspired by universal Lempel-Ziv-Welch data compression that compresses repetitive phrases into multi-word tokens. With MultiTok as a new tokenizing tool, we show that language models are able to be trained notably more efficiently while offering a similar accuracy on more succinct and compressed training data. In fact, our results demonstrate that MultiTok achieves a comparable performance to the BERT and GPT-2 standards as both a stand-alone tokenizer and an add-on to existing tokenizers while also providing close to 2.5x faster training with more than 30% less training data.

SPMar 2, 2021
Deep J-Sense: Accelerated MRI Reconstruction via Unrolled Alternating Optimization

Marius Arvinte, Sriram Vishwanath, Ahmed H. Tewfik et al.

Accelerated multi-coil magnetic resonance imaging reconstruction has seen a substantial recent improvement combining compressed sensing with deep learning. However, most of these methods rely on estimates of the coil sensitivity profiles, or on calibration data for estimating model parameters. Prior work has shown that these methods degrade in performance when the quality of these estimators are poor or when the scan parameters differ from the training conditions. Here we introduce Deep J-Sense as a deep learning approach that builds on unrolled alternating minimization and increases robustness: our algorithm refines both the magnetization (image) kernel and the coil sensitivity maps. Experimental results on a subset of the knee fastMRI dataset show that this increases reconstruction performance and provides a significant degree of robustness to varying acceleration factors and calibration region sizes.

LGDec 23, 2020
EQ-Net: A Unified Deep Learning Framework for Log-Likelihood Ratio Estimation and Quantization

Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

In this work, we introduce EQ-Net: the first holistic framework that solves both the tasks of log-likelihood ratio (LLR) estimation and quantization using a data-driven method. We motivate our approach with theoretical insights on two practical estimation algorithms at the ends of the complexity spectrum and reveal a connection between the complexity of an algorithm and the information bottleneck method: simpler algorithms admit smaller bottlenecks when representing their solution. This motivates us to propose a two-stage algorithm that uses LLR compression as a pretext task for estimation and is focused on low-latency, high-performance implementations via deep neural networks. We carry out extensive experimental evaluation and demonstrate that our single architecture achieves state-of-the-art results on both tasks when compared to previous methods, with gains in quantization efficiency as high as $20\%$ and reduced estimation latency by up to $60\%$ when measured on general purpose and graphical processing units (GPU). In particular, our approach reduces the GPU inference latency by more than two times in several multiple-input multiple-output (MIMO) configurations. Finally, we demonstrate that our scheme is robust to distributional shifts and retains a significant part of its performance when evaluated on 5G channel models, as well as channel estimation errors.

LGOct 31, 2020
Hyperbolic Graph Embedding with Enhanced Semi-Implicit Variational Inference

Ali Lotfi Rezaabad, Rahi Kalantari, Sriram Vishwanath et al.

Efficient modeling of relational data arising in physical, social, and information sciences is challenging due to complicated dependencies within the data. In this work, we build off of semi-implicit graph variational auto-encoders to capture higher-order statistics in a low-dimensional graph latent representation. We incorporate hyperbolic geometry in the latent space through a Poincare embedding to efficiently represent graphs exhibiting hierarchical structure. To address the naive posterior latent distribution assumptions in classical variational inference, we use semi-implicit hierarchical variational Bayes to implicitly capture posteriors of given graph data, which may exhibit heavy tails, multiple modes, skewness, and highly correlated latent structures. We show that the existing semi-implicit variational inference objective provably reduces information in the observed graph. Based on this observation, we estimate and add an additional mutual information term to the semi-implicit variational inference learning objective to capture rich correlations arising between the input and latent spaces. We show that the inclusion of this regularization term in conjunction with the Poincare embedding boosts the quality of learned high-level representations and enables more flexible and faithful graphical modeling. We experimentally demonstrate that our approach outperforms existing graph variational auto-encoders both in Euclidean and in hyperbolic spaces for edge link prediction and node classification.

NEJul 9, 2020
Long Short-Term Memory Spiking Networks and Their Applications

Ali Lotfi Rezaabad, Sriram Vishwanath

Recent advances in event-based neuromorphic systems have resulted in significant interest in the use and development of spiking neural networks (SNNs). However, the non-differentiable nature of spiking neurons makes SNNs incompatible with conventional backpropagation techniques. In spite of the significant progress made in training conventional deep neural networks (DNNs), training methods for SNNs still remain relatively poorly understood. In this paper, we present a novel framework for training recurrent SNNs. Analogous to the benefits presented by recurrent neural networks (RNNs) in learning time series models within DNNs, we develop SNNs based on long short-term memory (LSTM) networks. We show that LSTM spiking networks learn the timing of the spikes and temporal dependencies. We also develop a methodology for error backpropagation within LSTM-based SNNs. The developed architecture and method for backpropagation within LSTM-based SNNs enable them to learn long-term dependencies with comparable results to conventional LSTMs.

LGJun 26, 2020
Interpretable Factorization for Neural Network ECG Models

Christopher Snyder, Sriram Vishwanath

The ability of deep learning (DL) to improve the practice of medicine and its clinical outcomes faces a looming obstacle: model interpretation. Without description of how outputs are generated, a collaborating physician can neither resolve when the model's conclusions are in conflict with his or her own, nor learn to anticipate model behavior. Current research aims to interpret networks that diagnose ECG recordings, which has great potential impact as recordings become more personalized and widely deployed. A generalizable impact beyond ECGs lies in the ability to provide a rich test-bed for the development of interpretive techniques in medicine. Interpretive techniques for Deep Neural Networks (DNNs), however, tend to be heuristic and observational in nature, lacking the mathematical rigor one might expect in the analysis of math equations. The motivation of this paper is to offer a third option, a scientific approach. We treat the model output itself as a phenomenon to be explained through component parts and equations governing their behavior. We argue that these component parts should also be "black boxes" --additional targets to interpret heuristically with clear functional connection to the original. We show how to rigorously factor a DNN into a hierarchical equation consisting of black box variables. This is not a subdivision into physical parts, like an organism into its cells; it is but one choice of an equation into a collection of abstract functions. Yet, for DNNs trained to identify normal ECG waveforms on PhysioNet 2017 Challenge data, we demonstrate this choice yields interpretable component models identified with visual composite sketches of ECG samples in corresponding input regions. Moreover, the recursion distills this interpretation: additional factorization of component black boxes corresponds to ECG partitions that are more morphologically pure.

CVJun 5, 2020
Robust Face Verification via Disentangled Representations

Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

We introduce a robust algorithm for face verification, i.e., deciding whether twoimages are of the same person or not. Our approach is a novel take on the idea ofusing deep generative networks for adversarial robustness. We use the generativemodel during training as an online augmentation method instead of a test-timepurifier that removes adversarial noise. Our architecture uses a contrastive loss termand a disentangled generative model to sample negative pairs. Instead of randomlypairing two real images, we pair an image with its class-modified counterpart whilekeeping its content (pose, head tilt, hair, etc.) intact. This enables us to efficientlysample hard negative pairs for the contrastive loss. We experimentally show that, when coupled with adversarial training, the proposed scheme converges with aweak inner solver and has a higher clean and robust accuracy than state-of-the-art-methods when evaluated against white-box physical attacks.

OCMay 24, 2020
Model-free Reinforcement Learning for Stochastic Stackelberg Security Games

Rajesh K Mishra, Deepanshu Vasal, Sriram Vishwanath

In this paper, we consider a sequential stochastic Stackelberg game with two players, a leader and a follower. The follower has access to the state of the system while the leader does not. Assuming that the players act in their respective best interests, the follower's strategy is to play the best response to the leader's strategy. In such a scenario, the leader has the advantage of committing to a policy which maximizes its own returns given the knowledge that the follower is going to play the best response to its policy. Thus, both players converge to a pair of policies that form the Stackelberg equilibrium of the game. Recently,~[1] provided a sequential decomposition algorithm to compute the Stackelberg equilibrium for such games which allow for the computation of Markovian equilibrium policies in linear time as opposed to double exponential, as before. In this paper, we extend the idea to an MDP whose dynamics are not known to the players, to propose an RL algorithm based on Expected Sarsa that learns the Stackelberg equilibrium policy by simulating a model of the MDP. We use particle filters to estimate the belief update for a common agent which computes the optimal policy based on the information which is common to both the players. We present a security game example to illustrate the policy learned by our algorithm. by simulating a model of the MDP. We use particle filters to estimate the belief update for a common agent which computes the optimal policy based on the information which is common to both the players. We present a security game example to illustrate the policy learned by our algorithm.

LGMar 25, 2020
Deep Networks as Logical Circuits: Generalization and Interpretation

Christopher Snyder, Sriram Vishwanath

Not only are Deep Neural Networks (DNNs) black box models, but also we frequently conceptualize them as such. We lack good interpretations of the mechanisms linking inputs to outputs. Therefore, we find it difficult to analyze in human-meaningful terms (1) what the network learned and (2) whether the network learned. We present a hierarchical decomposition of the DNN discrete classification map into logical (AND/OR) combinations of intermediate (True/False) classifiers of the input. Those classifiers that can not be further decomposed, called atoms, are (interpretable) linear classifiers. Taken together, we obtain a logical circuit with linear classifier inputs that computes the same label as the DNN. This circuit does not structurally resemble the network architecture, and it may require many fewer parameters, depending on the configuration of weights. In these cases, we obtain simultaneously an interpretation and generalization bound (for the original DNN), connecting two fronts which have historically been investigated separately. Unlike compression techniques, our representation is. We motivate the utility of this perspective by studying DNNs in simple, controlled settings, where we obtain superior generalization bounds despite using only combinatorial information (e.g. no margin information). We demonstrate how to "open the black box" on the MNIST dataset. We show that the learned, internal, logical computations correspond to semantically meaningful (unlabeled) categories that allow DNN descriptions in plain English. We improve the generalization of an already trained network by interpreting, diagnosing, and replacing components the logical circuit that is the DNN.

CVFeb 28, 2020
Detecting Patch Adversarial Attacks with Image Residuals

Marius Arvinte, Ahmed Tewfik, Sriram Vishwanath

We introduce an adversarial sample detection algorithm based on image residuals, specifically designed to guard against patch-based attacks. The image residual is obtained as the difference between an input image and a denoised version of it, and a discriminator is trained to distinguish between clean and adversarial samples. More precisely, we use a wavelet domain algorithm for denoising images and demonstrate that the obtained residuals act as a digital fingerprint for adversarial attacks. To emulate the limitations of a physical adversary, we evaluate the performance of our approach against localized (patch-based) adversarial attacks, including in settings where the adversary has complete knowledge about the detection scheme. Our results show that the proposed detection method generalizes to previously unseen, stronger attacks and that it is able to reduce the success rate (conversely, increase the computational effort) of an adaptive attacker.

LGDec 21, 2019
Learning Representations by Maximizing Mutual Information in Variational Autoencoders

Ali Lotfi Rezaabad, Sriram Vishwanath

Variational autoencoders (VAEs) have ushered in a new era of unsupervised learning methods for complex distributions. Although these techniques are elegant in their approach, they are typically not useful for representation learning. In this work, we propose a simple yet powerful class of VAEs that simultaneously result in meaningful learned representations. Our solution is to combine traditional VAEs with mutual information maximization, with the goal to enhance amortized inference in VAEs using Information Theoretic techniques. We call this approach InfoMax-VAE, and such an approach can significantly boost the quality of learned high-level representations. We realize this through the explicit maximization of information measures associated with the representation. Using extensive experiments on varied datasets and setups, we show that InfoMax-VAE outperforms contemporary popular approaches, including Info-VAE and $β$-VAE.

LGJun 18, 2019
Deep Learning-Based Quantization of L-Values for Gray-Coded Modulation

Marius Arvinte, Sriram Vishwanath, Ahmed H. Tewfik

In this work, a deep learning-based quantization scheme for log-likelihood ratio (L-value) storage is introduced. We analyze the dependency between the average magnitude of different L-values from the same quadrature amplitude modulation (QAM) symbol and show they follow a consistent ordering. Based on this we design a deep autoencoder that jointly compresses and separately reconstructs each L-value, allowing the use of a weighted loss function that aims to more accurately reconstructs low magnitude inputs. Our method is shown to be competitive with state-of-the-art maximum mutual information quantization schemes, reducing the required memory footprint by a ratio of up to two and a loss of performance smaller than 0.1 dB with less than two effective bits per L-value or smaller than 0.04 dB with 2.25 effective bits. We experimentally show that our proposed method is a universal compression scheme in the sense that after training on an LDPC-coded Rayleigh fading scenario we can reuse the same network without further training on other channel models and codes while preserving the same performance benefits.

LGMar 11, 2019
Deep Log-Likelihood Ratio Quantization

Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

In this work, a deep learning-based method for log-likelihood ratio (LLR) lossy compression and quantization is proposed, with emphasis on a single-input single-output uncorrelated fading communication setting. A deep autoencoder network is trained to compress, quantize and reconstruct the bit log-likelihood ratios corresponding to a single transmitted symbol. Specifically, the encoder maps to a latent space with dimension equal to the number of sufficient statistics required to recover the inputs - equal to three in this case - while the decoder aims to reconstruct a noisy version of the latent representation with the purpose of modeling quantization effects in a differentiable way. Simulation results show that, when applied to a standard rate-1/2 low-density parity-check (LDPC) code, a finite precision compression factor of nearly three times is achieved when storing an entire codeword, with an incurred loss of performance lower than 0.1 dB compared to straightforward scalar quantization of the log-likelihood ratios.

CRJan 31, 2019
HashCore: Proof-of-Work Functions for General Purpose Processors

Yanni Georghiades, Steven Flolid, Sriram Vishwanath

Over the past five years, the rewards associated with mining Proof-of-Work blockchains have increased substantially. As a result, miners are heavily incentivized to design and utilize Application Specific Integrated Circuits (ASICs) that can compute hashes far more efficiently than existing general purpose hardware. Currently, it is difficult for most users to purchase and operate ASICs due to pricing and availability constraints, resulting in a relatively small number of miners with respect to total user base for most popular cryptocurrencies. In this work, we aim to invert the problem of ASIC development by constructing a Proof-of-Work function for which an existing general purpose processor (GPP, such as an x86 IC) is already an optimized ASIC. In doing so, we will ensure that any would-be miner either already owns an ASIC for the Proof-of-Work system they wish to participate in or can attain one at a competitive price with relative ease. In order to achieve this, we present HashCore, a Proof-of-Work function composed of "widgets" generated pseudo-randomly at runtime that each execute a sequence of general purpose processor instructions designed to stress the computational resources of such a GPP. The widgets will be modeled after workloads that GPPs have been optimized for, for example, the SPEC CPU 2017 benchmark suite for x86 ICs, in a technique we refer to as inverted benchmarking. We provide a proof that HashCore is collision-resistant regardless of how the widgets are implemented. We observe that GPP designers/developers essentially create an ASIC for benchmarks such as SPEC CPU 2017. By modeling HashCore after such benchmarks, we create a Proof-of-Work function that can be run most efficiently on a GPP, resulting in a more accessible, competitive, and balanced mining market.

LGNov 5, 2018
Sample Compression, Support Vectors, and Generalization in Deep Learning

Christopher Snyder, Sriram Vishwanath

Even though Deep Neural Networks (DNNs) are widely celebrated for their practical performance, they possess many intriguing properties related to depth that are difficult to explain both theoretically and intuitively. Understanding how weights in deep networks coordinate together across layers to form useful learners has proven challenging, in part because the repeated composition of nonlinearities has proved intractable. This paper presents a reparameterization of DNNs as a linear function of a feature map that is locally independent of the weights. This feature map transforms depth-dependencies into simple tensor products and maps each input to a discrete subset of the feature space. Then, using a max-margin assumption, the paper develops a sample compression representation of the neural network in terms of the discrete activation state of neurons induced by s ``support vectors". The paper shows that the number of support vectors s relates with learning guarantees for neural networks through sample compression bounds, yielding a sample complexity of O(ns/epsilon) for networks with n neurons. Finally, the number of support vectors s is found to monotonically increase with width and label noise but decrease with depth.

LGOct 28, 2018
Experimental Design for Cost-Aware Learning of Causal Graphs

Erik M. Lindgren, Murat Kocaoglu, Alexandros G. Dimakis et al.

We consider the minimum cost intervention design problem: Given the essential graph of a causal graph and a cost to intervene on a variable, identify the set of interventions with minimum total cost that can learn any causal graph with the given essential graph. We first show that this problem is NP-hard. We then prove that we can achieve a constant factor approximation to this problem with a greedy algorithm. We then constrain the sparsity of each intervention. We develop an algorithm that returns an intervention design that is nearly optimal in terms of size for sparse graphs with sparse interventions and we discuss how to use it when there are costs on the vertices.

MLJul 26, 2018
Applications of Common Entropy for Causal Inference

Murat Kocaoglu, Sanjay Shakkottai, Alexandros G. Dimakis et al.

We study the problem of discovering the simplest latent variable that can make two observed discrete variables conditionally independent. The minimum entropy required for such a latent is known as common entropy in information theory. We extend this notion to Renyi common entropy by minimizing the Renyi entropy of the latent variable. To efficiently compute common entropy, we propose an iterative algorithm that can be used to discover the trade-off between the entropy of the latent variable and the conditional mutual information of the observed variables. We show two applications of common entropy in causal inference: First, under the assumption that there are no low-entropy mediators, it can be used to distinguish causation from spurious correlation among almost all joint distributions on simple causal graphs with two observed variables. Second, common entropy can be used to improve constraint-based methods such as PC or FCI algorithms in the small-sample regime, where these methods are known to struggle. We propose a modification to these constraint-based methods to assess if a separating set found by these algorithms is valid using common entropy. We finally evaluate our algorithms on synthetic and real data to establish their performance.

MLJun 17, 2018
Compressed Sensing with Deep Image Prior and Learned Regularization

Dave Van Veen, Ajil Jalal, Mahdi Soltanolkotabi et al.

We propose a novel method for compressed sensing recovery using untrained deep generative models. Our method is based on the recently proposed Deep Image Prior (DIP), wherein the convolutional weights of the network are optimized to match the observed measurements. We show that this approach can be applied to solve any differentiable linear inverse problem, outperforming previous unlearned methods. Unlike various learned approaches based on generative models, our method does not require pre-training over large datasets. We further introduce a novel learned regularization technique, which incorporates prior information on the network weights. This reduces reconstruction error, especially for noisy measurements. Finally, we prove that, using the DIP optimization approach, moderately overparameterized single-layer networks can perfectly fit any signal despite the non-convex nature of the fitting problem. This theoretical result provides justification for early stopping.

LGSep 6, 2017
CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training

Murat Kocaoglu, Christopher Snyder, Alexandros G. Dimakis et al.

We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph. We show that adversarial training can be used to learn a generative model with true observational and interventional distributions if the generator architecture is consistent with the given causal graph. We consider the application of generating faces based on given binary labels where the dependency structure between the labels is preserved with a causal graph. This problem can be seen as learning a causal implicit generative model for the image and labels. We devise a two-stage procedure for this problem. First we train a causal implicit generative model over binary labels using a neural network consistent with a causal graph as the generator. We empirically show that WassersteinGAN can be used to output discrete labels. Later, we propose two new conditional GAN architectures, which we call CausalGAN and CausalBEGAN. We show that the optimal generator of the CausalGAN, given the labels, samples from the image distributions conditioned on these labels. The conditional GAN combined with a trained causal implicit generative model for the labels is then a causal implicit generative model over the labels and the generated image. We show that the proposed architectures can be used to sample from observational and interventional image distributions, even for interventions which do not naturally occur in the dataset.

AIMar 8, 2017
Cost-Optimal Learning of Causal Graphs

Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath

We consider the problem of learning a causal graph over a set of variables with interventions. We study the cost-optimal causal graph learning problem: For a given skeleton (undirected version of the causal graph), design the set of interventions with minimum total cost, that can uniquely identify any causal graph with the given skeleton. We show that this problem is solvable in polynomial time. Later, we consider the case when the number of interventions is limited. For this case, we provide polynomial time algorithms when the skeleton is a tree or a clique tree. For a general chordal skeleton, we develop an efficient greedy algorithm, which can be improved when the causal graph skeleton is an interval graph.

ITJan 28, 2017
Entropic Causality and Greedy Minimum Entropy Coupling

Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath et al.

We study the problem of identifying the causal relationship between two discrete random variables from observational data. We recently proposed a novel framework called entropic causality that works in a very general functional model but makes the assumption that the unobserved exogenous variable has small entropy in the true causal direction. This framework requires the solution of a minimum entropy coupling problem: Given marginal distributions of m discrete random variables, each on n states, find the joint distribution with minimum entropy, that respects the given marginals. This corresponds to minimizing a concave function of nm variables over a convex polytope defined by nm linear constraints, called a transportation polytope. Unfortunately, it was recently shown that this minimum entropy coupling problem is NP-hard, even for 2 variables with n states. Even representing points (joint distributions) over this space can require exponential complexity (in n, m) if done naively. In our recent work we introduced an efficient greedy algorithm to find an approximate solution for this problem. In this paper we analyze this algorithm and establish two results: that our algorithm always finds a local minimum and also is within an additive approximation error from the unknown global optimum.

AINov 12, 2016
Entropic Causal Inference

Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath et al.

We consider the problem of identifying the causal direction between two discrete random variables using observational data. Unlike previous work, we keep the most general functional model but make an assumption on the unobserved exogenous variable: Inspired by Occam's razor, we assume that the exogenous variable is simple in the true causal direction. We quantify simplicity using Rényi entropy. Our main result is that, under natural assumptions, if the exogenous variable has low $H_0$ entropy (cardinality) in the true direction, it must have high $H_0$ entropy in the wrong direction. We establish several algorithmic hardness results about estimating the minimum entropy exogenous variable. We show that the problem of finding the exogenous variable with minimum entropy is equivalent to the problem of finding minimum joint entropy given $n$ marginal distributions, also known as minimum entropy coupling problem. We propose an efficient greedy algorithm for the minimum entropy coupling problem, that for $n=2$ provably finds a local optimum. This gives a greedy algorithm for finding the exogenous variable with minimum $H_1$ (Shannon Entropy). Our greedy entropy-based causal inference algorithm has similar performance to the state of the art additive noise models in real datasets. One advantage of our approach is that we make no use of the values of random variables but only their distributions. Our method can therefore be used for causal inference for both ordinal and also categorical data, unlike additive noise models.

AIOct 30, 2015
Learning Causal Graphs with Small Interventions

Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis et al.

We consider the problem of learning causal networks with interventions, when each intervention is limited in size under Pearl's Structural Equation Model with independent errors (SEM-IE). The objective is to minimize the number of experiments to discover the causal directions of all the edges in a causal graph. Previous work has focused on the use of separating systems for complete graphs for this task. We prove that any deterministic adaptive algorithm needs to be a separating system in order to learn complete graphs in the worst case. In addition, we present a novel separating system construction, whose size is close to optimal and is arguably simpler than previous work in combinatorics. We also develop a novel information theoretic lower bound on the number of interventions that applies in full generality, including for randomized adaptive learning algorithms. For general chordal graphs, we derive worst case lower bounds on the number of interventions. Building on observations about induced trees, we give a new deterministic adaptive algorithm to learn directions on any chordal skeleton completely. In the worst case, our achievable scheme is an $α$-approximation algorithm where $α$ is the independence number of the graph. We also show that there exist graph classes for which the sufficient number of experiments is close to the lower bound. In the other extreme, there are graph classes for which the required number of experiments is multiplicatively $α$ away from our lower bound. In simulations, our algorithm almost always performs very close to the lower bound, while the approach based on separating systems for complete graphs is significantly worse for random chordal graphs.

CRSep 15, 2015
Jamming aided Generalized Data Attacks: Exposing Vulnerabilities in Secure Estimation

Deepjyoti Deka, Ross Baldick, Sriram Vishwanath

Jamming refers to the deletion, corruption or damage of meter measurements that prevents their further usage. This is distinct from adversarial data injection that changes meter readings while preserving their utility in state estimation. This paper presents a generalized attack regime that uses jamming of secure and insecure measurements to greatly expand the scope of common 'hidden' and 'detectable' data injection attacks in literature. For 'hidden' attacks, it is shown that with jamming, the optimal attack is given by the minimum feasible cut in a specific weighted graph. More importantly, for 'detectable' data attacks, this paper shows that the entire range of relative costs for adversarial jamming and data injection can be divided into three separate regions, with distinct graph-cut based constructions for the optimal attack. Approximate algorithms for attack design are developed and their performances are demonstrated by simulations on IEEE test cases. Further, it is proved that prevention of such attacks require security of all grid measurements. This work comprehensively quantifies the dual adversarial benefits of jamming: (a) reduced attack cost and (b) increased resilience to secure measurements, that strengthen the potency of data attacks.

SYSep 24, 2015
Structural Vulnerability of Power Grids to Disasters: Bounds, Adversarial Attacks and Reinforcement

Deepjyoti Deka, Sriram Vishwanath

Natural Disasters like hurricanes, floods or earthquakes can damage power grid devices and create cascading blackouts and islands. The nature of failure propagation and extent of damage is dependent on the structural features of the grid, which is different from that of random networks. This paper analyzes the structural vulnerability of real power grids to impending disasters and presents intuitive graphical metrics to quantify the extent of damage. Two improved graph eigen-value based bounds on the grid vulnerability are developed and demonstrated through simulations of failure propagation on IEEE test cases and real networks. Finally this paper studies adversarial attacks aimed at weakening the grid's structural resilience and presents two approximate schemes to determine the critical transmission lines that may be attacked to minimize grid resilience. The framework can be also be used to design protection schemes to secure the grid against such adversarial attacks. Simulations on power networks are used to compare the performance of the attack schemes in reducing grid resilience.

CRJun 15, 2015
Optimal Data Attacks on Power Grids: Leveraging Detection & Measurement Jamming

Deepjyoti Deka, Ross Baldick, Sriram Vishwanath

Meter measurements in the power grid are susceptible to manipulation by adversaries, that can lead to errors in state estimation. This paper presents a general framework to study attacks on state estimation by adversaries capable of injecting bad-data into measurements and further, of jamming their reception. Through these two techniques, a novel `detectable jamming' attack is designed that changes the state estimation despite failing bad-data detection checks. Compared to commonly studied `hidden' data attacks, these attacks have lower costs and a wider feasible operating region. It is shown that the entire domain of jamming costs can be divided into two regions, with distinct graph-cut based formulations for the design of the optimal attack. The most significant insight arising from this result is that the adversarial capability to jam measurements changes the optimal 'detectable jamming' attack design only if the jamming cost is less than half the cost of bad-data injection. A polynomial time approximate algorithm for attack vector construction is developed and its efficacy in attack design is demonstrated through simulations on IEEE test systems.

CRJun 13, 2015
One Breaker is Enough: Hidden Topology Attacks on Power Grids

Deepjyoti Deka, Ross Baldick, Sriram Vishwanath

A coordinated cyber-attack on grid meter readings and breaker statuses can lead to incorrect state estimation that can subsequently destabilize the grid. This paper studies cyber-attacks by an adversary that changes breaker statuses on transmission lines to affect the estimation of the grid topology. The adversary, however, is incapable of changing the value of any meter data and can only block recorded measurements on certain lines from being transmitted to the control center. The proposed framework, with limited resource requirements as compared to standard data attacks, thus extends the scope of cyber-attacks to grids secure from meter corruption. We discuss necessary and sufficient conditions for feasible attacks using a novel graph-coloring based analysis and show that an optimal attack requires breaker status change at only ONE transmission line. The potency of our attack regime is demonstrated through simulations on IEEE test cases.

CRMay 7, 2015
Data Attacks on Power Grids: Leveraging Detection

Deepjyoti Deka, Ross Baldick, Sriram Vishwanath

Data attacks on meter measurements in the power grid can lead to errors in state estimation. This paper presents a new data attack model where an adversary produces changes in state estimation despite failing bad-data detection checks. The adversary achieves its objective by making the estimator incorrectly identify correct measurements as bad data. The proposed attack regime's significance lies in reducing the minimum sizes of successful attacks to more than half of that of undetectable data attacks. Additionally, the attack model is able to construct attacks on systems that are resilient to undetectable attacks. The conditions governing a successful data attack of the proposed model are presented along with guarantees on its performance. The complexity of constructing an optimal attack is discussed and two polynomial time approximate algorithms for attack vector construction are developed. The performance of the proposed algorithms and efficacy of the hidden attack model are demonstrated through simulations on IEEE test systems.

CRJan 14, 2014
Hidden Attacks on Power Grid: Optimal Attack Strategies and Mitigation

Deepjyoti Deka, Ross Baldick, Sriram Vishwanath

Real time operation of the power grid and synchronism of its different elements require accurate estimation of its state variables. Errors in state estimation will lead to sub-optimal Optimal Power Flow (OPF) solutions and subsequent increase in the price of electricity in the market or, potentially overload and create line outages. This paper studies hidden data attacks on power systems by an adversary trying to manipulate state estimators. The adversary gains control of a few meters, and is able to introduce spurious measurements in them. The paper presents a polynomial time algorithm using min-cut calculations to determine the minimum number of measurements an adversary needs to manipulate in order to perform a hidden attack. Greedy techniques are presented to aid the system operator in identifying critical measurements for protection to prevent such hidden data attacks. Secure PMU placement against data attacks is also discussed and an algorithm for placing PMUs for this purpose is developed. The performances of the proposed algorithms are shown through simulations on IEEE test cases.