Ananthram Swami

h-index66

50papers

12,770citations

Novelty56%

AI Score60

Ranked #7,971 of 205,806 authors (top 4%)#1,551 in LG (top 4%)

50 Papers

SPApr 2, 2023

Deep Graph Unfolding for Beamforming in MU-MIMO Interference Networks

Arindam Chowdhury, Gunjan Verma, Ananthram Swami et al.

We develop an efficient and near-optimal solution for beamforming in multi-user multiple-input-multiple-output single-hop wireless ad-hoc interference networks. Inspired by the weighted minimum mean squared error (WMMSE) method, a classical approach to solving this problem, and the principle of algorithm unfolding, we present unfolded WMMSE (UWMMSE) for MU-MIMO. This method learns a parameterized functional transformation of key WMMSE parameters using graph neural networks (GNNs), where the channel and interference components of a wireless network constitute the underlying graph. These GNNs are trained through gradient descent on a network utility metric using multiple instances of the beamforming problem. Comprehensive experimental analyses illustrate the superiority of UWMMSE over the classical WMMSE and state-of-the-art learning-based methods in terms of performance, generalizability, and robustness.

SPNov 19, 2022

Delay-aware Backpressure Routing Using Graph Neural Networks

Zhongyuan Zhao, Bojan Radojicic, Gunjan Verma et al.

We propose a throughput-optimal biased backpressure (BP) algorithm for routing, where the bias is learned through a graph neural network that seeks to minimize end-to-end delay. Classical BP routing provides a simple yet powerful distributed solution for resource allocation in wireless multi-hop networks but has poor delay performance. A low-cost approach to improve this delay performance is to favor shorter paths by incorporating pre-defined biases in the BP computation, such as a bias based on the shortest path (hop) distance to the destination. In this work, we improve upon the widely-used metric of hop distance (and its variants) for the shortest path bias by introducing a bias based on the link duty cycle, which we predict using a graph convolutional neural network. Numerical results show that our approach can improve the delay performance compared to classical BP and existing BP alternatives based on pre-defined bias while being adaptive to interference density. In terms of complexity, our distributed implementation only introduces a one-time overhead (linear in the number of devices in the network) compared to classical BP, and a constant overhead compared to the lowest-complexity existing bias-based BP algorithms.

OCJul 24, 2018

Spread, then Target, and Advertise in Waves: Optimal Budget Allocation Across Advertising Channels

Soheil Eshghi, Victor M. Preciado, Saswati Sarkar et al.

We analyze optimal strategies for the allocation of a finite budget that can be invested in different advertising channels over time with the objective of influencing social opinions in a network of individuals. In our analysis, we consider both exogenous influence mechanisms, such as advertising campaigns, as well as endogenous mechanisms of social influence, such as word-of-mouth and peer-pressure, which are modeled using diffusion dynamics. We show that for a broad family of objective functions, the optimal influence strategy at every time uses all channels at either their maximum rate or not at all, i.e., a bang-bang strategy. Furthermore, we prove that the number of switches between these extremes is bounded above by a term that is typically much smaller than the number of agents. This means that the optimal influence strategy is to exert maximum effort in waves for every channel, and then cease effort and let the effects propagate. We also show that, at the beginning of the campaign, the total cost-adjusted reach of an exogenous advertising channel determines its relative value. In contrast, as we approach our investment horizon (e.g., election day), the optimal strategy is to invest in channels able to target individuals instead of broad-reaching channels. We demonstrate that the optimal influence strategies are easily computable in several practical cases, and explicitly characterize the optimal controls for the case of linear objective functions in closed form. Finally, we see that, in the canonical example of designing an election campaign, identifying late-deciders is a critical component in the optimal design.

NIJun 11, 2023

Learnable Digital Twin for Efficient Wireless Network Evaluation

Boning Li, Timofey Efimov, Abhishek Kumar et al.

Network digital twins (NDTs) facilitate the estimation of key performance indicators (KPIs) before physically implementing a network, thereby enabling efficient optimization of the network configuration. In this paper, we propose a learning-based NDT for network simulators. The proposed method offers a holistic representation of information flow in a wireless network by integrating node, edge, and path embeddings. Through this approach, the model is trained to map the network configuration to KPIs in a single forward pass. Hence, it offers a more efficient alternative to traditional simulation-based methods, thus allowing for rapid experimentation and optimization. Our proposed method has been extensively tested through comprehensive experimentation in various scenarios, including wired and wireless networks. Results show that it outperforms baseline learning models in terms of accuracy and robustness. Moreover, our approach achieves comparable performance to simulators but with significantly higher computational efficiency.

NIJul 13, 2024

Biased Backpressure Routing Using Link Features and Graph Neural Networks

Zhongyuan Zhao, Bojan Radojičić, Gunjan Verma et al.

To reduce the latency of Backpressure (BP) routing in wireless multi-hop networks, we propose to enhance the existing shortest path-biased BP (SP-BP) and sojourn time-based backlog metrics, since they introduce no additional time step-wise signaling overhead to the basic BP. Rather than relying on hop-distance, we introduce a new edge-weighted shortest path bias built on the scheduling duty cycle of wireless links, which can be predicted by a graph convolutional neural network based on the topology and traffic of wireless networks. Additionally, we tackle three long-standing challenges associated with SP-BP: optimal bias scaling, efficient bias maintenance, and integration of delay awareness. Our proposed solutions inherit the throughput optimality of the basic BP, as well as its practical advantages of low complexity and fully distributed implementation. Our approaches rely on common link features and introduces only a one-time constant overhead to previous SP-BP schemes, or a one-time overhead linear in the network size to the basic BP. Numerical experiments show that our solutions can effectively address the major drawbacks of slow startup, random walk, and the last packet problem in basic BP, improving the end-to-end delay of existing low-overhead BP algorithms under various settings of network traffic, interference, and mobility.

LGApr 18, 2023

Learning to Transmit with Provable Guarantees in Wireless Federated Learning

Boning Li, Jake Perazzone, Ananthram Swami et al.

We propose a novel data-driven approach to allocate transmit power for federated learning (FL) over interference-limited wireless networks. The proposed method is useful in challenging scenarios where the wireless channel is changing during the FL training process and when the training data are not independent and identically distributed (non-i.i.d.) on the local devices. Intuitively, the power policy is designed to optimize the information received at the server end during the FL process under communication constraints. Ultimately, our goal is to improve the accuracy and efficiency of the global FL model being trained. The proposed power allocation policy is parameterized using graph convolutional networks (GCNs), and the associated constrained optimization problem is solved through a primal-dual (PD) algorithm. Theoretically, we show that the formulated problem has a zero duality gap and, once the power policy is parameterized, optimality depends on how expressive this parameterization is. Numerically, we demonstrate that the proposed method outperforms existing baselines under different wireless channel settings and varying degrees of data heterogeneity.

SPMar 27, 2022

Distributed Link Sparsification for Scalable Scheduling Using Graph Neural Networks

Zhongyuan Zhao, Ananthram Swami, Santiago Segarra

Distributed scheduling algorithms for throughput or utility maximization in dense wireless multi-hop networks can have overwhelmingly high overhead, causing increased congestion, energy consumption, radio footprint, and security vulnerability. For wireless networks with dense connectivity, we propose a distributed scheme for link sparsification with graph convolutional networks (GCNs), which can reduce the scheduling overhead while keeping most of the network capacity. In a nutshell, a trainable GCN module generates node embeddings as topology-aware and reusable parameters for a local decision mechanism, based on which a link can withdraw itself from the scheduling contention if it is not likely to win. In medium-sized wireless networks, our proposed sparse scheduler beats classical threshold-based sparsification policies by retaining almost $70\%$ of the total capacity achieved by a distributed greedy max-weight scheduler with $0.4\%$ of the point-to-point message complexity and $2.6\%$ of the average number of interfering neighbors per link.

CRJul 31, 2024

Resilience and Security of Deep Neural Networks Against Intentional and Unintentional Perturbations: Survey and Research Challenges

Sazzad Sayyed, Milin Zhang, Shahriar Rifat et al.

In order to deploy deep neural networks (DNNs) in high-stakes scenarios, it is imperative that DNNs provide inference robust to external perturbations - both intentional and unintentional. Although the resilience of DNNs to intentional and unintentional perturbations has been widely investigated, a unified vision of these inherently intertwined problem domains is still missing. In this work, we fill this gap by providing a survey of the state of the art and highlighting the similarities of the proposed approaches.We also analyze the research challenges that need to be addressed to deploy resilient and secure DNNs. As there has not been any such survey connecting the resilience of DNNs to intentional and unintentional perturbations, we believe this work can help advance the frontier in both domains by enabling the exchange of ideas between the two communities.

SYSep 26, 2011

Optimal Sensor Placement for Intruder Detection

Waseem A. Malik, Nuno C. Martins, Ananthram Swami

We consider the centralized detection of an intruder, whose location is modeled as uniform across a specified set of points, using an optimally placed team of sensors. These sensors make conditionally independent observations. The local detectors at the sensors are also assumed to be identical, with detection probability $(P_{_{D}})$ and false alarm probability $(P_{_{F}})$. We formulate the problem as an N-ary hypothesis testing problem, jointly optimizing the sensor placement and detection policies at the fusion center. We prove that uniform sensor placement is never strictly optimal when the number of sensors $(M)$ equals the number of placement points $(N)$. We prove that for $N_{2} > N_{1} > M$, where $N_{1},N_{2}$ are number of placement points, the framework utilizing $M$ sensors and $N_{1}$ placement points has the same optimal placement structure as the one utilizing $M$ sensors and $N_{2}$ placement points. For $M\leq 5$ and for fixed $P_{_{D}}$, increasing $P_{_{F}}$ leads to optimal placements that are higher in the majorization-based placement scale. Similarly for $M\leq 5$ and for fixed $P_{_{F}}$, increasing $P_{_{D}}$ leads to optimal placements that are higher in the majorization-based placement scale. For $M>5$, this result does not necessarily hold and we provide a simple counterexample. It is conjectured that the set of optimal placements for a given $(M,N)$ can always be placed on a majorization-based placement scale.

85.6LGMar 20

Reinforcement Learning from Multi-Source Imperfect Preferences: Best-of-Both-Regimes Regret

Ming Shi, Yingbin Liang, Ness B. Shroff et al.

Reinforcement learning from human feedback (RLHF) replaces hard-to-specify rewards with pairwise trajectory preferences, yet regret-oriented theory often assumes that preference labels are generated consistently from a single ground-truth objective. In practical RLHF systems, however, feedback is typically \emph{multi-source} (annotators, experts, reward models, heuristics) and can exhibit systematic, persistent mismatches due to subjectivity, expertise variation, and annotation/modeling artifacts. We study episodic RL from \emph{multi-source imperfect preferences} through a cumulative imperfection budget: for each source, the total deviation of its preference probabilities from an ideal oracle is at most $Ï$ over $K$ episodes. We propose a unified algorithm with regret $\tilde{O}(\sqrt{K/M}+Ï)$, which exhibits a best-of-both-regimes behavior: it achieves $M$-dependent statistical gains when imperfection is small (where $M$ is the number of sources), while remaining robust with unavoidable additive dependence on $Ï$ when imperfection is large. We complement this with a lower bound $\tildeÎ©(\max\{\sqrt{K/M},Ï\})$, which captures the best possible improvement with respect to $M$ and the unavoidable dependence on $Ï$, and a counterexample showing that naÃ¯vely treating imperfect feedback as as oracle-consistent can incur regret as large as $\tildeÎ©(\min\{Ï\sqrt{K},K\})$. Technically, our approach involves imperfection-adaptive weighted comparison learning, value-targeted transition estimation to control hidden feedback-induced distribution shift, and sub-importance sampling to keep the weighted objectives analyzable, yielding regret guarantees that quantify when multi-source feedback provably improves RLHF and how cumulative imperfection fundamentally limits it.

NIDec 11, 2025

A Differentiable Digital Twin of Distributed Link Scheduling for Contention-Aware Networking

Zhongyuan Zhao, Yujun Ming, Kevin Chan et al.

Many routing and flow optimization problems in wired networks can be solved efficiently using minimum cost flow formulations. However, this approach does not extend to wireless multi-hop networks, where the assumptions of fixed link capacity and linear cost structure collapse due to contention for shared spectrum resources. The key challenge is that the long-term capacity of a wireless link becomes a non-linear function of its network context, including network topology, link quality, and the traffic assigned to neighboring links. In this work, we pursue a new direction of modeling wireless network under randomized medium access control by developing an analytical network digital twin (NDT) that predicts link duty cycles from network context. We generalize randomized contention as finding a Maximal Independent Set (MIS) on the conflict graph using weighted Luby's algorithm, derive an analytical model of link duty cycles, and introduce an iterative procedure that resolves the circular dependency among duty cycle, link capacity, and contention probability. Our numerical experiments show that the proposed NDT accurately predicts link duty cycles and congestion patterns with up to a 5000x speedup over packet-level simulation, and enables us to optimize link scheduling using gradient descent for reduced congestion and radio footprint.

LGJan 29

FlowSymm: Physics Aware, Symmetry Preserving Graph Attention for Network Flow Completion

Ege Demirci, Francesco Bullo, Ananthram Swami et al.

Recovering missing flows on the edges of a network, while exactly respecting local conservation laws, is a fundamental inverse problem that arises in many systems such as transportation, energy, and mobility. We introduce FlowSymm, a novel architecture that combines (i) a group-action on divergence-free flows, (ii) a graph-attention encoder to learn feature-conditioned weights over these symmetry-preserving actions, and (iii) a lightweight Tikhonov refinement solved via implicit bilevel optimization. The method first anchors the given observation on a minimum-norm divergence-free completion. We then compute an orthonormal basis for all admissible group actions that leave the observed flows invariant and parameterize the valid solution subspace, which shows an Abelian group structure under vector addition. A stack of GATv2 layers then encodes the graph and its edge features into per-edge embeddings, which are pooled over the missing edges and produce per-basis attention weights. This attention-guided process selects a set of physics-aware group actions that preserve the observed flows. Finally, a scalar Tikhonov penalty refines the missing entries via a convex least-squares solver, with gradients propagated implicitly through Cholesky factorization. Across three real-world flow benchmarks (traffic, power, bike), FlowSymm outperforms state-of-the-art baselines in RMSE, MAE and correlation metrics.

CVMar 4

SPRINT: Semi-supervised Prototypical Representation for Few-Shot Class-Incremental Tabular Learning

Umid Suleymanov, Murat Kantarcioglu, Kevin S Chan et al.

Real-world systems must continuously adapt to novel concepts from limited data without forgetting previously acquired knowledge. While Few-Shot Class-Incremental Learning (FSCIL) is established in computer vision, its application to tabular domains remains largely unexplored. Unlike images, tabular streams (e.g., logs, sensors) offer abundant unlabeled data, a scarcity of expert annotations and negligible storage costs, features ignored by existing vision-based methods that rely on restrictive buffers. We introduce SPRINT, the first FSCIL framework tailored for tabular distributions. SPRINT introduces a mixed episodic training strategy that leverages confidence-based pseudo-labeling to enrich novel class representations and exploits low storage costs to retain base class history. Extensive evaluation across six diverse benchmarks spanning cybersecurity, healthcare, and ecological domains, demonstrates SPRINT's cross-domain robustness. It achieves a state-of-the-art average accuracy of 77.37% (5-shot), outperforming the strongest incremental baseline by 4.45%.

SIJun 11, 2023

Deep Demixing: Reconstructing the Evolution of Network Epidemics

Boning Li, Gojko Čutura, Ananthram Swami et al.

We propose the deep demixing (DDmix) model, a graph autoencoder that can reconstruct epidemics evolving over networks from partial or aggregated temporal information. Assuming knowledge of the network topology but not of the epidemic model, our goal is to estimate the complete propagation path of a disease spread. A data-driven approach is leveraged to overcome the lack of model awareness. To solve this inverse problem, DDmix is proposed as a graph conditional variational autoencoder that is trained from past epidemic spreads. DDmix seeks to capture key aspects of the underlying (unknown) spreading dynamics in its latent space. Using epidemic spreads simulated in synthetic and real-world networks, we demonstrate the accuracy of DDmix by comparing it with multiple (non-graph-aware) learning algorithms. The generalizability of DDmix is highlighted across different types of networks. Finally, we showcase that a simple post-processing extension of our proposed method can help identify super-spreaders in the reconstructed propagation path.

45.6NIMay 22

Ant Backpressure Routing for Dynamic Wireless Multi-hop Networks with Mixed Traffic Patterns

Negar Erfaniantaghvayi, Zhongyuan Zhao, Kevin Chan et al.

Backpressure (BP) routing and its shortest-path biased variant (SP-BP) provide powerful congestion-aware multipath resource allocation for wireless multi-hop networks, but they rely on per-commodity queueing and slot-by-slot control that may be difficult to realize under practical or legacy forwarding architectures. Moreover, even state-of-the-art SP-BP still suffers from the last-packet problem when short-lived traffic coexists with streaming flows. To address these limitations, we propose Ant Backpressure (Ant-BP), a periodic and fully distributed routing scheme that decouples route learning from packet forwarding. Ant-BP uses virtual SP-BP to construct pheromone-based forwarding probabilities, while actual packets are forwarded through per-neighbor first-in-first-out (FIFO) queues with probabilistic next-hop selection. This architecture enables link-capacity sharing across commodities, mitigates starvation of short-lived traffic, and extends the benefits of SP-BP to network architectures based on per-neighbor FIFO forwarding. Through periodic virtual updates, Ant-BP also adapts to transient link failures and mobility-induced topology changes. Our theoretical analysis and simulations show that, compared with conventional ant colony optimization (ACO) routing, virtual SP-BP enables Ant-BP to establish higher-quality forwarding policies with lower overhead. As a result, Ant-BP improves latency and delivery ratio over SP-BP and ACO-based baselines under mixed streaming and bursty traffic, achieves throughput comparable to SP-BP at low and medium traffic load, and remains robust to mismatched virtual-traffic assumptions, transient link failures, and node mobility.

31.6LGMay 3

Robust and Explainable Divide-and-Conquer Learning for Intrusion Detection

Yan Zhou, Kevin Hamlen, Michael De Lucia et al.

Machine learning-based intrusion detection requires complex models to capture patterns in high-dimensional, noisy, and class-imbalanced raw network traffic, yet deploying such models remains impractical on resource-constrained devices with limited processing power and memory. In this paper, we present a correlation-aware divide-and-conquer learning technique that decomposes a complex learning problem into smaller, more manageable subproblems. This enables lightweight models as simple as decision trees to be trained on focused subtasks, yielding up to 43.3% higher local accuracy and up to 257 times reduction in model size on real-world network intrusion detection datasets, while also improving adversarial robustness and explainability.

LGFeb 13, 2024

FLASH: Federated Learning Across Simultaneous Heterogeneities

Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy et al.

The key premise of federated learning (FL) is to train ML models across a diverse set of data-owners (clients), without exchanging local data. An overarching challenge to this date is client heterogeneity, which may arise not only from variations in data distribution, but also in data quality, as well as compute/communication latency. An integrated view of these diverse and concurrent sources of heterogeneity is critical; for instance, low-latency clients may have poor data quality, and vice versa. In this work, we propose FLASH(Federated Learning Across Simultaneous Heterogeneities), a lightweight and flexible client selection algorithm that outperforms state-of-the-art FL frameworks under extensive sources of heterogeneity, by trading-off the statistical information associated with the client's data quality, data distribution, and latency. FLASH is the first method, to our knowledge, for handling all these heterogeneities in a unified manner. To do so, FLASH models the learning dynamics through contextual multi-armed bandits (CMAB) and dynamically selects the most promising clients. Through extensive experiments, we demonstrate that FLASH achieves substantial and consistent improvements over state-of-the-art baselines -- as much as 10% in absolute accuracy -- thanks to its unified approach. Importantly, FLASH also outperforms federated aggregation methods that are designed to handle highly heterogeneous settings and even enjoys a performance boost when integrated with them.

LGDec 8, 2024

Fully Distributed Online Training of Graph Neural Networks in Networked Systems

Rostyslav Olshevskyi, Zhongyuan Zhao, Kevin Chan et al.

Graph neural networks (GNNs) are powerful tools for developing scalable, decentralized artificial intelligence in large-scale networked systems, such as wireless networks, power grids, and transportation networks. Currently, GNNs in networked systems mostly follow a paradigm of `centralized training, distributed execution', which limits their adaptability and slows down their development cycles. In this work, we fill this gap for the first time by developing a communication-efficient, fully distributed online training approach for GNNs applied to large networked systems. For a mini-batch with $B$ samples, our approach of training an $L$-layer GNN only adds $L$ rounds of message passing to the $LB$ rounds required by GNN inference, with doubled message sizes. Through numerical experiments in graph-based node regression, power allocation, and link scheduling in wireless networks, we demonstrate the effectiveness of our approach in training GNNs under supervised, unsupervised, and reinforcement learning paradigms.

LGDec 14, 2025

PRIVEE: Privacy-Preserving Vertical Federated Learning Against Feature Inference Attacks

Sindhuja Madabushi, Ahmad Faraz Khan, Haider Ali et al.

Vertical Federated Learning (VFL) enables collaborative model training across organizations that share common user samples but hold disjoint feature spaces. Despite its potential, VFL is susceptible to feature inference attacks, in which adversarial parties exploit shared confidence scores (i.e., prediction probabilities) during inference to reconstruct private input features of other participants. To counter this threat, we propose PRIVEE (PRIvacy-preserving Vertical fEderated lEarning), a novel defense mechanism named after the French word privée, meaning "private." PRIVEE obfuscates confidence scores while preserving critical properties such as relative ranking and inter-score distances. Rather than exposing raw scores, PRIVEE shares only the transformed representations, mitigating the risk of reconstruction attacks without degrading model prediction accuracy. Extensive experiments show that PRIVEE achieves a threefold improvement in privacy protection compared to state-of-the-art defenses, while preserving full predictive performance against advanced feature inference attacks.

LGNov 24, 2025

Mitigating Participation Imbalance Bias in Asynchronous Federated Learning

Xiangyu Chang, Manyi Yao, Srikanth V. Krishnamurthy et al.

In Asynchronous Federated Learning (AFL), the central server immediately updates the global model with each arriving client's contribution. As a result, clients perform their local training on different model versions, causing information staleness (delay). In federated environments with non-IID local data distributions, this asynchronous pattern amplifies the adverse effect of client heterogeneity (due to different data distribution, local objectives, etc.), as faster clients contribute more frequent updates, biasing the global model. We term this phenomenon heterogeneity amplification. Our work provides a theoretical analysis that maps AFL design choices to their resulting error sources when heterogeneity amplification occurs. Guided by our analysis, we propose ACE (All-Client Engagement AFL), which mitigates participation imbalance through immediate, non-buffered updates that use the latest information available from all clients. We also introduce a delay-aware variant, ACED, to balance client diversity against update staleness. Experiments on different models for different tasks across diverse heterogeneity and delay settings validate our analysis and demonstrate the robust performance of our approaches.

NISep 5, 2025

Distributed Link Sparsification for Scalable Scheduling Using Graph Neural Networks (Journal Version)

Zhongyuan Zhao, Gunjan Verma, Ananthram Swami et al.

In wireless networks characterized by dense connectivity, the significant signaling overhead generated by distributed link scheduling algorithms can exacerbate issues like congestion, energy consumption, and radio footprint expansion. To mitigate these challenges, we propose a distributed link sparsification scheme employing graph neural networks (GNNs) to reduce scheduling overhead for delay-tolerant traffic while maintaining network capacity. A GNN module is trained to adjust contention thresholds for individual links based on traffic statistics and network topology, enabling links to withdraw from scheduling contention when they are unlikely to succeed. Our approach is facilitated by a novel offline constrained {unsupervised} learning algorithm capable of balancing two competing objectives: minimizing scheduling overhead while ensuring that total utility meets the required level. In simulated wireless multi-hop networks with up to 500 links, our link sparsification technique effectively alleviates network congestion and reduces radio footprints across four distinct distributed link scheduling protocols.

SPJan 18, 2024

Learning Non-myopic Power Allocation in Constrained Scenarios

Arindam Chowdhury, Santiago Paternain, Gunjan Verma et al.

We propose a learning-based framework for efficient power allocation in ad hoc interference networks under episodic constraints. The problem of optimal power allocation -- for maximizing a given network utility metric -- under instantaneous constraints has recently gained significant popularity. Several learnable algorithms have been proposed to obtain fast, effective, and near-optimal performance. However, a more realistic scenario arises when the utility metric has to be optimized for an entire episode under time-coupled constraints. In this case, the instantaneous power needs to be regulated so that the given utility can be optimized over an entire sequence of wireless network realizations while satisfying the constraint at all times. Solving each instance independently will be myopic as the long-term constraint cannot modulate such a solution. Instead, we frame this as a constrained and sequential decision-making problem, and employ an actor-critic algorithm to obtain the constraint-aware power allocation at each step. We present experimental analyses to illustrate the effectiveness of our method in terms of superior episodic network-utility performance and its efficiency in terms of time and computational complexity.

LGJan 6, 2024

Plug-and-Play Transformer Modules for Test-Time Adaptation

Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy et al.

Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model. However, the number of domains encountered during test time can be very large, and the data is usually unlabeled. Thus, adaptation to new domains is challenging; it is also impractical to generate customized tuned modules for each such domain. Toward addressing these challenges, this work introduces PLUTO: a Plug-and-pLay modUlar Test-time domain adaptatiOn strategy. We pre-train a large set of modules, each specialized for different source domains, effectively creating a ``module store''. Given a target domain with few-shot unlabeled data, we introduce an unsupervised test-time adaptation (TTA) method to (1) select a sparse subset of relevant modules from this store and (2) create a weighted combination of selected modules without tuning their weights. This plug-and-play nature enables us to harness multiple most-relevant source domains in a single inference call. Comprehensive evaluations demonstrate that PLUTO uniformly outperforms alternative TTA methods and that selecting $\leq$5 modules suffice to extract most of the benefit. At a high level, our method equips pre-trained transformers with the capability to dynamically adapt to new domains, motivating a new paradigm for efficient and scalable domain adaptation.

LGNov 15, 2021

Power Allocation for Wireless Federated Learning using Graph Neural Networks

Boning Li, Ananthram Swami, Santiago Segarra

We propose a data-driven approach for power allocation in the context of federated learning (FL) over interference-limited wireless networks. The power policy is designed to maximize the transmitted information during the FL process under communication constraints, with the ultimate objective of improving the accuracy and efficiency of the global FL model being trained. The proposed power allocation policy is parameterized using a graph convolutional network and the associated constrained optimization problem is solved through a primal-dual algorithm. Numerical experiments show that the proposed method outperforms three baseline methods in both transmission success rate and FL global performance.

SPSep 12, 2021

Link Scheduling using Graph Neural Networks

Zhongyuan Zhao, Gunjan Verma, Chirag Rao et al.

Efficient scheduling of transmissions is a key problem in wireless networks. The main challenge stems from the fact that optimal link scheduling involves solving a maximum weighted independent set (MWIS) problem, which is known to be NP-hard. In practical schedulers, centralized and distributed greedy heuristics are commonly used to approximately solve the MWIS problem. However, most of these greedy heuristics ignore important topological information of the wireless network. To overcome this limitation, we propose fast heuristics based on graph convolutional networks (GCNs) that can be implemented in centralized and distributed manners. Our centralized heuristic is based on tree search guided by a GCN and 1-step rollout. In our distributed MWIS solver, a GCN generates topology-aware node embeddings that are combined with per-link utilities before invoking a distributed greedy solver. Moreover, a novel reinforcement learning scheme is developed to train the GCN in a non-differentiable pipeline. Test results on medium-sized wireless networks show that our centralized heuristic can reach a near-optimal solution quickly, and our distributed heuristic based on a shallow GCN can reduce by nearly half the suboptimality gap of the distributed greedy solver with minimal increase in complexity. The proposed schedulers also exhibit good generalizability across graph and weight distributions.

LGMay 19, 2021

Free Energy Node Embedding via Generalized Skip-gram with Negative Sampling

Yu Zhu, Ananthram Swami, Santiago Segarra

A widely established set of unsupervised node embedding methods can be interpreted as consisting of two distinctive steps: i) the definition of a similarity matrix based on the graph of interest followed by ii) an explicit or implicit factorization of such matrix. Inspired by this viewpoint, we propose improvements in both steps of the framework. On the one hand, we propose to encode node similarities based on the free energy distance, which interpolates between the shortest path and the commute time distances, thus, providing an additional degree of flexibility. On the other hand, we propose a matrix factorization method based on a loss function that generalizes that of the skip-gram model with negative sampling to arbitrary similarity matrices. Compared with factorizations based on the widely used $\ell_2$ loss, the proposed method can better preserve node pairs associated with higher similarity scores. Moreover, it can be easily implemented using advanced automatic differentiation toolkits and computed efficiently by leveraging GPU resources. Node clustering, node classification, and link prediction experiments on real-world datasets demonstrate the effectiveness of incorporating free-energy-based similarities as well as the proposed matrix factorization compared with state-of-the-art alternatives.

SPNov 18, 2020

Distributed Scheduling using Graph Neural Networks

Zhongyuan Zhao, Gunjan Verma, Chirag Rao et al.

A fundamental problem in the design of wireless networks is to efficiently schedule transmission in a distributed manner. The main challenge stems from the fact that optimal link scheduling involves solving a maximum weighted independent set (MWIS) problem, which is NP-hard. For practical link scheduling schemes, distributed greedy approaches are commonly used to approximate the solution of the MWIS problem. However, these greedy schemes mostly ignore important topological information of the wireless networks. To overcome this limitation, we propose a distributed MWIS solver based on graph convolutional networks (GCNs). In a nutshell, a trainable GCN module learns topology-aware node embeddings that are combined with the network weights before calling a greedy solver. In small- to middle-sized wireless networks with tens of links, even a shallow GCN-based MWIS scheduler can leverage the topological information of the graph to reduce in half the suboptimality gap of the distributed greedy solver with good generalizability across graphs and minimal increase in complexity.

SPNov 18, 2020

Adaptive Contention Window Design using Deep Q-learning

Abhishek Kumar, Gunjan Verma, Chirag Rao et al.

We study the problem of adaptive contention window (CW) design for random-access wireless networks. More precisely, our goal is to design an intelligent node that can dynamically adapt its minimum CW (MCW) parameter to maximize a network-level utility knowing neither the MCWs of other nodes nor how these change over time. To achieve this goal, we adopt a reinforcement learning (RL) framework where we circumvent the lack of system knowledge with local channel observations and we reward actions that lead to high utilities. To efficiently learn these preferred actions, we follow a deep Q-learning approach, where the Q-value function is parametrized using a multi-layer perception. In particular, we implement a rainbow agent, which incorporates several empirical improvements over the basic deep Q-network. Numerical experiments based on the NS3 simulator reveal that the proposed RL agent performs close to optimal and markedly improves upon existing learning and non-learning based alternatives.

SPNov 18, 2020

Efficient power allocation using graph neural networks and deep algorithm unfolding

Arindam Chowdhury, Gunjan Verma, Chirag Rao et al.

We study the problem of optimal power allocation in a single-hop ad hoc wireless network. In solving this problem, we propose a hybrid neural architecture inspired by the algorithmic unfolding of the iterative weighted minimum mean squared error (WMMSE) method, that we denote as unfolded WMMSE (UWMMSE). The learnable weights within UWMMSE are parameterized using graph neural networks (GNNs), where the time-varying underlying graphs are given by the fading interference coefficients in the wireless network. These GNNs are trained through a gradient descent approach based on multiple instances of the power allocation problem. Once trained, UWMMSE achieves performance comparable to that of WMMSE while significantly reducing the computational complexity. This phenomenon is illustrated through numerical experiments along with the robustness and generalization to wireless networks of different densities and sizes.

CRNov 3, 2020

You Do (Not) Belong Here: Detecting DPI Evasion Attacks with Context Learning

Shitong Zhu, Shasha Li, Zhongjie Wang et al.

As Deep Packet Inspection (DPI) middleboxes become increasingly popular, a spectrum of adversarial attacks have emerged with the goal of evading such middleboxes. Many of these attacks exploit discrepancies between the middlebox network protocol implementations, and the more rigorous/complete versions implemented at end hosts. These evasion attacks largely involve subtle manipulations of packets to cause different behaviours at DPI and end hosts, to cloak malicious network traffic that is otherwise detectable. With recent automated discovery, it has become prohibitively challenging to manually curate rules for detecting these manipulations. In this work, we propose CLAP, the first fully-automated, unsupervised ML solution to accurately detect and localize DPI evasion attacks. By learning what we call the packet context, which essentially captures inter-relationships across both (1) different packets in a connection; and (2) different header fields within each packet, from benign traffic traces only, CLAP can detect and pinpoint packets that violate the benign packet contexts (which are the ones that are specially crafted for evasion purposes). Our evaluations with 73 state-of-the-art DPI evasion attacks show that CLAP achieves an Area Under the Receiver Operating Characteristic Curve (AUC-ROC) of 0.963, an Equal Error Rate (EER) of only 0.061 in detection, and an accuracy of 94.6% in localization. These results suggest that CLAP can be a promising tool for thwarting DPI evasion attacks.

LGOct 8, 2020

Unsupervised Joint $k$-node Graph Representations with Compositional Energy-Based Models

Leonardo Cotta, Carlos H. C. Teixeira, Ananthram Swami et al.

Existing Graph Neural Network (GNN) methods that learn inductive unsupervised graph representations focus on learning node and edge representations by predicting observed edges in the graph. Although such approaches have shown advances in downstream node classification tasks, they are ineffective in jointly representing larger $k$-node sets, $k{>}2$. We propose MHM-GNN, an inductive unsupervised graph representation approach that combines joint $k$-node representations with energy-based models (hypergraph Markov networks) and GNNs. To address the intractability of the loss that arises from this combination, we endow our optimization with a loss upper bound using a finite-sample unbiased Markov Chain Monte Carlo estimator. Our experiments show that the unsupervised MHM-GNN representations of MHM-GNN produce better unsupervised representations than existing approaches from the literature.

LGSep 17, 2020

An Extension of Fano's Inequality for Characterizing Model Susceptibility to Membership Inference Attacks

Sumit Kumar Jha, Susmit Jha, Rickard Ewetz et al.

Deep neural networks have been shown to be vulnerable to membership inference attacks wherein the attacker aims to detect whether specific input data were used to train the model. These attacks can potentially leak private or proprietary data. We present a new extension of Fano's inequality and employ it to theoretically establish that the probability of success for a membership inference attack on a deep neural network can be bounded using the mutual information between its inputs and its activations. This enables the use of mutual information to measure the susceptibility of a DNN model to membership inference attacks. In our empirical evaluation, we show that the correlation between the mutual information and the susceptibility of the DNN model to membership inference attacks is 0.966, 0.996, and 0.955 for CIFAR-10, SVHN and GTSRB models, respectively.

CVAug 26, 2020

Measurement-driven Security Analysis of Imperceptible Impersonation Attacks

Shasha Li, Karim Khalil, Rameswar Panda et al.

The emergence of Internet of Things (IoT) brings about new security challenges at the intersection of cyber and physical spaces. One prime example is the vulnerability of Face Recognition (FR) based access control in IoT systems. While previous research has shown that Deep Neural Network(DNN)-based FR systems (FRS) are potentially susceptible to imperceptible impersonation attacks, the potency of such attacks in a wide set of scenarios has not been thoroughly investigated. In this paper, we present the first systematic, wide-ranging measurement study of the exploitability of DNN-based FR systems using a large scale dataset. We find that arbitrary impersonation attacks, wherein an arbitrary attacker impersonates an arbitrary target, are hard if imperceptibility is an auxiliary goal. Specifically, we show that factors such as skin color, gender, and age, impact the ability to carry out an attack on a specific target victim, to different extents. We also study the feasibility of constructing universal attacks that are robust to different poses or views of the attacker's face. Our results show that finding a universal perturbation is a much harder problem from the attacker's perspective. Finally, we find that the perturbed images do not generalize well across different DNN models. This suggests security countermeasures that can dramatically reduce the exploitability of DNN-based FR systems.

CVJul 19, 2020

Connecting the Dots: Detecting Adversarial Perturbations Using Context Inconsistency

Shasha Li, Shitong Zhu, Sudipta Paul et al.

There has been a recent surge in research on adversarial perturbations that defeat Deep Neural Networks (DNNs) in machine vision; most of these perturbation-based attacks target object classifiers. Inspired by the observation that humans are able to recognize objects that appear out of place in a scene or along with other unlikely objects, we augment the DNN with a system that learns context consistency rules during training and checks for the violations of the same during testing. Our approach builds a set of auto-encoders, one for each object class, appropriately trained so as to output a discrepancy between the input and output if an added adversarial perturbation violates context consistency rules. Experiments on PASCAL VOC and MS COCO show that our method effectively detects various adversarial attacks and achieves high ROC-AUC (over 0.95 in most cases); this corresponds to over 20% improvement over a state-of-the-art context-agnostic method.

LGJul 15, 2020

GraphCL: Contrastive Self-Supervised Learning of Graph Representations

Hakim Hafidi, Mounir Ghogho, Philippe Ciblat et al.

We propose Graph Contrastive Learning (GraphCL), a general framework for learning node representations in a self supervised manner. GraphCL learns node embeddings by maximizing the similarity between the representations of two randomly perturbed versions of the intrinsic features and link structure of the same node's local subgraph. We use graph neural networks to produce two representations of the same node and leverage a contrastive learning loss to maximize agreement between them. In both transductive and inductive learning setups, we demonstrate that our approach significantly outperforms the state-of-the-art in unsupervised learning on a number of node classification benchmarks.

NEMay 6, 2020

A Multifactorial Optimization Paradigm for Linkage Tree Genetic Algorithm

Huynh Thi Thanh Binh, Pham Dinh Thanh, Tran Ba Trung et al.

Linkage Tree Genetic Algorithm (LTGA) is an effective Evolutionary Algorithm (EA) to solve complex problems using the linkage information between problem variables. LTGA performs well in various kinds of single-task optimization and yields promising results in comparison with the canonical genetic algorithm. However, LTGA is an unsuitable method for dealing with multi-task optimization problems. On the other hand, Multifactorial Optimization (MFO) can simultaneously solve independent optimization problems, which are encoded in a unified representation to take advantage of the process of knowledge transfer. In this paper, we introduce Multifactorial Linkage Tree Genetic Algorithm (MF-LTGA) by combining the main features of both LTGA and MFO. MF-LTGA is able to tackle multiple optimization tasks at the same time, each task learns the dependency between problem variables from the shared representation. This knowledge serves to determine the high-quality partial solutions for supporting other tasks in exploring the search space. Moreover, MF-LTGA speeds up convergence because of knowledge transfer of relevant problems. We demonstrate the effectiveness of the proposed algorithm on two benchmark problems: Clustered Shortest-Path Tree Problem and Deceptive Trap Function. In comparison to LTGA and existing methods, MF-LTGA outperforms in quality of the solution or in computation time.

LGNov 7, 2019

SENSE: Semantically Enhanced Node Sequence Embedding

Swati Rallapalli, Liang Ma, Mudhakar Srivatsa et al.

Effectively capturing graph node sequences in the form of vector embeddings is critical to many applications. We achieve this by (i) first learning vector embeddings of single graph nodes and (ii) then composing them to compactly represent node sequences. Specifically, we propose SENSE-S (Semantically Enhanced Node Sequence Embedding - for Single nodes), a skip-gram based novel embedding mechanism, for single graph nodes that co-learns graph structure as well as their textual descriptions. We demonstrate that SENSE-S vectors increase the accuracy of multi-label classification tasks by up to 50% and link-prediction tasks by up to 78% under a variety of scenarios using real datasets. Based on SENSE-S, we next propose generic SENSE to compute composite vectors that represent a sequence of nodes, where preserving the node order is important. We prove that this approach is efficient in embedding node sequences, and our experiments on real data confirm its high accuracy in node order decoding.

NISep 19, 2019

MACS: Deep Reinforcement Learning based SDN Controller Synchronization Policy Design

Ziyao Zhang, Liang Ma, Konstantinos Poularakis et al.

In distributed software-defined networks (SDN), multiple physical SDN controllers, each managing a network domain, are implemented to balance centralised control, scalability, and reliability requirements. In such networking paradigms, controllers synchronize with each other, in attempts to maintain a logically centralised network view. Despite the presence of various design proposals for distributed SDN controller architectures, most existing works only aim at eliminating anomalies arising from the inconsistencies in different controllers' network views. However, the performance aspect of controller synchronization designs with respect to given SDN applications are generally missing. To fill this gap, we formulate the controller synchronization problem as a Markov decision process (MDP) and apply reinforcement learning techniques combined with deep neural networks (DNNs) to train a smart, scalable, and fine-grained controller synchronization policy, called the Multi-Armed Cooperative Synchronization (MACS), whose goal is to maximise the performance enhancements brought by controller synchronizations. Evaluation results confirm the DNN's exceptional ability in abstracting latent patterns in the distributed SDN environment, rendering significant superiority to MACS-based synchronization policy, which are 56% and 30% performance improvements over ONOS and greedy SDN controller synchronization heuristics.

LGMar 14, 2019

Attribution-driven Causal Analysis for Detection of Adversarial Examples

Susmit Jha, Sunny Raj, Steven Lawrence Fernandes et al.

Attribution methods have been developed to explain the decision of a machine learning model on a given input. We use the Integrated Gradient method for finding attributions to define the causal neighborhood of an input by incrementally masking high attribution features. We study the robustness of machine learning models on benign and adversarial inputs in this neighborhood. Our study indicates that benign inputs are robust to the masking of high attribution features but adversarial inputs generated by the state-of-the-art adversarial attack methods such as DeepFool, FGSM, CW and PGD, are not robust to such masking. Further, our study demonstrates that this concentration of high-attribution features responsible for the incorrect decision is more pronounced in physically realizable adversarial examples. This difference in attribution of benign and adversarial inputs can be used to detect adversarial examples. Such a defense approach is independent of training data and attack method, and we demonstrate its effectiveness on digital and physically realizable perturbations.

LGJul 2, 2018

Adversarial Perturbations Against Real-Time Video Classification Systems

Shasha Li, Ajaya Neupane, Sujoy Paul et al.

Recent research has demonstrated the brittleness of machine learning systems to adversarial perturbations. However, the studies have been mostly limited to perturbations on images and more generally, classification that does not deal with temporally varying inputs. In this paper we ask "Are adversarial perturbations possible in real-time video classification systems and if so, what properties must they satisfy?" Such systems find application in surveillance applications, smart vehicles, and smart elderly care and thus, misclassification could be particularly harmful (e.g., a mishap at an elderly care facility may be missed). We show that accounting for temporal structure is key to generating adversarial examples in such systems. We exploit recent advances in generative adversarial network (GAN) architectures to account for temporal correlations and generate adversarial samples that can cause misclassification rates of over 80% for targeted activities. More importantly, the samples also leave other activities largely unaffected making them extremely stealthy. Finally, we also surprisingly find that in many scenarios, the same perturbation can be applied to every frame in a video clip that makes the adversary's ability to achieve misclassification relatively easy.

LGFeb 12, 2018

Multi-Armed Bandits on Partially Revealed Unit Interval Graphs

Xiao Xu, Sattar Vakili, Qing Zhao et al.

A stochastic multi-armed bandit problem with side information on the similarity and dissimilarity across different arms is considered. The action space of the problem can be represented by a unit interval graph (UIG) where each node represents an arm and the presence (absence) of an edge between two nodes indicates similarity (dissimilarity) between their mean rewards. Two settings of complete and partial side information based on whether the UIG is fully revealed are studied and a general two-step learning structure consisting of an offline reduction of the action space and online aggregation of reward observations from similar arms is proposed to fully exploit the topological structure of the side information. In both cases, the computation efficiency and the order optimality of the proposed learning policies in terms of both the size of the action space and the time length are established.

LGOct 26, 2017

Sparse Diffusion-Convolutional Neural Networks

James Atwood, Siddharth Pal, Don Towsley et al.

The predictive power and overall computational efficiency of Diffusion-convolutional neural networks make them an attractive choice for node classification tasks. However, a naive dense-tensor-based implementation of DCNNs leads to $\mathcal{O}(N^2)$ memory complexity which is prohibitive for large graphs. In this paper, we introduce a simple method for thresholding input graphs that provably reduces memory requirements of DCNNs to O(N) (i.e. linear in the number of nodes in the input) without significantly affecting predictive performance.

AISep 30, 2016

Outlier Detection from Network Data with Subnetwork Interpretation

Xuan-Hong Dang, Arlei Silva, Ambuj Singh et al.

Detecting a small number of outliers from a set of data observations is always challenging. This problem is more difficult in the setting of multiple network samples, where computing the anomalous degree of a network sample is generally not sufficient. In fact, explaining why the network is exceptional, expressed in the form of subnetwork, is also equally important. In this paper, we develop a novel algorithm to address these two key problems. We treat each network sample as a potential outlier and identify subnetworks that mostly discriminate it from nearby regular samples. The algorithm is developed in the framework of network regression combined with the constraints on both network topology and L1-norm shrinkage to perform subnetwork discovery. Our method thus goes beyond subspace/subgraph discovery and we show that it converges to a global optimum. Evaluation on various real-world network datasets demonstrates that our algorithm not only outperforms baselines in both network and high dimensional setting, but also discovers highly relevant and interpretable local subnetworks, further enhancing our understanding of anomalous networks.

MLJun 24, 2016

Modeling Group Dynamics Using Probabilistic Tensor Decompositions

Lin Li, Ananthram Swami, Anna Scaglione

We propose a probabilistic modeling framework for learning the dynamic patterns in the collective behaviors of social agents and developing profiles for different behavioral groups, using data collected from multiple information sources. The proposed model is based on a hierarchical Bayesian process, in which each observation is a finite mixture of an set of latent groups and the mixture proportions (i.e., group probabilities) are drawn randomly. Each group is associated with some distributions over a finite set of outcomes. Moreover, as time evolves, the structure of these groups also changes; we model the change in the group structure by a hidden Markov model (HMM) with a fixed transition probability. We present an efficient inference method based on tensor decompositions and the expectation-maximization (EM) algorithm for parameter estimation.

CRApr 28, 2016

Crafting Adversarial Input Sequences for Recurrent Neural Networks

Nicolas Papernot, Patrick McDaniel, Ananthram Swami et al.

Machine learning models are frequently used to solve complex security problems, as well as to make decisions in sensitive situations like guiding autonomous vehicles or predicting financial market behaviors. Previous efforts have shown that numerous machine learning models were vulnerable to adversarial manipulations of their inputs taking the form of adversarial samples. Such inputs are crafted by adding carefully selected perturbations to legitimate inputs so as to force the machine learning model to misbehave, for instance by outputting a wrong class if the machine learning task of interest is classification. In fact, to the best of our knowledge, all previous work on adversarial samples crafting for neural network considered models used to solve classification tasks, most frequently in computer vision applications. In this paper, we contribute to the field of adversarial machine learning by investigating adversarial input sequences for recurrent neural networks processing sequential data. We show that the classes of algorithms introduced previously to craft adversarial samples misclassified by feed-forward neural networks can be adapted to recurrent neural networks. In a experiment, we show that adversaries can craft adversarial sequences misleading both categorical and sequential recurrent neural networks.

CRMar 31, 2016

Detection under Privileged Information

Z. Berkay Celik, Patrick McDaniel, Rauf Izmailov et al.

For well over a quarter century, detection systems have been driven by models learned from input features collected from real or simulated environments. An artifact (e.g., network event, potential malware sample, suspicious email) is deemed malicious or non-malicious based on its similarity to the learned model at runtime. However, the training of the models has been historically limited to only those features available at runtime. In this paper, we consider an alternate learning approach that trains models using "privileged" information--features available at training time but not at runtime--to improve the accuracy and resilience of detection systems. In particular, we adapt and extend recent advances in knowledge transfer, model influence, and distillation to enable the use of forensic or other data unavailable at runtime in a range of security domains. An empirical evaluation shows that privileged information increases precision and recall over a system with no privileged information: we observe up to 7.7% relative decrease in detection error for fast-flux bot detection, 8.6% for malware traffic detection, 7.3% for malware classification, and 16.9% for face recognition. We explore the limitations and applications of different privileged information techniques in detection systems. Such techniques provide a new means for detection systems to learn from data that would otherwise not be available at runtime.

CRFeb 8, 2016

Practical Black-Box Attacks against Machine Learning

Nicolas Papernot, Patrick McDaniel, Ian Goodfellow et al.

Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN. To perform a real-world and properly-blinded evaluation, we attack a DNN hosted by MetaMind, an online deep learning API. We find that their DNN misclassifies 84.24% of the adversarial examples crafted with our substitute. We demonstrate the general applicability of our strategy to many ML techniques by conducting the same attack against models hosted by Amazon and Google, using logistic regression substitutes. They yield adversarial examples misclassified by Amazon and Google at rates of 96.19% and 88.94%. We also find that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.

CRNov 24, 2015

The Limitations of Deep Learning in Adversarial Settings

Nicolas Papernot, Patrick McDaniel, Somesh Jha et al.

Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified in specific targets by a DNN with a 97% adversarial success rate while only modifying on average 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.

CRNov 14, 2015

Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks

Nicolas Papernot, Patrick McDaniel, Xi Wu et al.

Deep learning algorithms have been shown to perform extremely well on many classical machine learning problems. However, recent studies have shown that deep learning, like other machine learning techniques, is vulnerable to adversarial samples: inputs crafted to force a deep neural network (DNN) to provide adversary-selected outputs. Such attacks can seriously undermine the security of the system supported by the DNN, sometimes with devastating consequences. For example, autonomous vehicles can be crashed, illicit or illegal content can bypass content filters, or biometric authentication systems can be manipulated to allow improper access. In this work, we introduce a defensive mechanism called defensive distillation to reduce the effectiveness of adversarial samples on DNNs. We analytically investigate the generalizability and robustness properties granted by the use of defensive distillation when training DNNs. We also empirically study the effectiveness of our defense mechanisms on two DNNs placed in adversarial settings. The study shows that defensive distillation can reduce effectiveness of sample creation from 95% to less than 0.5% on a studied DNN. Such dramatic gains can be explained by the fact that distillation leads gradients used in adversarial sample creation to be reduced by a factor of 10^30. We also find that distillation increases the average minimum number of features that need to be modified to create adversarial samples by about 800% on one of the DNNs we tested.

CRNov 2, 2015

Six Potential Game-Changers in Cyber Security: Towards Priorities in Cyber Science and Engineering

Alexander Kott, Ananthram Swami, Patrick McDaniel

The fields of study encompassed by cyber science and engineering are broad and poorly defined at this time. As national governments and research communities increase their recognition of the importance, urgency and technical richness of these disciplines, a question of priorities arises: what specific sub-areas of research should be the foci of attention and funding? In this paper we point to an approach to answering this question. We explore results of a recent workshop that postulated possible game-changers or disruptive changes that might occur in cyber security within the next 15 years. We suggest that such game-changers may be useful in focusing attention of research communities on high-priority topics. Indeed, if a drastic, important change is likely to occur, should we not focus our research efforts on the nature and ramifications of the phenomena pertaining to that change? We illustrate each of the game-changers examples of related current research, and then offer recommendations for advancement of cyber science and engineering with respect to each of the six game-changers.