Narayan B. Mandayam

LG
h-index: 55
15 papers
299 citations
Novelty: 48%
AI Score: 37

15 Papers

SY Nov 5, 2019
Colonel Blotto Game for Secure State Estimation in Interdependent Critical Infrastructure

Aidin Ferdowsi, Walid Saad, Narayan B. Mandayam

Securing the physical components of a city's interdependent critical infrastructure (ICI) such as power, natural gas, and water systems is a challenging task due to their interdependence and a large number of involved sensors. In this paper, using a novel integrated state-space model that captures the interdependence, a two-stage cyber attack on an ICI is studied in which the attacker first compromises the ICI's sensors by decoding their messages, and, subsequently, it alters the compromised sensors' data to cause state estimation errors. To thwart such attacks, the administrator of each critical infrastructure (CI) must assign protection levels to the sensors based on their importance in the state estimation process. To capture the interdependence between the attacker and the ICI administrator's actions and analyze their interactions, a Colonel Blotto game framework is proposed. The mixed-strategy Nash equilibrium of this game is derived analytically. At this equilibrium, it is shown that the administrator can strategically randomize between the protection levels of the sensors to deceive the attacker. Simulation results coupled with theoretical analysis show that, using the proposed game, the administrator can reduce the state estimation error by at least 50% compared to a non-strategic approach that assigns protection levels proportional to sensor values.
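
For readers unfamiliar with the game named in this abstract, the following is a minimal sketch of the classic Colonel Blotto payoff, not the paper's state-estimation payoff or its equilibrium derivation. The function names, the tie-splitting rule, and the example sensor values are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def blotto_defender_payoff(
    defender: Sequence[float],
    attacker: Sequence[float],
    values: Sequence[float],
) -> float:
    """Classic Blotto rule (assumed): the side that allocates more to a
    battlefield wins its value; ties split the value equally."""
    if not (len(defender) == len(attacker) == len(values)):
        raise ValueError("all inputs must have the same length")
    payoff = 0.0
    for d, a, v in zip(defender, attacker, values):
        if d > a:
            payoff += v
        elif d == a:
            payoff += 0.5 * v
    return payoff


Mixed = List[Tuple[float, Sequence[float]]]  # (probability, allocation) pairs


def expected_defender_payoff(
    defender_mix: Mixed, attacker_mix: Mixed, values: Sequence[float]
) -> float:
    """Expected payoff when both sides randomize independently."""
    return sum(
        p * q * blotto_defender_payoff(d, a, values)
        for p, d in defender_mix
        for q, a in attacker_mix
    )


if __name__ == "__main__":
    values = [2.0, 4.0]  # hypothetical sensor importances
    # Defender randomizes between two extreme protection profiles.
    defender_mix = [(0.5, [4, 0]), (0.5, [0, 4])]
    attacker_mix = [(1.0, [2, 2])]
    print(expected_defender_payoff(defender_mix, attacker_mix, values))
```

Randomizing over allocations, as in `defender_mix`, is the kind of mixed strategy the abstract refers to; finding the equilibrium mix is the part the paper derives analytically and is not attempted here.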

NI Apr 20, 2018
Design of Ad Hoc Wireless Mesh Networks Formed by Unmanned Aerial Vehicles with Advanced Mechanical Automation

Ryoichi Shinkuma, Narayan B. Mandayam

Ad hoc wireless mesh networks formed by unmanned aerial vehicles (UAVs) equipped with wireless transceivers (access points (APs)) are increasingly being touted as being able to provide a flexible "on-the-fly" communications infrastructure that can collect and transmit sensor data from sensors in remote, wilderness, or disaster-hit areas. Recent advances in the mechanical automation of UAVs have resulted in separable APs and replaceable batteries that can be carried by UAVs and placed at arbitrary locations in the field. These advanced mechanized UAV mesh networks pose interesting questions in terms of the design of the network architecture and the optimal UAV scheduling algorithms. This paper studies a range of network architectures that depend on the mechanized automation (AP separation and battery replacement) capabilities of UAVs and proposes heuristic UAV scheduling algorithms for each network architecture, which are benchmarked against optimal designs.

LG Mar 6, 2022
Watch from sky: machine-learning-based multi-UAV network for predictive police surveillance

Ryusei Sugano, Ryoichi Shinkuma, Takayuki Nishio et al.

This paper presents the watch-from-sky framework, where multiple unmanned aerial vehicles (UAVs) play four roles, i.e., sensing, data forwarding, computing, and patrolling, for predictive police surveillance. Our framework is promising for crime deterrence because UAVs are useful for collecting and distributing data and have high mobility. Our framework relies on machine learning (ML) technology for controlling and dispatching UAVs and predicting crimes. This paper compares the conceptual model of our framework against the literature. It also reports a simulation of UAV dispatching using reinforcement learning and distributed ML inference over a lossy UAV network.

CR Feb 22, 2025
Human-AI Collaboration in Cloud Security: Cognitive Hierarchy-Driven Deep Reinforcement Learning

Zahra Aref, Sheng Wei, Narayan B. Mandayam

Given the complexity of multi-tenant cloud environments and the growing need for real-time threat mitigation, Security Operations Centers (SOCs) must adopt AI-driven adaptive defense mechanisms to counter Advanced Persistent Threats (APTs). However, SOC analysts face challenges in handling adaptive adversarial tactics, requiring intelligent decision-support frameworks. We propose a Cognitive Hierarchy Theory-driven Deep Q-Network (CHT-DQN) framework that models interactive decision-making between SOC analysts and AI-driven APT bots. The SOC analyst (defender) operates at cognitive level-1, anticipating attacker strategies, while the APT bot (attacker) follows a level-0 policy. By incorporating CHT into DQN, our framework enhances adaptive SOC defense using Attack Graph (AG)-based reinforcement learning. Simulation experiments across varying AG complexities show that CHT-DQN consistently achieves higher data protection and lower action discrepancies compared to standard DQN. A theoretical lower bound further confirms its superiority as AG complexity increases. A human-in-the-loop (HITL) evaluation on Amazon Mechanical Turk (MTurk) reveals that SOC analysts using CHT-DQN-derived transition probabilities align more closely with adaptive attackers, leading to better defense outcomes. Moreover, human behavior aligns with Prospect Theory (PT) and Cumulative Prospect Theory (CPT): participants are less likely to reselect failed actions and more likely to persist with successful ones. This asymmetry reflects amplified loss sensitivity and biased probability weighting -- underestimating gains after failure and overestimating continued success. Our findings highlight the potential of integrating cognitive models into deep reinforcement learning to improve real-time SOC decision-making for cloud security.

LG Sep 19, 2025
Mental Accounts for Actions: EWA-Inspired Attention in Decision Transformers

Zahra Aref, Narayan B. Mandayam

Transformers have emerged as a compelling architecture for sequential decision-making by modeling trajectories via self-attention. In reinforcement learning (RL), they enable return-conditioned control without relying on value function approximation. Decision Transformers (DTs) exploit this by casting RL as supervised sequence modeling, but they are restricted to offline data and lack exploration. Online Decision Transformers (ODTs) address this limitation through entropy-regularized training on on-policy rollouts, offering a stable alternative to traditional RL methods like Soft Actor-Critic, which depend on bootstrapped targets and reward shaping. Despite these advantages, ODTs use standard attention, which lacks explicit memory of action-specific outcomes. This leads to inefficiencies in learning long-term action effectiveness. Inspired by cognitive models such as Experience-Weighted Attraction (EWA), we propose Experience-Weighted Attraction with Vector Quantization for Online Decision Transformers (EWA-VQ-ODT), a lightweight module that maintains per-action mental accounts summarizing recent successes and failures. Continuous actions are routed via direct grid lookup to a compact vector-quantized codebook, where each code stores a scalar attraction updated online through decay and reward-based reinforcement. These attractions modulate attention by biasing the columns associated with action tokens, requiring no change to the backbone or training objective. On standard continuous-control benchmarks, EWA-VQ-ODT improves sample efficiency and average return over ODT, particularly in early training. The module is computationally efficient, interpretable via per-code traces, and supported by theoretical guarantees that bound the attraction dynamics and its impact on attention drift.
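
The abstract describes per-code attractions that decay, are reinforced by reward, and then bias attention. A minimal sketch of that idea follows; the exact update rule, the decay value, and the way the bias enters the attention logits are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def update_attractions(
    attractions: Sequence[float], chosen: int, reward: float, decay: float = 0.9
) -> List[float]:
    """Decay every attraction, then add the reward to the chosen code.

    This is a generic EWA-style rule (assumed), not the paper's exact update.
    """
    if not 0 <= chosen < len(attractions):
        raise IndexError("chosen code is out of range")
    return [
        decay * a + (reward if i == chosen else 0.0)
        for i, a in enumerate(attractions)
    ]


def biased_attention(scores: Sequence[float], bias: Sequence[float]) -> List[float]:
    """Softmax over attention scores after adding a per-column bias."""
    logits = [s + b for s, b in zip(scores, bias)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    attractions = [0.0, 0.0, 0.0]
    attractions = update_attractions(attractions, chosen=1, reward=1.0)
    # Equal raw scores: the rewarded code's column now receives more weight.
    print(biased_attention([0.0, 0.0, 0.0], attractions))
```

Because the bias is added to the logits before the softmax, the backbone and training objective are untouched in this sketch, which mirrors the "no change to the backbone" property the abstract claims.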

SP Nov 29, 2021
Network Traffic Shaping for Enhancing Privacy in IoT Systems

Sijie Xiong, Anand D. Sarwate, Narayan B. Mandayam

Motivated by privacy issues caused by inference attacks on user activities in the packet sizes and timing information of Internet of Things (IoT) network traffic, we establish a rigorous event-level differential privacy (DP) model on infinite packet streams. We propose a memoryless traffic shaping mechanism satisfying a first-come-first-served queuing discipline that outputs traffic dependent on the input using a DP mechanism. We show that in special cases the proposed mechanism recovers existing shapers which standardize the output independently from the input. To find the optimal shapers for given levels of privacy and transmission efficiency, we formulate the constrained problem of minimizing the expected delay per packet and propose using the expected queue size across time as a proxy. We further show that the constrained minimization is a convex program. We demonstrate the effect of shapers on both synthetic data and packet traces from actual IoT devices. The experimental results reveal inherent privacy-overhead tradeoffs: more shaping overhead provides better privacy protection. Under the same privacy level, there naturally exists a tradeoff between dummy traffic and delay. When dealing with heavier or less bursty input traffic, all shapers become more overhead-efficient. We also show that increased traffic from a larger number of IoT devices makes guaranteeing event-level privacy easier. The DP shaper offers tunable privacy that is invariant to changes in the input traffic distribution and has an advantage in handling burstiness over traffic-independent shapers. This approach accommodates heterogeneous network conditions well and enables users to adapt to their privacy/overhead demands.

LG Jun 25, 2021
A hybrid model-based and learning-based approach for classification using limited number of training samples

Alireza Nooraiepour, Waheed U. Bajwa, Narayan B. Mandayam

The fundamental task of classification given a limited number of training data samples is considered for physical systems with known parametric statistical models. The standalone learning-based and statistical model-based classifiers face major challenges in fulfilling the classification task using a small training set. Specifically, classifiers that solely rely on the physics-based statistical models usually suffer from their inability to properly tune the underlying unobservable parameters, which leads to a mismatched representation of the system's behaviors. Learning-based classifiers, on the other hand, typically rely on a large amount of training data from the underlying physical process, which might not be feasible in most practical scenarios. In this paper, a hybrid classification method -- termed HyPhyLearn -- is proposed that exploits both the physics-based statistical models and the learning-based classifiers. The proposed solution is based on the conjecture that HyPhyLearn would alleviate the challenges associated with the individual approaches of learning-based and statistical model-based classifiers by fusing their respective strengths. The proposed hybrid approach first estimates the unobservable model parameters using the available (suboptimal) statistical estimation procedures, and subsequently uses the physics-based statistical models to generate synthetic data. Then, the training data samples are incorporated with the synthetic data in a learning-based classifier that is based on domain-adversarial training of neural networks. Specifically, in order to address the mismatch problem, the classifier learns a mapping from the training data and the synthetic data to a common feature space. Simultaneously, the classifier is trained to find discriminative features within this space in order to fulfill the classification task.

LG Sep 20, 2020
Estimation of Individual Device Contributions for Incentivizing Federated Learning

Takayuki Nishio, Ryoichi Shinkuma, Narayan B. Mandayam

Federated learning (FL) is an emerging technique used to train a machine-learning model collaboratively using the data and computation resources of mobile devices without exposing privacy-sensitive user data. Appropriate incentive mechanisms that motivate data and mobile-device owners to participate in FL are key to building a sustainable platform for FL. However, it is difficult to evaluate the contribution level of the devices/owners to determine appropriate rewards without large computation and communication overhead. This paper proposes a computation- and communication-efficient method of estimating a participating device's contribution level. The proposed method enables such estimation during a single FL training process, thereby reducing the need for traffic and computation overhead. The performance evaluations using the MNIST dataset show that the proposed method estimates individual participants' contributions accurately with 46-49% less computation overhead than a naive estimation method and with no communication overhead.

LG Aug 1, 2019
Learning-Aided Physical Layer Attacks Against Multicarrier Communications in IoT

Alireza Nooraiepour, Waheed U. Bajwa, Narayan B. Mandayam

Internet-of-Things (IoT) devices that are limited in power and processing are susceptible to physical layer (PHY) spoofing (signal exploitation) attacks owing to their inability to implement a full-blown protocol stack for security. The overwhelming adoption of multicarrier techniques such as orthogonal frequency division multiplexing (OFDM) for the PHY layer makes IoT devices further vulnerable to PHY spoofing attacks. These attacks, which aim at injecting bogus/spurious data into the receiver, involve inferring transmission parameters and finding PHY characteristics of the transmitted signals so as to spoof the received signal. Non-contiguous (NC) OFDM systems have been argued to have low probability of exploitation (LPE) characteristics against classic attacks based on cyclostationary analysis, and the corresponding PHY has been deemed to be secure. However, with the advent of machine learning (ML) algorithms, adversaries can devise data-driven attacks to compromise such systems. It is in this vein that the PHY spoofing performance of adversaries equipped with supervised and unsupervised ML tools is investigated in this paper. The supervised ML approach is based on deep neural networks (DNNs) while the unsupervised one employs variational autoencoders (VAEs). In particular, VAEs are shown to be capable of learning representations from NC-OFDM signals related to their PHY characteristics such as frequency pattern and modulation scheme, which are useful for PHY spoofing. In addition, a new metric based on the disentanglement principle is proposed to measure the quality of such learned representations. Simulation results demonstrate that the performance of the spoofing adversaries highly depends on the subcarriers' allocation patterns. Particularly, it is shown that utilizing a random subcarrier occupancy pattern secures NC-OFDM systems against ML-based attacks.

NI May 12, 2019
Learning-based Resource Optimization in Ultra Reliable Low Latency HetNets

Mohammad Yousefvand, Kenza Hamidouche, Narayan B. Mandayam

In this paper, the problems of user offloading and resource optimization are jointly addressed to support ultra-reliable and low latency communications (URLLC) in HetNets. In particular, a multi-tier network with a single macro base station (MBS) and multiple overlaid small cell base stations (SBSs) is considered that includes users with different latency and reliability constraints. Modeling the latency and reliability constraints of users with probabilistic guarantees, the joint problem of user offloading and resource allocation (JUR) in a URLLC setting is formulated as an optimization problem to minimize the cost of serving users for the MBS. In the considered scheme, SBSs bid to serve URLLC users under their coverage at a given price, and the MBS decides whether to serve each user locally or to offload it to one of the overlaid SBSs. Since the JUR optimization is NP-hard, we propose a low complexity learning-based heuristic method (LHM) which includes a support vector machine-based user association model and a convex resource optimization (CRO) algorithm. To further reduce the delay, we propose an alternating direction method of multipliers (ADMM)-based solution to the CRO problem. Simulation results show that using LHM, the MBS significantly decreases the spectrum access delay for users (by approximately 93%) as compared to JUR, while also reducing its bandwidth and power costs in serving users (by approximately 33%) as compared to directly serving users without offloading.

SY Dec 13, 2018
Cyber-Physical Security and Safety of Autonomous Connected Vehicles: Optimal Control Meets Multi-Armed Bandit Learning

Aidin Ferdowsi, Samad Ali, Walid Saad et al.

Autonomous connected vehicles (ACVs) rely on intra-vehicle sensors such as camera and radar as well as inter-vehicle communication to operate effectively. This reliance on cyber components exposes ACVs to cyber and physical attacks in which an adversary can manipulate sensor readings and physically take control of an ACV. In this paper, a comprehensive framework is proposed to thwart cyber and physical attacks on ACV networks. First, an optimal safe controller for ACVs is derived to maximize the street traffic flow while minimizing the risk of accidents by optimizing ACV speed and inter-ACV spacing. It is proven that the proposed controller is robust to physical attacks which aim at making ACV systems unstable. To improve the cyber-physical security of ACV systems, next, data injection attack (DIA) detection approaches are proposed to address cyber attacks on sensors and their physical impact on the ACV system. To comprehensively design the DIA detection approaches, ACV sensors are characterized in two subsets based on the availability of a priori information about their data. For sensors having prior information, a DIA detection approach is proposed and an optimal threshold level is derived for the difference between the actual and estimated values of sensor data, which enables the ACV to stay robust against cyber attacks. For sensors having no prior information, a novel multi-armed bandit (MAB) algorithm is proposed to enable the ACV to securely control its motion. Simulation results show that the proposed optimal safe controller outperforms current state-of-the-art controllers by maximizing the robustness of ACVs to physical attacks. The results also show that the proposed DIA detection approaches, compared to Kalman filtering, can improve the security of ACV sensors against cyber attacks and ultimately improve the physical robustness of an ACV system.
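
The threshold-based detection idea mentioned in this abstract can be illustrated with a minimal sketch: a reading is flagged when it deviates from its estimate by more than a threshold. The function name, the threshold value, and the sample readings are hypothetical; the paper's optimal threshold derivation and its bandit algorithm are not reproduced here.

```python
from typing import List, Sequence


def flag_injection(
    measured: Sequence[float], estimated: Sequence[float], threshold: float
) -> List[bool]:
    """Return True for each sensor whose |measured - estimated| exceeds
    the threshold. The threshold is an assumed input, not a derived one."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    if len(measured) != len(estimated):
        raise ValueError("measured and estimated must have the same length")
    return [abs(m - e) > threshold for m, e in zip(measured, estimated)]


if __name__ == "__main__":
    measured = [10.0, 10.4, 13.0]   # hypothetical sensor readings
    estimated = [10.0, 10.0, 10.0]  # hypothetical prior-based estimates
    print(flag_injection(measured, estimated, threshold=1.0))
```

A larger threshold lowers false alarms but lets larger injected deviations pass unnoticed, which is the trade-off that motivates deriving the threshold rather than picking it by hand.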

SY May 2, 2018
Robust Deep Reinforcement Learning for Security and Safety in Autonomous Vehicle Systems

Aidin Ferdowsi, Ursula Challita, Walid Saad et al.

To operate effectively in tomorrow's smart cities, autonomous vehicles (AVs) must rely on intra-vehicle sensors such as camera and radar as well as inter-vehicle communication. Such dependence on sensors and communication links exposes AVs to cyber-physical (CP) attacks by adversaries that seek to take control of the AVs by manipulating their data. Thus, to ensure safe and optimal AV dynamics control, the data processing functions at AVs must be robust to such CP attacks. To this end, in this paper, the state estimation process for monitoring AV dynamics, in the presence of CP attacks, is analyzed and a novel adversarial deep reinforcement learning (RL) algorithm is proposed to maximize the robustness of AV dynamics control to CP attacks. The attacker's action and the AV's reaction to CP attacks are studied in a game-theoretic framework. In the formulated game, the attacker seeks to inject faulty data into AV sensor readings so as to manipulate the inter-vehicle optimal safe spacing and potentially increase the risk of AV accidents or reduce the vehicle flow on the roads. Meanwhile, the AV, acting as a defender, seeks to minimize the deviations of spacing so as to ensure robustness to the attacker's actions. Since the AV has no information about the attacker's action and due to the infinite possibilities for data value manipulations, the outcomes of the players' past interactions are fed to long short-term memory (LSTM) blocks. Each player's LSTM block learns the expected spacing deviation resulting from its own action and feeds it to its RL algorithm. Then, the attacker's RL algorithm chooses the action which maximizes the spacing deviation, while the AV's RL algorithm tries to find the optimal action that minimizes such deviation.

CR Jan 19, 2018
Defense Against Advanced Persistent Threats in Dynamic Cloud Storage: A Colonel Blotto Game Approach

Minghui Min, Liang Xiao, Caixia Xie et al.

Advanced Persistent Threat (APT) attackers apply multiple sophisticated methods to continuously and stealthily steal information from the targeted cloud storage systems and can even induce the storage system to apply a specific defense strategy and attack it accordingly. In this paper, the interactions between an APT attacker and a defender allocating their Central Processing Units (CPUs) over multiple storage devices in a cloud storage system are formulated as a Colonel Blotto game. The Nash equilibria (NEs) of the CPU allocation game are derived for both symmetric and asymmetric CPUs between the APT attacker and the defender to evaluate how the limited CPU resources, the data storage size and the number of storage devices impact the expected data protection level and the utility of the cloud storage system. A CPU allocation scheme based on "hotbooting" policy hill-climbing (PHC) that exploits the experiences in similar scenarios to initialize the quality values to accelerate the learning speed is proposed for the defender to achieve the optimal APT defense performance in the dynamic game without being aware of the APT attack model and the data storage model. A hotbooting deep Q-network (DQN)-based CPU allocation scheme further improves the APT detection performance for the case with a large number of CPUs and storage devices. Simulation results show that our proposed reinforcement learning based CPU allocation can improve both the data protection level and the utility of the cloud storage system compared with the Q-learning based CPU allocation against APTs.

SY Jul 14, 2017
Game Theory for Secure Critical Interdependent Gas-Power-Water Infrastructure

Aidin Ferdowsi, Anibal Sanjab, Walid Saad et al.

A city's critical infrastructures, such as gas, water, and power systems, are largely interdependent since they share energy, computing, and communication resources. This, in turn, makes it challenging to endow them with fool-proof security solutions. In this paper, a unified model for interdependent gas-power-water infrastructure is presented and the security of this model is studied using a novel game-theoretic framework. In particular, a zero-sum noncooperative game is formulated between a malicious attacker who seeks to simultaneously alter the states of the gas-power-water critical infrastructure to increase the power generation cost and a defender who allocates communication resources over its attack detection filters in local areas to monitor the infrastructure. At the mixed strategy Nash equilibrium of this game, numerical results show that the expected power generation cost deviation is 35% lower than the one resulting from an equal allocation of resources over the local filters. The results also show that, at equilibrium, the interdependence of the power system on the natural gas and water systems can motivate the attacker to target the states of the water and natural gas systems to change the operational states of the power grid. Conversely, the defender allocates a portion of its resources to the water and natural gas states of the interdependent system to protect the grid from state deviations.

GT May 23, 2017
A Colonel Blotto Game for Interdependence-Aware Cyber-Physical Systems Security in Smart Cities

Aidin Ferdowsi, Walid Saad, Behrouz Maham et al.

Smart cities must integrate a number of interdependent cyber-physical systems that operate in a coordinated manner to improve the well-being of the city's residents. A cyber-physical system (CPS) is a system of computational elements controlling physical entities. Large-scale CPSs are more vulnerable to attacks due to the cyber-physical interdependencies that can lead to cascading failures which can have a significant detrimental effect on a city. In this paper, a novel approach is proposed for analyzing the problem of allocating security resources, such as firewalls and anti-malware, over the various cyber components of an interdependent CPS to protect the system against imminent attacks. The problem is formulated as a Colonel Blotto game in which the attacker seeks to allocate its resources to compromise the CPS, while the defender chooses how to distribute its resources to defend against potential attacks. To evaluate the effects of defense and attack, various CPS factors are considered including human-CPS interactions as well as physical and topological characteristics of a CPS such as flow and capacity of interconnections and minimum path algorithms. Results show that, for the case in which the attacker is not aware of the CPS interdependencies, the defender can have a higher payoff, compared to the case in which the attacker has complete information. The results also show that, in the case of more symmetric nodes, due to interdependencies, the defender achieves its highest payoff at the equilibrium compared to the case with independent, asymmetric nodes.