SY · Nov 28, 2017
Optimal Dynamic Sensor Subset Selection for Tracking a Time-Varying Stochastic Process
Arpan Chattopadhyay, Urbashi Mitra
Motivated by the Internet of Things and sensor networks for cyberphysical systems, the problem of dynamic sensor activation for the tracking of a time-varying process is examined. The tradeoff is between energy efficiency, which decreases with the number of active sensors, and fidelity, which increases with the number of active sensors. The problem of minimizing the time-averaged mean-squared error over an infinite horizon is examined under a constraint on the mean number of active sensors. The proposed methods artfully combine three key ingredients (Gibbs sampling, stochastic approximation for learning, and modifications to consensus algorithms) to create high-performance, energy-efficient tracking mechanisms with active sensor selection. The following progression of scenarios is considered: centralized tracking of an i.i.d. process; distributed tracking of an i.i.d. process; and finally distributed tracking of a Markov chain. The challenge of the i.i.d. case is that the process has a distribution parameterized by a parameter that may be known or unknown; in the latter case it must be learned. The key theoretical results prove that the proposed algorithms converge to local optima for the two i.i.d. process cases; numerical results suggest that global optimality is in fact achieved. The proposed distributed tracking algorithm for a Markov chain, based on Kalman-consensus filtering and stochastic approximation, is seen to offer an error performance comparable to that of a competitive centralized Kalman filter.
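The Gibbs-sampling ingredient can be illustrated on a toy problem. The sketch below assumes a scalar Gaussian parameter with prior variance `prior_var`, observed by sensor i with independent Gaussian noise of variance `noise_vars[i]`, so the posterior MSE of an active subset B is 1/(1/prior_var + sum of 1/noise_vars[i] over B); the cost is that MSE plus a fixed multiplier `lam` times |B|. The model, function names, and parameter values are hypothetical and not the authors' formulation; the learning of `lam` and the consensus step are omitted. Each update resamples one sensor's on/off bit from its conditional distribution under pi(B) proportional to exp(-C(B)/temperature).

```python
import math
import random
from itertools import product


def lagrangian_cost(active, prior_var, noise_vars, lam):
    """Toy cost: posterior MSE of a scalar Gaussian parameter given the
    active sensors' readings, plus lam times the number of active sensors."""
    precision = 1.0 / prior_var + sum(
        1.0 / v for b, v in zip(active, noise_vars) if b
    )
    return 1.0 / precision + lam * sum(active)


def brute_force(prior_var, noise_vars, lam):
    """Exhaustive search over all 2^N activation vectors (small N only)."""
    return min(
        product((0, 1), repeat=len(noise_vars)),
        key=lambda b: lagrangian_cost(b, prior_var, noise_vars, lam),
    )


def gibbs_select(prior_var, noise_vars, lam, temperature, iters, seed=0):
    """Systematic-scan Gibbs sampler on binary activation vectors with
    stationary law pi(B) proportional to exp(-C(B) / temperature).
    Returns the lowest-cost vector visited and its cost."""
    rng = random.Random(seed)
    n = len(noise_vars)
    state = [0] * n  # start with every sensor switched off
    best = tuple(state)
    best_cost = lagrangian_cost(state, prior_var, noise_vars, lam)
    for t in range(iters):
        i = t % n  # visit the sensors in turn
        state[i] = 0
        c_off = lagrangian_cost(state, prior_var, noise_vars, lam)
        state[i] = 1
        c_on = lagrangian_cost(state, prior_var, noise_vars, lam)
        # Conditional probability that sensor i is on, given the others.
        p_on = 1.0 / (1.0 + math.exp((c_on - c_off) / temperature))
        state[i] = 1 if rng.random() < p_on else 0
        cost = c_on if state[i] else c_off
        if cost < best_cost:
            best, best_cost = tuple(state), cost
    return best, best_cost
```

For example, `gibbs_select(1.0, [0.25, 0.5, 1.0, 2.0], lam=0.02, temperature=0.05, iters=5000)` can be checked against `brute_force(1.0, [0.25, 0.5, 1.0, 2.0], 0.02)`, whose optimum for these toy values is `(1, 1, 0, 0)`. Recording the best state visited is a convenience added here; the sampler itself only guarantees the Boltzmann stationary distribution, which concentrates on low-cost subsets as the temperature falls.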
LG · Mar 16, 2023
Online Reinforcement Learning in Periodic MDP
Ayush Aniket, Arpan Chattopadhyay
We study learning in a periodic Markov decision process (MDP), a special type of non-stationary MDP in which both the state transition probabilities and the reward functions vary periodically, under the average reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose a periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 varies linearly with the period $N$ and as $\mathcal{O}(\sqrt{T\log T})$ with the horizon length $T$. Exploiting the sparsity of the transition matrix of the augmented MDP, we propose another algorithm, PUCRLB, which improves upon PUCRL2 both in terms of regret (an $\mathcal{O}(\sqrt{N})$ dependence on the period) and empirical performance. Finally, we propose two further algorithms, U-PUCRL2 and U-PUCRLB, for the setting with additional uncertainty in which the period is unknown but a set of candidate periods is known. Numerical results demonstrate the efficacy of all the algorithms.
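The state-augmentation step can be sketched as follows, assuming tabular per-phase kernels `transitions[n][s][a]` (a list of next-state probabilities) and rewards `rewards[n][s][a]`; these names and the flattening scheme are hypothetical. The pair (s, n) moves to (s', (n+1) mod N) with probability P_n(s'|s, a), so each row of the augmented kernel has at most |S| nonzero entries out of N|S|. That is the sparsity PUCRLB is described as exploiting; the confidence-bound learning itself is not shown.

```python
def augment_periodic_mdp(transitions, rewards):
    """Build the stationary MDP on augmented states (s, n).

    transitions[n][s][a] -- list of next-state probabilities in phase n
    rewards[n][s][a]     -- reward in phase n
    Returns (aug_p, aug_r) with aug_p[x][a][x2] and aug_r[x][a], where the
    augmented state (s, n) is flattened to x = n * num_states + s.
    """
    period = len(transitions)
    num_states = len(transitions[0])
    num_actions = len(transitions[0][0])
    size = period * num_states

    def index(s, n):
        return n * num_states + s

    aug_p = [[[0.0] * size for _ in range(num_actions)] for _ in range(size)]
    aug_r = [[0.0] * num_actions for _ in range(size)]
    for n in range(period):
        nxt = (n + 1) % period  # the phase index advances deterministically
        for s in range(num_states):
            for a in range(num_actions):
                aug_r[index(s, n)][a] = rewards[n][s][a]
                for s2, prob in enumerate(transitions[n][s][a]):
                    aug_p[index(s, n)][a][index(s2, nxt)] = prob
    return aug_p, aug_r
```

Any stationary-MDP learner can then be run on `aug_p`/`aug_r`; storing only the |S| reachable entries per row, rather than the dense matrix used here for clarity, is the natural next step for large N.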
LG · Jul 25, 2022
Online Reinforcement Learning for Periodic MDP
Ayush Aniket, Arpan Chattopadhyay
We study learning in a periodic Markov decision process (MDP), a special type of non-stationary MDP in which both the state transition probabilities and the reward functions vary periodically, under the average reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose a periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 varies linearly with the period and sublinearly with the horizon length. Numerical results demonstrate the efficacy of PUCRL2.
LG · May 14, 2023
Inverse Reinforcement Learning With Constraint Recovery
Nirjhar Das, Arpan Chattopadhyay
In this work, we propose a novel inverse reinforcement learning (IRL) algorithm for constrained Markov decision process (CMDP) problems. In standard IRL problems, the inverse learner or agent seeks to recover the reward function of the MDP, given a set of trajectory demonstrations of the optimal policy. In this work, we seek to infer not only the reward functions of the CMDP, but also the constraints. Using the principle of maximum entropy, we show that the IRL with constraint recovery (IRL-CR) problem can be cast as a constrained non-convex optimization problem. We reduce it to an alternating constrained optimization problem whose sub-problems are convex, and solve it using the exponentiated gradient descent algorithm. Finally, we demonstrate the efficacy of our algorithm in a grid-world environment.
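The exponentiated gradient update mentioned above keeps iterates on the probability simplex through a multiplicative step followed by renormalization. The sketch below assumes the variable lives on the simplex and that the objective's gradient is supplied as a callable `grad`; the function name and the quadratic objective in the test are hypothetical stand-ins, not the IRL-CR sub-problems.

```python
import math


def exponentiated_gradient(grad, w0, step, iters):
    """Exponentiated gradient descent on the probability simplex.

    Each iteration computes w_i <- w_i * exp(-step * g_i) with g = grad(w),
    then renormalizes so the weights sum to one; positivity is preserved
    automatically, so no explicit projection is needed.
    """
    w = list(w0)
    for _ in range(iters):
        g = grad(w)
        w = [wi * math.exp(-step * gi) for wi, gi in zip(w, g)]
        z = sum(w)
        w = [wi / z for wi in w]
    return w
```

As a check, minimizing the squared distance to a point `target` on the simplex (gradient `2 * (w - target)`) from the uniform start with `step=0.5` drives the iterates to `target`. The multiplicative form is what makes this mirror descent with an entropy regularizer rather than projected gradient descent.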
OC · Jan 5, 2022
Inverse Extended Kalman Filter -- Part I: Fundamentals
Himali Singh, Arpan Chattopadhyay, Kumar Vijay Mishra
Recent advances in counter-adversarial systems have drawn significant research attention to inverse filtering from a Bayesian perspective. For example, interest in estimating the adversary's Kalman-filter-tracked estimate in order to predict the adversary's future steps has led to recent formulations of the inverse Kalman filter (I-KF). In this context of inverse filtering, we address the key challenges of non-linear process dynamics and unknown input to the forward filter by proposing an inverse extended Kalman filter (I-EKF). The purpose of this paper and the companion paper (Part II) is to develop the theory of I-EKF in detail. In this paper, we assume perfect system model information and derive I-EKF with and without an unknown input when both the forward and inverse state-space models are non-linear. In the process, I-KF with unknown input is also obtained. We then provide theoretical stability guarantees using both bounded non-linearity and unknown matrix approaches, and prove the I-EKF's consistency. Numerical experiments validate our methods for the various proposed inverse filters, using the recursive Cramér-Rao lower bound as a benchmark. In the companion paper (Part II), we further generalize these formulations to highly non-linear models and propose a reproducing kernel Hilbert space (RKHS)-based EKF to handle incomplete system model information.
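As a reference point for the inverse filter, here is one predict/update cycle of a standard scalar extended Kalman filter, the kind of forward filter whose estimates I-EKF sets out to infer. It assumes additive process and measurement noise with variances `q` and `r`; the function name and signature are hypothetical, and the I-EKF recursions themselves (which treat the forward filter's estimate as the hidden state) are not reproduced here.

```python
def ekf_step(x, p, y, f, df, h, dh, q, r):
    """One predict/update cycle of a scalar extended Kalman filter for
    x_{k+1} = f(x_k) + w_k and y_k = h(x_k) + v_k,
    with Var(w_k) = q and Var(v_k) = r."""
    # Predict: propagate the mean through f and the variance through f'.
    x_pred = f(x)
    a = df(x)
    p_pred = a * p * a + q
    # Update: linearize the measurement function around the prediction.
    c = dh(x_pred)
    s = c * p_pred * c + r  # innovation variance
    k = p_pred * c / s  # Kalman gain
    x_new = x_pred + k * (y - h(x_pred))
    p_new = (1.0 - k * c) * p_pred
    return x_new, p_new
```

With linear `f` and `h` the step reduces to the ordinary Kalman filter, which makes for an easy sanity check: identity dynamics and measurement, `q=0`, `r=1`, prior mean 0, prior variance 1, and observation 2 give posterior mean 1 and variance 0.5.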
NI · May 24, 2021
A Low-Delay MAC for IoT Applications: Decentralized Optimal Scheduling of Queues without Explicit State Information Sharing
Avinash Mohan, Arpan Chattopadhyay, Shivam Vinayak Vatsa et al.
We consider a system of several collocated nodes sharing a time-slotted wireless channel, and seek a medium access control (MAC) protocol that (i) provides low mean delay, (ii) has distributed control (i.e., there is no central scheduler), and (iii) does not require explicit exchange of state information or control signals. The design of such MAC protocols must account for the need for contention access at light traffic and scheduled access at heavy traffic, which has led to long-standing interest in hybrid, adaptive MACs. Working in the discrete-time setting, for the distributed MAC design we consider a practical information structure in which each node has local information and some common information obtained from overhearing. In this setting, "ZMAC" is an existing protocol that is hybrid and adaptive. We approach the problem in two steps. (1) We show that it is sufficient for the policy to be "greedy" and "exhaustive"; limiting the policy to this class reduces the problem to obtaining a queue switching policy at queue emptiness instants. (2) Formulating delay-optimal scheduling as a POMDP (partially observed Markov decision process), we show that the optimal switching rule is Stochastic Largest Queue (SLQ). Using this theory as the basis, we then develop a practical distributed scheduler, QZMAC, which is also tunable. We implement QZMAC on standard off-the-shelf TelosB motes and also use simulations to compare QZMAC with the full-knowledge centralized scheduler and with ZMAC. We use our implementation to study the impact of false detection while overhearing the common information, and the efficiency of QZMAC. Our simulation results show that the mean delay with QZMAC is close to that of the full-knowledge centralized scheduler.
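Step (2) reduces scheduling to choosing which queue to serve when the current one empties. The sketch below is a simplified stand-in for that switching decision: it assumes Bernoulli arrivals with known rates and that each queue was empty when its last service ended, so queue i's expected backlog is `rate_i * (now - last_emptied_i)`, and it picks the largest expected backlog. Ranking by the mean is an assumption made here for illustration; it is not the paper's SLQ rule, which ranks backlogs in the stochastic order, and it ignores QZMAC's overhearing-based information structure. The function name is hypothetical.

```python
def next_queue(arrival_rates, last_emptied, now, current):
    """Choose the next queue to serve after queue `current` empties.

    Uses the expected backlog rate * (now - last_emptied) of every other
    queue as a proxy for the stochastically largest queue.
    """
    best, best_mean = None, -1.0
    for i, (rate, t0) in enumerate(zip(arrival_rates, last_emptied)):
        if i == current:
            continue  # the queue that just emptied has nothing to send
        mean_backlog = rate * (now - t0)
        if mean_backlog > best_mean:
            best, best_mean = i, mean_backlog
    return best
```

Combined with exhaustive service (keep transmitting until the chosen queue is empty), this gives the greedy-exhaustive structure from step (1), with only the switching decision left to design.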
LG · Oct 30, 2020
Centralized active tracking of a Markov chain with unknown dynamics
Mrigank Raman, Ojal Kumar, Arpan Chattopadhyay
In this paper, the selection of an active sensor subset for tracking a discrete-time, finite-state Markov chain with an unknown transition probability matrix (TPM) is considered. A total of N sensors are available for making observations of the Markov chain, of which a subset is activated at each time in order to perform reliable estimation of the process. The trade-off is between activating more sensors to gather more observations for remote estimation, and restricting sensor usage in order to save energy and bandwidth. The problem is formulated as a constrained minimization problem, where the objective is the long-run average mean-squared error (MSE) in estimation, and the constraint is on the sensor activation rate. A Lagrangian relaxation of the problem is solved by an artful blending of two tools: Gibbs sampling for MSE minimization and an on-line version of expectation maximization (EM) to estimate the unknown TPM. Finally, the Lagrange multiplier is updated using slower-timescale stochastic approximation in order to satisfy the sensor activation rate constraint. The on-line EM algorithm, though adapted from the literature, can estimate vector-valued parameters even when the dimension of the sensor observations varies over time. Numerical results demonstrate approximately 1 dB better error performance than uniform sensor sampling, and error performance within 2 dB of that achieved with complete sensor observation. This makes the proposed algorithm amenable to practical implementation.
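The estimation half of the problem, updating the fusion center's belief about the chain from whichever sensors are active, can be sketched as an HMM forward filter. The sketch assumes a known TPM (in the paper it is estimated by on-line EM), sensor readings that are conditionally independent given the state, and per-sensor likelihood vectors `sensor_liks[k]`; all names are hypothetical. If the state is one-hot encoded (also an assumption), the MMSE estimate is the belief vector itself, and its conditional MSE is 1 minus the sum of squared belief entries.

```python
def belief_update(belief, tpm, sensor_liks, active):
    """One step of the HMM forward filter for a finite-state Markov chain.

    belief[i]         -- P(X_{t-1} = i | observations up to t-1)
    tpm[i][j]         -- P(X_t = j | X_{t-1} = i)
    sensor_liks[k][j] -- likelihood of sensor k's reading at t if X_t = j
    active            -- indices of the sensors switched on at time t
    """
    m = len(belief)
    # Predict: push the belief through the transition matrix.
    post = [sum(belief[i] * tpm[i][j] for i in range(m)) for j in range(m)]
    # Update: multiply in the likelihood of each active sensor only.
    for k in active:
        post = [post[j] * sensor_liks[k][j] for j in range(m)]
    z = sum(post)  # assumed positive, i.e. the readings are possible
    return [p / z for p in post]


def conditional_mse(belief):
    """MSE of the MMSE estimate when the state is one-hot encoded:
    E||e_X - belief||^2 = 1 - sum_j belief_j^2."""
    return 1.0 - sum(b * b for b in belief)
```

Switching a sensor off simply drops its factor from the update, which is how the activation decision feeds into the MSE that the Gibbs sampler trades off against the activation cost.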
IV · Jul 9, 2020
Efficient detection of adversarial images
Darpan Kumar Yadav, Kartik Mundra, Rahul Modpur et al.
In this paper, the detection of deception attacks on deep neural network (DNN)-based image classification in autonomous and cyber-physical systems is considered. Several studies have shown the vulnerability of DNNs to malicious deception attacks. In such attacks, some or all pixel values of an image are modified by an external attacker, so that the change is almost invisible to the human eye but significant enough for a DNN-based classifier to misclassify it. This paper first proposes a novel pre-processing technique that facilitates the detection of such modified images under any DNN-based image classifier and any attacker model. The proposed pre-processing algorithm involves a combination of principal component analysis (PCA)-based decomposition of the image and random-perturbation-based detection, designed to reduce computational complexity. Next, an adaptive version of this algorithm is proposed, in which the number of random perturbations is chosen adaptively using a double-threshold policy, and the threshold values are learnt via stochastic approximation in order to minimize the expected number of perturbations subject to constraints on the false alarm and missed detection probabilities. Numerical experiments show that the proposed detection scheme outperforms a competing algorithm while achieving reasonably low computational complexity.
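The adaptive stopping idea can be sketched as a double-threshold rule: apply perturbations one at a time and stop as soon as a running statistic leaves an interval. The sketch assumes a hypothetical per-perturbation score in [0, 1] (for instance, an indicator that the classifier's label changed under one random perturbation), a running-mean statistic, and fixed thresholds; in the paper the thresholds are learnt via stochastic approximation, which is not shown, and the actual test statistic may differ. The function name and the "undecided" outcome are likewise illustrative.

```python
def double_threshold_detect(scores, lower, upper, max_perturbations):
    """Consume per-perturbation scores sequentially and stop as soon as
    their running mean leaves the open interval (lower, upper).

    Returns (decision, perturbations_used), where decision is "attack",
    "clean", or "undecided" if the budget or the scores run out first.
    """
    total = 0.0
    used = 0
    for score in scores:
        if used >= max_perturbations:
            break
        used += 1
        total += score
        mean = total / used
        if mean >= upper:
            return "attack", used
        if mean <= lower:
            return "clean", used
    return "undecided", used
```

Widening the interval lowers both error probabilities at the price of more perturbations per image, which is the tradeoff the learnt thresholds are meant to balance.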
CR · Jul 31, 2018
Security against false data injection attack in cyber-physical systems
Arpan Chattopadhyay, Urbashi Mitra
In this paper, secure remote estimation of a linear Gaussian process via observations at multiple sensors is considered. Such a framework is relevant to many cyber-physical systems and Internet-of-Things applications. Sensors make sequential measurements that are shared with a fusion center, which applies a filtering algorithm to make its estimates. The challenge is the presence of a few unknown malicious sensors that can inject anomalous observations to skew the estimates at the fusion center. The set of malicious sensors may be time-varying. The problems of malicious sensor detection and secure estimation are considered. First, an algorithm for secure estimation is proposed. The proposed estimation scheme uses a novel filtering and learning algorithm, in which an optimal filter is learnt over time from the sensor observations in order to filter out malicious sensor observations while retaining the other sensor measurements. Next, a novel detector is developed to detect injection attacks on an unknown sensor subset. Numerical results demonstrate up to 3 dB gain in mean squared error and up to 75% higher attack detection probability under a small false alarm rate constraint, relative to a competing algorithm that requires additional side information.
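For contrast with the learning-based scheme, a common textbook baseline for this kind of detection is the normalized-innovation-squared (chi-square) residual test, sketched here for a scalar state. It assumes sensor i reports `y_i = gains[i] * x + v_i` with Gaussian noise variance `noise_vars[i]`, and that the fusion center holds a predicted mean `x_pred` and variance `p_pred`; the function name is hypothetical, and this is neither the detector nor the filter proposed in the paper.

```python
def flag_sensors(x_pred, p_pred, observations, gains, noise_vars,
                 threshold=6.635):
    """Flag sensors whose normalized innovation squared exceeds `threshold`.

    Under the no-attack hypothesis each statistic is chi-square with one
    degree of freedom; 6.635 is roughly its 99th percentile, i.e. about a
    1% per-sensor false alarm rate.
    """
    flagged = []
    for i, (y, h, r) in enumerate(zip(observations, gains, noise_vars)):
        innovation = y - h * x_pred
        s = h * p_pred * h + r  # predicted innovation variance
        if innovation * innovation / s > threshold:
            flagged.append(i)
    return flagged
```

A test of this kind needs no side information but only catches injections that are large relative to the innovation variance, which is one motivation for learning the filter from data instead.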
IT · Sep 11, 2017
Optimal Sensing and Data Estimation in a Large Sensor Network
Arpan Chattopadhyay, Urbashi Mitra
Energy-efficient use of large-scale sensor networks necessitates activating a subset of the available sensors for estimation at a fusion center. The problem is inherently combinatorial; to address it, a set of iterative, randomized algorithms is developed for sensor subset selection by exploiting the underlying statistics. Gibbs sampling-based methods are designed to optimize the estimation error and the mean number of activated sensors. The optimality of the proposed strategies is proven, along with guarantees on their convergence speeds. In addition, a new algorithm exploiting stochastic approximation in conjunction with Gibbs sampling is derived for a constrained version of the sensor selection problem. The methodology is extended to the scenario where the fusion center has access only to a parametric form of the joint statistics, but not to the true underlying distribution; there, expectation-maximization is employed to learn the distribution. Strategies for i.i.d. time-varying data are also outlined. Numerical results show that the proposed methods converge quickly to the respective optimal solutions and can therefore be employed for optimal sensor subset selection in practical sensor networks.
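The constrained variant couples subset selection with a stochastic-approximation update of the Lagrange multiplier. The sketch below uses a hypothetical scalar Gaussian model (prior variance `prior_var`, sensor noise variances `noise_vars`) and, to stay deterministic, replaces the Gibbs sampler with exhaustive minimization over all subsets, so only the multiplier update is illustrated; function names and parameters are assumptions. With a constant step and lambda starting at 0, lambda_T = step times the sum over t of (|B_t| - target) whenever the projection onto lambda >= 0 never activates, so if lambda stays bounded the running average of |B_t| approaches the target.

```python
from itertools import product


def lagrangian_best_response(prior_var, noise_vars, lam):
    """Subset minimizing posterior MSE + lam * |B| (exhaustive, small N only)."""
    def cost(b):
        precision = 1.0 / prior_var + sum(
            1.0 / v for bi, v in zip(b, noise_vars) if bi
        )
        return 1.0 / precision + lam * sum(b)

    return min(product((0, 1), repeat=len(noise_vars)), key=cost)


def tune_multiplier(prior_var, noise_vars, target_active, step, iters):
    """Projected constant-step update of the Lagrange multiplier:
    lam <- max(0, lam + step * (|B_t| - target_active)).
    Returns the final multiplier and the time-averaged number of active sensors."""
    lam = 0.0
    total_active = 0
    for _ in range(iters):
        active = sum(lagrangian_best_response(prior_var, noise_vars, lam))
        total_active += active
        lam = max(0.0, lam + step * (active - target_active))
    return lam, total_active / iters
```

For `noise_vars = [0.25, 0.5, 1.0, 2.0]`, `prior_var = 1.0`, and a target of 1.5 active sensors, lambda settles near 0.2 - 1/7 (about 0.057), where the best response switches between one and two sensors, so the time-averaged activation meets the non-integer target by alternating between them. A randomized inner sampler, as in the paper, would play the same role as this alternation.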