Urbashi Mitra

LG
h-index5
19papers
189citations
Novelty50%
AI Score47

19 Papers

SYSep 20, 2010
Optimization of ARQ Protocols in Interference Networks with QoS Constraints

Marco Levorato, Daniel O'Neill, Andrea Goldsmith et al.

We study optimal transmission strategies in interfering wireless networks, under Quality of Service constraints. A buffered, dynamic network with multiple sources is considered, and sources use a retransmission strategy in order to improve packet delivery probability. The optimization problem is formulated as a Markov Decision Process, where constraints and objective functions are ratios of time-averaged cost functions. The optimal strategy is found as the solution of a Linear Fractional Program, where the optimization variables are the steady-state probability of state-action pairs. Numerical results illustrate the dependence of optimal transmission/interference strategies on the constraints imposed on the network.

ITMay 6, 2022
UAV-aided RF Mapping for Sensing and Connectivity in Wireless Networks

David Gesbert, Omid Esrafilian, Junting Chen et al.

The use of unmanned aerial vehicles (UAV) as flying radio access network (RAN) nodes offers a promising complement to traditional fixed terrestrial deployments. More recently yet still in the context of wireless networks, drones have also been envisioned for use as radio frequency (RF) sensing and localization devices. In both cases, the advantage of using UAVs lies in their ability to navigate themselves freely in 3D and in a timely manner to locations of space where the obtained network throughput or sensing performance is optimal. In practice, the selection of a proper location or trajectory for the UAV very much depends on local terrain features, including the position of surrounding radio obstacles. Hence, the robot must be able to map the features of its radio environment as it performs its data communication or sensing services. The challenges related to this task, referred here as radio mapping, are discussed in this paper. Its promises related to efficient trajectory design for autonomous radio-aware UAVs are highlighted, along with algorithm solutions. The advantages induced by radio-mapping in terms of connectivity, sensing, and localization performance are illustrated.

ITMay 5, 2022
Uncertainty-Based Non-Parametric Active Peak Detection

Praneeth Narayanamurthy, Urbashi Mitra

Active, non-parametric peak detection is considered. As a use case, active source localization is examined and an uncertainty-based sampling scheme algorithm to effectively localize the peak from a few energy measurements is designed. It is shown that under very mild conditions, the source localization error with $m$ actively chosen energy measurements scales as $O(\log^2 m/m)$. Numerically, it is shown that in low-sample regimes, the proposed method enjoys superior performance on several types of data and outperforms the state-of-the-art passive source localization approaches and in the low sample regime, can outperform greedy methods as well.

LGAug 20, 2024
Asymmetric Graph Error Control with Low Complexity in Causal Bandits

Chen Peng, Di Zhang, Urbashi Mitra

In this paper, the causal bandit problem is investigated, with the objective of maximizing the long-term reward by selecting an optimal sequence of interventions on nodes in an unknown causal graph. It is assumed that both the causal topology and the distribution of interventions are unknown. First, based on the difference between the two types of graph identification errors (false positives and negatives), a causal graph learning method is proposed. Numerical results suggest that this method has a much lower sample complexity relative to the prior art by learning sub-graphs. However, we note that a sample complexity analysis for the new algorithm has not been undertaken, as of yet. Under the assumption of minimum-mean squared error weight estimation, a new uncertainty bound tailored to the causal bandit problem is derived. This uncertainty bound drives an upper confidence bound-based intervention selection to optimize the reward. Further, we consider a particular instance of non-stationary bandits wherein both the causal topology and interventional distributions can change. Our solution is the design of a sub-graph change detection mechanism that requires a modest number of samples. Numerical results compare the new methodology to existing schemes and show a substantial performance improvement in stationary and non-stationary settings. Averaged over 100 randomly generated causal bandits, the proposed scheme takes significantly fewer samples to learn the causal structure and achieves a reward gain of 85% compared to existing approaches.

SPAug 29, 2024
Coverage Analysis of Multi-Environment Q-Learning Algorithms for Wireless Network Optimization

Talha Bozkus, Urbashi Mitra

Q-learning is widely used to optimize wireless networks with unknown system dynamics. Recent advancements include ensemble multi-environment hybrid Q-learning algorithms, which utilize multiple Q-learning algorithms across structurally related but distinct Markovian environments and outperform existing Q-learning algorithms in terms of accuracy and complexity in large-scale wireless networks. We herein conduct a comprehensive coverage analysis to ensure optimal data coverage conditions for these algorithms. Initially, we establish upper bounds on the expectation and variance of different coverage coefficients. Leveraging these bounds, we present an algorithm for efficient initialization of these algorithms. We test our algorithm on two distinct real-world wireless networks. Numerical simulations show that our algorithm can achieve %50 less policy error and %40 less runtime complexity than state-of-the-art reinforcement learning algorithms. Furthermore, our algorithm exhibits robustness to changes in network settings and parameters. We also numerically validate our theoretical results.

SPSep 24, 2024
A Multi-Agent Multi-Environment Mixed Q-Learning for Partially Decentralized Wireless Network Optimization

Talha Bozkus, Urbashi Mitra

Q-learning is a powerful tool for network control and policy optimization in wireless networks, but it struggles with large state spaces. Recent advancements, like multi-environment mixed Q-learning (MEMQ), improves performance and reduces complexity by integrating multiple Q-learning algorithms across multiple related environments so-called digital cousins. However, MEMQ is designed for centralized single-agent networks and is not suitable for decentralized or multi-agent networks. To address this challenge, we propose a novel multi-agent MEMQ algorithm for partially decentralized wireless networks with multiple mobile transmitters (TXs) and base stations (BSs), where TXs do not have access to each other's states and actions. In uncoordinated states, TXs act independently to minimize their individual costs. In coordinated states, TXs use a Bayesian approach to estimate the joint state based on local observations and share limited information with leader TX to minimize joint cost. The cost of information sharing scales linearly with the number of TXs and is independent of the joint state-action space size. The proposed scheme is 50% faster than centralized MEMQ with only a 20% increase in average policy error (APE) and is 25% faster than several advanced decentralized Q-learning algorithms with 40% less APE. The convergence of the algorithm is also demonstrated.

ITMay 3
Atomic Hybrid Sparse/Diffuse Channel Estimation and Cramér-Rao Bounds Analysis

Lei Lyu, Maxime Ferreira Da Costa, Urbashi Mitra

In this paper, an atomic hybrid sparse/diffuse (aHSD) channel model in the frequency domain is proposed. Based on a structural analysis of the resolvable paths and diffuse scattering statistics, the Hybrid Atomic-Least-Squares (HALS) algorithm is designed to estimate sparse/diffuse components with a combined atomic and $\ell_2$ regularization. A theoretical analysis of the Lagrange dual problem is conducted, and the conditions required for primal and dual solutions are provided, supporting an off-the-grid delay-time estimator. The Cramér--Rao Bound (CRB) analysis in this paper focuses on the estimation of the channel parameters, resulting in a bound on the aggregate channel. Lower and upper bounds for the CRB on parameters are derived as functions of the minimum separations between frequency parameters. Numerical results via simulations on synthetic and real data validate the efficacy of the HALS estimation strategy and show the improved predictive ability of the CRB analysis for the performance of HALS versus previously considered bounds.

LGFeb 23
$κ$-Explorer: A Unified Framework for Active Model Estimation in MDPs

Xihe Gu, Urbashi Mitra, Tara Javidi

In tabular Markov decision processes (MDPs) with perfect state observability, each trajectory provides active samples from the transition distributions conditioned on state-action pairs. Consequently, accurate model estimation depends on how the exploration policy allocates visitation frequencies in accordance with the intrinsic complexity of each transition distribution. Building on recent work on coverage-based exploration, we introduce a parameterized family of decomposable and concave objective functions $U_κ$ that explicitly incorporate both intrinsic estimation complexity and extrinsic visitation frequency. Moreover, the curvature $κ$ provides a unified treatment of various global objectives, such as the average-case and worst-case estimation error objectives. Using the closed-form characterization of the gradient of $U_κ$, we propose $κ$-Explorer, an active exploration algorithm that performs Frank-Wolfe-style optimization over state-action occupancy measures. The diminishing-returns structure of $U_κ$ naturally prioritizes underexplored and high-variance transitions, while preserving smoothness properties that enable efficient optimization. We establish tight regret guarantees for $κ$-Explorer and further introduce a fully online and computationally efficient surrogate algorithm for practical use. Experiments on benchmark MDPs demonstrate that $κ$-Explorer provides superior performance compared to existing exploration strategies.

LGFeb 8, 2024
Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization

Talha Bozkus, Urbashi Mitra

Reinforcement learning (RL) is a classical tool to solve network control or policy optimization problems in unknown environments. The original Q-learning suffers from performance and complexity challenges across very large networks. Herein, a novel model-free ensemble reinforcement learning algorithm which adapts the classical Q-learning is proposed to handle these challenges for networks which admit Markov decision process (MDP) models. Multiple Q-learning algorithms are run on multiple, distinct, synthetically created and structurally related Markovian environments in parallel; the outputs are fused using an adaptive weighting mechanism based on the Jensen-Shannon divergence (JSD) to obtain an approximately optimal policy with low complexity. The theoretical justification of the algorithm, including the convergence of key statistics and Q-functions are provided. Numerical results across several network models show that the proposed algorithm can achieve up to 55% less average policy error with up to 50% less runtime complexity than the state-of-the-art Q-learning algorithms. Numerical results validate assumptions made in the theoretical analysis.

LGFeb 12, 2024
Leveraging Digital Cousins for Ensemble Q-Learning in Large-Scale Wireless Networks

Talha Bozkus, Urbashi Mitra

Optimizing large-scale wireless networks, including optimal resource management, power allocation, and throughput maximization, is inherently challenging due to their non-observable system dynamics and heterogeneous and complex nature. Herein, a novel ensemble Q-learning algorithm that addresses the performance and complexity challenges of the traditional Q-learning algorithm for optimizing wireless networks is presented. Ensemble learning with synthetic Markov Decision Processes is tailored to wireless networks via new models for approximating large state-space observable wireless networks. In particular, digital cousins are proposed as an extension of the traditional digital twin concept wherein multiple Q-learning algorithms on multiple synthetic Markovian environments are run in parallel and their outputs are fused into a single Q-function. Convergence analyses of key statistics and Q-functions and derivations of upper bounds on the estimation bias and variance are provided. Numerical results across a variety of real-world wireless networks show that the proposed algorithm can achieve up to 50% less average policy error with up to 40% less runtime complexity than the state-of-the-art reinforcement learning algorithms. It is also shown that theoretical results properly predict trends in the experimental results.

LGNov 13, 2024
Coverage Analysis for Digital Cousin Selection -- Improving Multi-Environment Q-Learning

Talha Bozkus, Tara Javidi, Urbashi Mitra

Q-learning is widely employed for optimizing various large-dimensional networks with unknown system dynamics. Recent advancements include multi-environment mixed Q-learning (MEMQ) algorithms, which utilize multiple independent Q-learning algorithms across multiple, structurally related but distinct environments and outperform several state-of-the-art Q-learning algorithms in terms of accuracy, complexity, and robustness. We herein conduct a comprehensive probabilistic coverage analysis to ensure optimal data coverage conditions for MEMQ algorithms. First, we derive upper and lower bounds on the expectation and variance of different coverage coefficients (CC) for MEMQ algorithms. Leveraging these bounds, we develop a simple way of comparing the utilities of multiple environments in MEMQ algorithms. This approach appears to be near optimal versus our previously proposed partial ordering approach. We also present a novel CC-based MEMQ algorithm to improve the accuracy and complexity of existing MEMQ algorithms. Numerical experiments are conducted using random network graphs with four different graph properties. Our algorithm can reduce the average policy error (APE) by 65% compared to partial ordering and is 95% faster than the exhaustive search. It also achieves 60% less APE than several state-of-the-art reinforcement learning and prior MEMQ algorithms. Additionally, we numerically verify the theoretical results and show their scalability with the action-space size.

LGJul 26, 2025
ModShift: Model Privacy via Designed Shifts

Nomaan A. Kherani, Urbashi Mitra

In this paper, shifts are introduced to preserve model privacy against an eavesdropper in federated learning. Model learning is treated as a parameter estimation problem. This perspective allows us to derive the Fisher Information matrix of the model updates from the shifted updates and drive them to singularity, thus posing a hard estimation problem for Eve. The shifts are securely shared with the central server to maintain model accuracy at the server and participating devices. A convergence test is proposed to detect if model updates have been tampered with and we show that our scheme passes this test. Numerical results show that our scheme achieves a higher model shift when compared to a noise injection scheme while requiring a lesser bandwidth secret channel.

LGMar 7, 2025
Partially Decentralized Multi-Agent Q-Learning via Digital Cousins for Wireless Networks

Talha Bozkus, Urbashi Mitra

Q-learning is a widely used reinforcement learning (RL) algorithm for optimizing wireless networks, but faces challenges with large state-spaces. Recently proposed multi-environment mixed Q-learning (MEMQ) algorithm addresses these challenges by employing multiple Q-learning algorithms across multiple synthetically generated, distinct but structurally related environments, so-called digital cousins. In this paper, we propose a novel multi-agent MEMQ (M-MEMQ) for cooperative decentralized wireless networks with multiple networked transmitters (TXs) and base stations (BSs). TXs do not have access to global information (joint state and actions). The new concept of coordinated and uncoordinated states is introduced. In uncoordinated states, TXs act independently to minimize their individual costs and update local Q-functions. In coordinated states, TXs use a Bayesian approach to estimate the joint state and update the joint Q-functions. The cost of information-sharing scales linearly with the number of TXs and is independent of the joint state-action space size. Several theoretical guarantees, including deterministic and probabilistic convergence, bounds on estimation error variance, and the probability of misdetecting the joint states, are given. Numerical simulations show that M-MEMQ outperforms several decentralized and centralized training with decentralized execution (CTDE) multi-agent RL algorithms by achieving 60% lower average policy error (APE), 40% faster convergence, 45% reduced runtime complexity, and 40% less sample complexity. Furthermore, M-MEMQ achieves comparable APE with significantly lower complexity than centralized methods. Simulations validate the theoretical analyses.

MEDec 8, 2020
Adaptive Sampling for Estimating Distributions: A Bayesian Upper Confidence Bound Approach

Dhruva Kartik, Neeraj Sood, Urbashi Mitra et al.

The problem of adaptive sampling for estimating probability mass functions (pmf) uniformly well is considered. Performance of the sampling strategy is measured in terms of the worst-case mean squared error. A Bayesian variant of the existing upper confidence bound (UCB) based approaches is proposed. It is shown analytically that the performance of this Bayesian variant is no worse than the existing approaches. The posterior distribution on the pmfs in the Bayesian setting allows for a tighter computation of upper confidence bounds which leads to significant performance gains in practice. Using this approach, adaptive sampling protocols are proposed for estimating SARS-CoV-2 seroprevalence in various groups such as location and ethnicity. The effectiveness of this strategy is discussed using data obtained from a seroprevalence survey in Los Angeles county.

SYDec 5, 2019
Data-driven sensor scheduling for remote estimation in wireless networks

Marcos M. Vasconcelos, Urbashi Mitra

Sensor scheduling is a well studied problem in signal processing and control with numerous applications. Despite its successful history, most of the related literature assumes the knowledge of the underlying probabilistic model of the sensor measurements such as the correlation structure or the entire joint probability density function. Herein, a framework for sensor scheduling for remote estimation is introduced in which the system design and the scheduling decisions are based solely on observed data. Unicast and broadcast networks and corresponding receivers are considered. In both cases, the empirical risk minimization can be posed as a difference-of-convex optimization problem and locally optimal solutions are obtained efficiently by applying the convex-concave procedure. Our results are independent of the data's probability density function, correlation structure and the number of sensors.

MLDec 4, 2018
Sequential Experiment Design for Hypothesis Verification

Dhruva Kartik, Ashutosh Nayyar, Urbashi Mitra

Hypothesis testing is an important problem with applications in target localization, clinical trials etc. Many active hypothesis testing strategies operate in two phases: an exploration phase and a verification phase. In the exploration phase, selection of experiments is such that a moderate level of confidence on the true hypothesis is achieved. Subsequent experiment design aims at improving the confidence level on this hypothesis to the desired level. In this paper, the focus is on the verification phase. A confidence measure is defined and active hypothesis testing is formulated as a confidence maximization problem in an infinite-horizon average-reward Partially Observable Markov Decision Process (POMDP) setting. The problem of maximizing confidence conditioned on a particular hypothesis is referred to as the hypothesis verification problem. The relationship between hypothesis testing and verification problems is established. The verification problem can be formulated as a Markov Decision Process (MDP). Optimal solutions for the verification MDP are characterized and a simple heuristic adaptive strategy for verification is proposed based on a zero-sum game interpretation of Kullback-Leibler divergences. It is demonstrated through numerical experiments that the heuristic performs better in some scenarios compared to existing methods in literature.

ITOct 11, 2018
Policy Design for Active Sequential Hypothesis Testing using Deep Learning

Dhruva Kartik, Ekraam Sabir, Urbashi Mitra et al.

Information theory has been very successful in obtaining performance limits for various problems such as communication, compression and hypothesis testing. Likewise, stochastic control theory provides a characterization of optimal policies for Partially Observable Markov Decision Processes (POMDPs) using dynamic programming. However, finding optimal policies for these problems is computationally hard in general and thus, heuristic solutions are employed in practice. Deep learning can be used as a tool for designing better heuristics in such problems. In this paper, the problem of active sequential hypothesis testing is considered. The goal is to design a policy that can reliably infer the true hypothesis using as few samples as possible by adaptively selecting appropriate queries. This problem can be modeled as a POMDP and bounds on its value function exist in literature. However, optimal policies have not been identified and various heuristics are used. In this paper, two new heuristics are proposed: one based on deep reinforcement learning and another based on a KL-divergence zero-sum game. These heuristics are compared with state-of-the-art solutions and it is demonstrated using numerical experiments that the proposed heuristics can achieve significantly better performance than existing methods in some scenarios.

CRJul 31, 2018
Security against false data injection attack in cyber-physical systems

Arpan Chattopadhyay, Urbashi Mitra

In this paper, secure, remote estimation of a linear Gaussian process via observations at multiple sensors is considered. Such a framework is relevant to many cyber-physical systems and internet-of-things applications. Sensors make sequential measurements that are shared with a fusion center; the fusion center applies a certain filtering algorithm to make its estimates. The challenge is the presence of a few unknown malicious sensors which can inject anomalous observations to skew the estimates at the fusion center. The set of malicious sensors may be time-varying. The problems of malicious sensor detection and secure estimation are considered. First, an algorithm for secure estimation is proposed. The proposed estimation scheme uses a novel filtering and learning algorithm, where an optimal filter is learnt over time by using the sensor observations in order to filter out malicious sensor observations while retaining other sensor measurements. Next, a novel detector to detect injection attacks on an unknown sensor subset is developed. Numerical results demonstrate up to 3 dB gain in the mean squared error and up to 75% higher attack detection probability under a small false alarm rate constraint, against a competing algorithm that requires additional side information.

ITFeb 20, 2016
Power-Distortion Metrics for Path Planning over Gaussian Sensor Networks

Emrah Akyol, Urbashi Mitra

Path planning is an important component of au- tonomous mobile sensing systems. This paper studies upper and lower bounds of communication performance over Gaussian sen- sor networks, to drive power-distortion metrics for path planning problems. The Gaussian multiple-access channel is employed as a channel model and two source models are considered. In the first setting, the underlying source is estimated with minimum mean squared error, while in the second, reconstruction of a random spatial field is considered. For both problem settings, the upper and the lower bounds of sensor power-distortion curve are derived. For both settings, the upper bounds follow from the amplify-and-forward scheme and the lower bounds admit a unified derivation based on data processing inequality and tensorization property of the maximal correlation measure. Next, closed-form solutions of the optimal power allocation problems are obtained under a weighted sum-power constraint. The gap between the upper and the lower bounds is analyzed for both weighted sum and individual power constrained settings. Finally, these metrics are used to drive a path planning algorithm and the effects of power-distortion metrics, network parameters, and power optimization on the optimized path selection are analyzed.