Sami Khairy

LG
h-index12
12papers
320citations
Novelty54%
AI Score45

12 Papers

QUANT-PHNov 29, 2022
Numerical evidence against advantage with quantum fidelity kernels on classical data

Lucas Slattery, Ruslan Shaydulin, Shouvanik Chakrabarti et al.

Quantum machine learning techniques are commonly considered one of the most promising candidates for demonstrating practical quantum advantage. In particular, quantum kernel methods have been demonstrated to be able to learn certain classically intractable functions efficiently if the kernel is well-aligned with the target function. In the more general case, quantum kernels are known to suffer from exponential "flattening" of the spectrum as the number of qubits grows, preventing generalization and necessitating the control of the inductive bias by hyperparameters. We show that the general-purpose hyperparameter tuning techniques proposed to improve the generalization of quantum kernels lead to the kernel becoming well-approximated by a classical kernel, removing the possibility of quantum advantage. We provide extensive numerical evidence for this phenomenon utilizing multiple previously studied quantum feature maps and both synthetic and real data. Our results show that unless novel techniques are developed to control the inductive bias of quantum kernels, they are unlikely to provide a quantum advantage on classical data.

NISep 23, 2023
Offline to Online Learning for Real-Time Bandwidth Estimation

Aashish Gottipati, Sami Khairy, Gabriel Mittag et al.

Real-time video applications require accurate bandwidth estimation (BWE) to maintain user experience across varying network conditions. However, increasing network heterogeneity challenges general-purpose BWE algorithms, necessitating solutions that adapt to end-user environments. While widely adopted, heuristic-based methods are difficult to individualize without extensive domain expertise. Conversely, online reinforcement learning (RL) offers ease of customization but neglects prior domain expertise and suffers from sample inefficiency. Thus, we present Merlin, an imitation learning-based solution that replaces the manual parameter tuning of heuristic-based methods with data-driven updates to streamline end-user personalization. Our key insight is that transforming heuristic-based BWE algorithms into neural networks facilitates data-driven personalization. Merlin utilizes Behavioral Cloning to efficiently learn from offline telemetry logs, capturing heuristic policies without live network interactions. The cloned policy can then be seamlessly tailored to end user network conditions through online finetuning. In real intercontinental videoconferencing calls, Merlin matches our heuristic's policy with no statistically significant differences in user quality of experience (QoE). Finetuning Merlin's control policy to end-user environments enables QoE improvements of up to 7.8% compared to the heuristic policy. Lastly, our IL-based design performs competitively with current state-of-the-art online RL techniques but converges with 80% fewer videoconferencing samples, facilitating practical end-user personalization.

LGJun 10, 2022
Multifidelity Reinforcement Learning with Control Variates

Sami Khairy, Prasanna Balaprakash

In many computational science and engineering applications, the output of a system of interest corresponding to a given input can be queried at different levels of fidelity with different costs. Typically, low-fidelity data is cheap and abundant, while high-fidelity data is expensive and scarce. In this work we study the reinforcement learning (RL) problem in the presence of multiple environments with different levels of fidelity for a given control task. We focus on improving the RL agent's performance with multifidelity data. Specifically, a multifidelity estimator that exploits the cross-correlations between the low- and high-fidelity returns is proposed to reduce the variance in the estimation of the state-action value function. The proposed estimator, which is based on the method of control variates, is used to design a multifidelity Monte Carlo RL (MFMCRL) algorithm that improves the learning of the agent in the high-fidelity environment. The impacts of variance reduction on policy evaluation and policy improvement are theoretically analyzed by using probability bounds. Our theoretical analysis and numerical experiments demonstrate that for a finite budget of high-fidelity data samples, our proposed MFMCRL agent attains superior performance compared with that of a standard RL agent that uses only the high-fidelity environment data for learning the optimal policy.

NIMar 23
Offline Meta-learning for Real-time Bandwidth Estimation

Aashish Gottipati, Sami Khairy, Yasaman Hosseinkashi et al.

Real-time video applications require dynamic bitrate adjustments based on network capacity, necessitating accurate bandwidth estimation (BWE). We introduce Ivy, a novel BWE method that leverages offline meta-learning to combat data drift and maximize user Quality of Experience (QoE). Our approach dynamically selects the most suitable BWE algorithm for current network conditions, enabling effective adaptation to changing environments without requiring live network interactions. We implemented our method in Microsoft Teams and demonstrated that Ivy can enhance QoE by 5.9% to 11.2% over individual BWE algorithms and by 6.3% to 11.4% compared to existing online meta heuristics. Additionally, we show that our method is more data efficient compared to online meta-learning methods, achieving up to 21% improvement in QoE while requiring significantly less training data.

SYJan 23, 2024
A Safe Reinforcement Learning Algorithm for Supervisory Control of Power Plants

Yixuan Sun, Sami Khairy, Richard B. Vilim et al.

Traditional control theory-based methods require tailored engineering for each system and constant fine-tuning. In power plant control, one often needs to obtain a precise representation of the system dynamics and carefully design the control scheme accordingly. Model-free Reinforcement learning (RL) has emerged as a promising solution for control tasks due to its ability to learn from trial-and-error interactions with the environment. It eliminates the need for explicitly modeling the environment's dynamics, which is potentially inaccurate. However, the direct imposition of state constraints in power plant control raises challenges for standard RL methods. To address this, we propose a chance-constrained RL algorithm based on Proximal Policy Optimization for supervisory control. Our method employs Lagrangian relaxation to convert the constrained optimization problem into an unconstrained objective, where trainable Lagrange multipliers enforce the state constraints. Our approach achieves the smallest distance of violation and violation rate in a load-follow maneuver for an advanced Nuclear Power Plant design.

MMOct 14, 2025
Human-in-the-Loop Bandwidth Estimation for Quality of Experience Optimization in Real-Time Video Communication

Sami Khairy, Gabriel Mittag, Vishak Gopal et al.

The quality of experience (QoE) delivered by video conferencing systems is significantly influenced by accurately estimating the time-varying available bandwidth between the sender and receiver. Bandwidth estimation for real-time communications remains an open challenge due to rapidly evolving network architectures, increasingly complex protocol stacks, and the difficulty of defining QoE metrics that reliably improve user experience. In this work, we propose a deployed, human-in-the-loop, data-driven framework for bandwidth estimation to address these challenges. Our approach begins with training objective QoE reward models derived from subjective user evaluations to measure audio and video quality in real-time video conferencing systems. Subsequently, we collect roughly $1$M network traces with objective QoE rewards from real-world Microsoft Teams calls to curate a bandwidth estimation training dataset. We then introduce a novel distributional offline reinforcement learning (RL) algorithm to train a neural-network-based bandwidth estimator aimed at improving QoE for users. Our real-world A/B test demonstrates that the proposed approach reduces the subjective poor call ratio by $11.41\%$ compared to the baseline bandwidth estimator. Furthermore, the proposed offline RL algorithm is benchmarked on D4RL tasks to demonstrate its generalization beyond bandwidth estimation.

LGNov 11, 2024
Streetwise Agents: Empowering Offline RL Policies to Outsmart Exogenous Stochastic Disturbances in RTC

Aditya Soni, Mayukh Das, Anjaly Parayil et al.

The difficulty of exploring and training online on real production systems limits the scope of real-time online data/feedback-driven decision making. The most feasible approach is to adopt offline reinforcement learning from limited trajectory samples. However, after deployment, such policies fail due to exogenous factors that temporarily or permanently disturb/alter the transition distribution of the assumed decision process structure induced by offline samples. This results in critical policy failures and generalization errors in sensitive domains like Real-Time Communication (RTC). We solve this crucial problem of identifying robust actions in presence of domain shifts due to unseen exogenous stochastic factors in the wild. As it is impossible to learn generalized offline policies within the support of offline data that are robust to these unseen exogenous disturbances, we propose a novel post-deployment shaping of policies (Streetwise), conditioned on real-time characterization of out-of-distribution sub-spaces. This leads to robust actions in bandwidth estimation (BWE) of network bottlenecks in RTC and in standard benchmarks. Our extensive experimental results on BWE and other standard offline RL benchmark environments demonstrate a significant improvement ($\approx$ 18% on some scenarios) in final returns wrt. end-user metrics over state-of-the-art baselines.

NIJan 2, 2021
Data-Driven Random Access Optimization in Multi-Cell IoT Networks with NOMA

Sami Khairy, Prasanna Balaprakash, Lin X. Cai et al.

Non-orthogonal multiple access (NOMA) is a key technology to enable massive machine type communications (mMTC) in 5G networks and beyond. In this paper, NOMA is applied to improve the random access efficiency in high-density spatially-distributed multi-cell wireless IoT networks, where IoT devices contend for accessing the shared wireless channel using an adaptive p-persistent slotted Aloha protocol. To enable a capacity-optimal network, a novel formulation of random channel access management is proposed, in which the transmission probability of each IoT device is tuned to maximize the geometric mean of users' expected capacity. It is shown that the network optimization objective is high dimensional and mathematically intractable, yet it admits favourable mathematical properties that enable the design of efficient data-driven algorithmic solutions which do not require a priori knowledge of the channel model or network topology. A centralized model-based algorithm and a scalable distributed model-free algorithm, are proposed to optimally tune the transmission probabilities of IoT devices and attain the maximum capacity. The convergence of the proposed algorithms to the optimal solution is further established based on convex optimization and game-theoretic analysis. Extensive simulations demonstrate the merits of the novel formulation and the efficacy of the proposed algorithms.

LGMay 7, 2020
A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes

Sami Khairy, Prasanna Balaprakash, Lin X. Cai

The canonical solution methodology for finite constrained Markov decision processes (CMDPs), where the objective is to maximize the expected infinite-horizon discounted rewards subject to the expected infinite-horizon discounted costs constraints, is based on convex linear programming. In this brief, we first prove that the optimization objective in the dual linear program of a finite CMDP is a piece-wise linear convex function (PWLC) with respect to the Lagrange penalty multipliers. Next, we propose a novel two-level Gradient-Aware Search (GAS) algorithm which exploits the PWLC structure to find the optimal state-value function and Lagrange penalty multipliers of a finite CMDP. The proposed algorithm is applied in two stochastic control problems with constraints: robot navigation in a grid world and solar-powered unmanned aerial vehicle (UAV)-based wireless network management. We empirically compare the convergence performance of the proposed GAS algorithm with binary search (BS), Lagrangian primal-dual optimization (PDO), and Linear Programming (LP). Compared with benchmark algorithms, it is shown that the proposed GAS algorithm converges to the optimal solution faster, does not require hyper-parameter tuning, and is not sensitive to initialization of the Lagrange penalty multiplier.

NIJan 31, 2020
Constrained Deep Reinforcement Learning for Energy Sustainable Multi-UAV based Random Access IoT Networks with NOMA

Sami Khairy, Prasanna Balaprakash, Lin X. Cai et al.

In this paper, we apply the Non-Orthogonal Multiple Access (NOMA) technique to improve the massive channel access of a wireless IoT network where solar-powered Unmanned Aerial Vehicles (UAVs) relay data from IoT devices to remote servers. Specifically, IoT devices contend for accessing the shared wireless channel using an adaptive $p$-persistent slotted Aloha protocol; and the solar-powered UAVs adopt Successive Interference Cancellation (SIC) to decode multiple received data from IoT devices to improve access efficiency. To enable an energy-sustainable capacity-optimal network, we study the joint problem of dynamic multi-UAV altitude control and multi-cell wireless channel access management of IoT devices as a stochastic control problem with multiple energy constraints. To learn an optimal control policy, we first formulate this problem as a Constrained Markov Decision Process (CMDP), and propose an online model-free Constrained Deep Reinforcement Learning (CDRL) algorithm based on Lagrangian primal-dual policy optimization to solve the CMDP. Extensive simulations demonstrate that our proposed algorithm learns a cooperative policy among UAVs in which the altitude of UAVs and channel access probability of IoT devices are dynamically and jointly controlled to attain the maximal long-term network capacity while maintaining energy sustainability of UAVs. The proposed algorithm outperforms Deep RL based solutions with reward shaping to account for energy costs, and achieves a temporal average system capacity which is $82.4\%$ higher than that of a feasible DRL based solution, and only $6.47\%$ lower compared to that of the energy-constraint-free system.

LGNov 25, 2019
Learning to Optimize Variational Quantum Circuits to Solve Combinatorial Problems

Sami Khairy, Ruslan Shaydulin, Lukasz Cincio et al.

Quantum computing is a computational paradigm with the potential to outperform classical methods for a variety of problems. Proposed recently, the Quantum Approximate Optimization Algorithm (QAOA) is considered as one of the leading candidates for demonstrating quantum advantage in the near term. QAOA is a variational hybrid quantum-classical algorithm for approximately solving combinatorial optimization problems. The quality of the solution obtained by QAOA for a given problem instance depends on the performance of the classical optimizer used to optimize the variational parameters. In this paper, we formulate the problem of finding optimal QAOA parameters as a learning task in which the knowledge gained from solving training instances can be leveraged to find high-quality solutions for unseen test instances. To this end, we develop two machine-learning-based approaches. Our first approach adopts a reinforcement learning (RL) framework to learn a policy network to optimize QAOA circuits. Our second approach adopts a kernel density estimation (KDE) technique to learn a generative model of optimal QAOA parameters. In both approaches, the training procedure is performed on small-sized problem instances that can be simulated on a classical computer; yet the learned RL policy and the generative model can be used to efficiently solve larger problems. Extensive simulations using the IBM Qiskit Aer quantum circuit simulator demonstrate that our proposed RL- and KDE-based approaches reduce the optimality gap by factors up to 30.15 when compared with other commonly used off-the-shelf optimizers.

LGNov 11, 2019
Reinforcement-Learning-Based Variational Quantum Circuits Optimization for Combinatorial Problems

Sami Khairy, Ruslan Shaydulin, Lukasz Cincio et al.

Quantum computing exploits basic quantum phenomena such as state superposition and entanglement to perform computations. The Quantum Approximate Optimization Algorithm (QAOA) is arguably one of the leading quantum algorithms that can outperform classical state-of-the-art methods in the near term. QAOA is a hybrid quantum-classical algorithm that combines a parameterized quantum state evolution with a classical optimization routine to approximately solve combinatorial problems. The quality of the solution obtained by QAOA within a fixed budget of calls to the quantum computer depends on the performance of the classical optimization routine used to optimize the variational parameters. In this work, we propose an approach based on reinforcement learning (RL) to train a policy network that can be used to quickly find high-quality variational parameters for unseen combinatorial problem instances. The RL agent is trained on small problem instances which can be simulated on a classical computer, yet the learned RL policy is generalizable and can be used to efficiently solve larger instances. Extensive simulations using the IBM Qiskit Aer quantum circuit simulator demonstrate that our trained RL policy can reduce the optimality gap by a factor up to 8.61 compared with other off-the-shelf optimizers tested.