Ahmed Khalil

LG
4papers
183citations
Novelty46%
AI Score40

4 Papers

LGOct 13, 2023
LL-VQ-VAE: Learnable Lattice Vector-Quantization For Efficient Representations

Ahmed Khalil, Robert Piechocki, Raul Santos-Rodriguez

In this paper we introduce learnable lattice vector quantization and demonstrate its effectiveness for learning discrete representations. Our method, termed LL-VQ-VAE, replaces the vector quantization layer in VQ-VAE with lattice-based discretization. The learnable lattice imposes a structure over all discrete embeddings, acting as a deterrent against codebook collapse, leading to high codebook utilization. Compared to VQ-VAE, our method obtains lower reconstruction errors under the same training conditions, trains in a fraction of the time, and with a constant number of parameters (equal to the embedding dimension $D$), making it a very scalable approach. We demonstrate these results on the FFHQ-1024 dataset and include FashionMNIST and Celeb-A.

LGSep 8, 2022
Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL

Taku Yamagata, Ahmed Khalil, Raul Santos-Rodriguez

Recent works have shown that tackling offline reinforcement learning (RL) with a conditional policy produces promising results. The Decision Transformer (DT) combines the conditional policy approach and a transformer architecture, showing competitive performance against several benchmarks. However, DT lacks stitching ability -- one of the critical abilities for offline RL to learn the optimal policy from sub-optimal trajectories. This issue becomes particularly significant when the offline dataset only contains sub-optimal trajectories. On the other hand, the conventional RL approaches based on Dynamic Programming (such as Q-learning) do not have the same limitation; however, they suffer from unstable learning behaviours, especially when they rely on function approximation in an off-policy learning setting. In this paper, we propose the Q-learning Decision Transformer (QDT) to address the shortcomings of DT by leveraging the benefits of Dynamic Programming (Q-learning). It utilises the Dynamic Programming results to relabel the return-to-go in the training data to then train the DT with the relabelled data. Our approach efficiently exploits the benefits of these two approaches and compensates for each other's shortcomings to achieve better performance. We empirically show these in both simple toy environments and the more complex D4RL benchmark, showing competitive performance gains.

60.1OCMar 24
Ensemble Kalman Inversion for Constrained Nonlinear MPC: An ADMM-Splitting Approach

Ahmed Khalil, Mohamed Safwat, Efstathios Bakolas

This work proposes a novel Alternating Direction Method of Multipliers (ADMM)-based Ensemble Kalman Inversion (EKI) algorithm for solving constrained nonlinear model predictive control (NMPC) problems. First, stage-wise nonlinear inequality constraints in the NMPC problem are embedded via an augmented Lagrangian with nonnegative slack variables. We then show that the resulting unconstrained augmented-Lagrangian primal subproblem admits a Bayesian interpretation: under independent Gaussian virtual observations, its minimizers coincide with MAP estimators, enabling solution via EKI. However, since the nonnegativity constraint on the slacks is a hard constraint not naturally encoded by a Gaussian model, our proposed algorithm yields a two-block ADMM scheme that alternates between (i) an inexact primal step that minimizes the augmented-Lagrangian objective (implemented via EKI rollouts), (ii) a nonnegativity projection for the slacks, and (iii) a dual ascent step. To balance exploration and convergence, an annealing schedule tempers sampling covariances while a penalty schedule increases constraint enforcement over outer iterations, encouraging global search early and precise constraint satisfaction later. We evaluate the proposed controller on a 6-DOF UR5e manipulation benchmark in MuJoCo, comparing it against DIAL-MPC (an iterative MPPI variant) as the arm traverses a cluttered tabletop environment.

NINov 12, 2021
RLOps: Development Life-cycle of Reinforcement Learning Aided Open RAN

Peizheng Li, Jonathan Thomas, Xiaoyang Wang et al.

Radio access network (RAN) technologies continue to evolve, with Open RAN gaining the most recent momentum. In the O-RAN specifications, the RAN intelligent controllers (RICs) are software-defined orchestration and automation functions for the intelligent management of RAN. This article introduces principles for machine learning (ML), in particular, reinforcement learning (RL) applications in the O-RAN stack. Furthermore, we review the state-of-the-art research in wireless networks and cast it onto the RAN framework and the hierarchy of the O-RAN architecture. We provide a taxonomy for the challenges faced by ML/RL models throughout the development life-cycle: from the system specification to production deployment (data acquisition, model design, testing and management, etc.). To address the challenges, we integrate a set of existing MLOps principles with unique characteristics when RL agents are considered. This paper discusses a systematic model development, testing and validation life-cycle, termed: RLOps. We discuss fundamental parts of RLOps, which include: model specification, development, production environment serving, operations monitoring and safety/security. Based on these principles, we propose the best practices for RLOps to achieve an automated and reproducible model development process. At last, a holistic data analytics platform rooted in the O-RAN deployment is designed and implemented, aiming to embrace and fulfil the aforementioned principles and best practices of RLOps.