Akhil S Anand

AI
h-index16
5papers
36citations
Novelty57%
AI Score40

5 Papers

SYJan 4, 2023
Learning-based MPC from Big Data Using Reinforcement Learning

Shambhuraj Sawant, Akhil S Anand, Dirk Reinhardt et al.

This paper presents an approach for learning Model Predictive Control (MPC) schemes directly from data using Reinforcement Learning (RL) methods. The state-of-the-art learning methods use RL to improve the performance of parameterized MPC schemes. However, these learning algorithms are often gradient-based methods that require frequent evaluations of computationally expensive MPC schemes, thereby restricting their use on big datasets. We propose to tackle this issue by using tools from RL to learn a parameterized MPC scheme directly from data in an offline fashion. Our approach derives an MPC scheme without having to solve it over the collected dataset, thereby eliminating the computational complexity of existing techniques for big data. We evaluate the proposed method on three simulated experiments of varying complexity.

SYNov 27, 2024
Application of Soft Actor-Critic Algorithms in Optimizing Wastewater Treatment with Time Delays Integration

Esmaeel Mohammadi, Daniel Ortiz-Arroyo, Aviaja Anna Hansen et al.

Wastewater treatment plants face unique challenges for process control due to their complex dynamics, slow time constants, and stochastic delays in observations and actions. These characteristics make conventional control methods, such as Proportional-Integral-Derivative controllers, suboptimal for achieving efficient phosphorus removal, a critical component of wastewater treatment to ensure environmental sustainability. This study addresses these challenges using a novel deep reinforcement learning approach based on the Soft Actor-Critic algorithm, integrated with a custom simulator designed to model the delayed feedback inherent in wastewater treatment plants. The simulator incorporates Long Short-Term Memory networks for accurate multi-step state predictions, enabling realistic training scenarios. To account for the stochastic nature of delays, agents were trained under three delay scenarios: no delay, constant delay, and random delay. The results demonstrate that incorporating random delays into the reinforcement learning framework significantly improves phosphorus removal efficiency while reducing operational costs. Specifically, the delay-aware agent achieved 36% reduction in phosphorus emissions, 55% higher reward, 77% lower target deviation from the regulatory limit, and 9% lower total costs than traditional control methods in the simulated environment. These findings underscore the potential of reinforcement learning to overcome the limitations of conventional control strategies in wastewater treatment, providing an adaptive and cost-effective solution for phosphorus removal.

AIJan 10, 2025
All AI Models are Wrong, but Some are Optimal

Akhil S Anand, Shambhuraj Sawant, Dirk Reinhardt et al.

AI models that predict the future behavior of a system (a.k.a. predictive AI models) are central to intelligent decision-making. However, decision-making using predictive AI models often results in suboptimal performance. This is primarily because AI models are typically constructed to best fit the data, and hence to predict the most likely future rather than to enable high-performance decision-making. The hope that such prediction enables high-performance decisions is neither guaranteed in theory nor established in practice. In fact, there is increasing empirical evidence that predictive models must be tailored to decision-making objectives for performance. In this paper, we establish formal (necessary and sufficient) conditions that a predictive model (AI-based or not) must satisfy for a decision-making policy established using that model to be optimal. We then discuss their implications for building predictive AI models for sequential decision-making.

AIDec 11, 2025
CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound

Akhil S Anand, Elias Aarekol, Martin Mziray Dalseg et al.

Combinatorial sequential decision making problems are typically modeled as mixed integer linear programs (MILPs) and solved via branch and bound (B&B) algorithms. The inherent difficulty of modeling MILPs that accurately represent stochastic real world problems leads to suboptimal performance in the real world. Recently, machine learning methods have been applied to build MILP models for decision quality rather than how accurately they model the real world problem. However, these approaches typically rely on supervised learning, assume access to true optimal decisions, and use surrogates for the MILP gradients. In this work, we introduce a proof of concept CORL framework that end to end fine tunes an MILP scheme using reinforcement learning (RL) on real world data to maximize its operational performance. We enable this by casting an MILP solved by B&B as a differentiable stochastic policy compatible with RL. We validate the CORL method in a simple illustrative combinatorial sequential decision making example.

LGOct 20, 2025
Closing the Sim2Real Performance Gap in RL

Akhil S Anand, Shambhuraj Sawant, Jasper Hoffmann et al.

Sim2Real aims at training policies in high-fidelity simulation environments and effectively transferring them to the real world. Despite the developments of accurate simulators and Sim2Real RL approaches, the policies trained purely in simulation often suffer significant performance drops when deployed in real environments. This drop is referred to as the Sim2Real performance gap. Current Sim2Real RL methods optimize the simulator accuracy and variability as proxies for real-world performance. However, these metrics do not necessarily correlate with the real-world performance of the policy as established theoretically and empirically in the literature. We propose a novel framework to address this issue by directly adapting the simulator parameters based on real-world performance. We frame this problem as a bi-level RL framework: the inner-level RL trains a policy purely in simulation, and the outer-level RL adapts the simulation model and in-sim reward parameters to maximize real-world performance of the in-sim policy. We derive and validate in simple examples the mathematical tools needed to develop bi-level RL algorithms that close the Sim2Real performance gap.