LGJul 21, 2024
Temporal Abstraction in Reinforcement Learning with Offline DataRanga Shaarad Ayyagari, Anurita Ghosh, Ambedkar Dukkipati
Standard reinforcement learning algorithms with a single policy perform poorly on tasks in complex environments involving sparse rewards, diverse behaviors, or long-term planning. This led to the study of algorithms that incorporate temporal abstraction by training a hierarchy of policies that plan over different time scales. The options framework has been introduced to implement such temporal abstraction by learning low-level options that act as extended actions controlled by a high-level policy. The main challenge in applying these algorithms to real-world problems is that they suffer from high sample complexity to train multiple levels of the hierarchy, which is impossible in online settings. Motivated by this, in this paper, we propose an offline hierarchical RL method that can learn options from existing offline datasets collected by other unknown agents. This is a very challenging problem due to the distribution mismatch between the learned options and the policies responsible for the offline dataset and to our knowledge, this is the first work in this direction. In this work, we propose a framework by which an online hierarchical reinforcement learning algorithm can be trained on an offline dataset of transitions collected by an unknown behavior policy. We validate our method on Gym MuJoCo locomotion environments and robotic gripper block-stacking tasks in the standard as well as transfer and goal-conditioned settings.
STApr 14, 2025
Predictive AI with External Knowledge Infusion for StocksAmbedkar Dukkipati, Kawin Mayilvaghanan, Naveen Kumar Pallekonda et al.
Fluctuations in stock prices are influenced by a complex interplay of factors that go beyond mere historical data. These factors, themselves influenced by external forces, encompass inter-stock dynamics, broader economic factors, various government policy decisions, outbreaks of wars, etc. Furthermore, all of these factors are dynamic and exhibit changes over time. In this paper, for the first time, we tackle the forecasting problem under external influence by proposing learning mechanisms that not only learn from historical trends but also incorporate external knowledge from temporal knowledge graphs. Since there are no such datasets or temporal knowledge graphs available, we study this problem with stock market data, and we construct comprehensive temporal knowledge graph datasets. In our proposed approach, we model relations on external temporal knowledge graphs as events of a Hawkes process on graphs. With extensive experiments, we show that learned dynamic representations effectively rank stocks based on returns across multiple holding periods, outperforming related baselines on relevant metrics.
LGDec 17, 2024
Active Reinforcement Learning Strategies for Offline Policy ImprovementAmbedkar Dukkipati, Ranga Shaarad Ayyagari, Bodhisattwa Dasgupta et al.
Learning agents that excel at sequential decision-making tasks must continuously resolve the problem of exploration and exploitation for optimal learning. However, such interactions with the environment online might be prohibitively expensive and may involve some constraints, such as a limited budget for agent-environment interactions and restricted exploration in certain regions of the state space. Examples include selecting candidates for medical trials and training agents in complex navigation environments. This problem necessitates the study of active reinforcement learning strategies that collect minimal additional experience trajectories by reusing existing offline data previously collected by some unknown behavior policy. In this work, we propose an active reinforcement learning method capable of collecting trajectories that can augment existing offline data. With extensive experimentation, we demonstrate that our proposed method reduces additional online interaction with the environment by up to 75% over competitive baselines across various continuous control environments such as Gym-MuJoCo locomotion environments as well as Maze2d, AntMaze, CARLA and IsaacSimGo1. To the best of our knowledge, this is the first work that addresses the active learning problem in the context of sequential decision-making and reinforcement learning.
LGMay 25, 2023
Markov Decision Processes under External Temporal ProcessesRanga Shaarad Ayyagari, Revanth Raj Eega, Ambedkar Dukkipati
Reinforcement Learning Algorithms are predominantly developed for stationary environments, and the limited literature that considers nonstationary environments often involves specific assumptions about changes that can occur in transition probability matrices and reward functions. Considering that real-world applications involve environments that continuously evolve due to various external events, and humans make decisions by discerning patterns in historical events, this study investigates Markov Decision Processes under the influence of an external temporal process. We establish the conditions under which the problem becomes tractable, allowing it to be addressed by considering only a finite history of events, based on the properties of the perturbations introduced by the exogenous process. We propose and theoretically analyze a policy iteration algorithm to tackle this problem, which learns policies contingent upon the current state of the environment, as well as a finite history of prior events of the exogenous process. We show that such an algorithm is not guaranteed to converge. However, we provide a guarantee for policy improvement in regions of the state space determined by the approximation error induced by considering tractable policies and value functions. We also establish the sample complexity of least-squares policy evaluation and policy improvement algorithms that consider approximations due to the incorporation of only a finite history of temporal events. While our results are applicable to general discrete-time processes satisfying certain conditions on the rate of decay of the influence of their events, we further analyze the case of discrete-time Hawkes processes with Gaussian marks. We performed experiments to demonstrate our findings for policy evaluation and deployment in traditional control environments.
AIJan 30, 2021
Learning Skills to Navigate without a Master: A Sequential Multi-Policy Reinforcement Learning AlgorithmAmbedkar Dukkipati, Rajarshi Banerjee, Ranga Shaarad Ayyagari et al.
Solving complex problems using reinforcement learning necessitates breaking down the problem into manageable tasks and learning policies to solve these tasks. These policies, in turn, have to be controlled by a master policy that takes high-level decisions. Hence learning policies involves hierarchical decision structures. However, training such methods in practice may lead to poor generalization, with either sub-policies executing actions for too few time steps or devolving into a single policy altogether. In our work, we introduce an alternative approach to learn such skills sequentially without using an overarching hierarchical policy. We propose this method in the context of environments where a major component of the objective of a learning agent is to prolong the episode for as long as possible. We refer to our proposed method as Sequential Soft Option Critic. We demonstrate the utility of our approach on navigation and goal-based tasks in a flexible simulated 3D navigation environment that we have developed. We also show that our method outperforms prior methods such as Soft Actor-Critic and Soft Option Critic on various environments, including the Atari River Raid environment and the Gym-Duckietown self-driving car simulator.