LG · Mar 13, 2024
Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis
Rui Liu, Anish Gupta, Erfaun Noorani et al.
Reinforcement Learning (RL) has shown exceptional performance across various applications, enabling autonomous agents to learn optimal policies through interaction with their environments. However, traditional RL frameworks often face challenges in terms of iteration efficiency and safety. Risk-sensitive policy gradient methods, which incorporate both expected return and risk measures, have been explored for their ability to yield safe policies, yet their iteration complexity remains largely underexplored. In this work, we conduct a rigorous iteration complexity analysis for the risk-sensitive policy gradient method, focusing on the REINFORCE algorithm with an exponential utility function. We establish an iteration complexity of $\mathcal{O}(\epsilon^{-2})$ to reach an $\epsilon$-approximate first-order stationary point (FOSP). Furthermore, we investigate whether risk-sensitive algorithms can achieve better iteration complexity compared to their risk-neutral counterparts. Our analysis indicates that risk-sensitive REINFORCE can potentially converge faster. To validate our analysis, we empirically evaluate the learning performance and convergence efficiency of the risk-neutral and risk-sensitive REINFORCE algorithms in multiple environments: CartPole, MiniGrid, and Robot Navigation. Empirical results confirm that risk-sensitive cases can converge and stabilize faster compared to their risk-neutral counterparts. More details can be found on our website https://anonymous.4open.science/w/riskrl.
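To make the exponential utility concrete, here is a minimal sketch of one common form of the risk-sensitive objective, $\frac{1}{\beta}\log \mathbb{E}[e^{\beta R}]$, together with the self-normalized trajectory weights that would multiply each $\nabla \log p(\tau_i)$ in a REINFORCE-style estimate. The abstract does not give the paper's exact objective or estimator, so the formula, the function names `entropic_risk` and `trajectory_weights`, the parameter `beta`, and the sample returns are all assumptions made for illustration.

```python
import math


def entropic_risk(returns, beta):
    """Exponential-utility objective (1/beta) * log(mean(exp(beta * R)))."""
    if beta == 0:
        raise ValueError("beta must be nonzero; beta -> 0 recovers the mean return")
    scaled = [beta * r for r in returns]
    m = max(scaled)  # log-sum-exp shift for numerical stability
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / beta


def trajectory_weights(returns, beta):
    """Self-normalized weights exp(beta * R_i) / sum_j exp(beta * R_j)."""
    scaled = [beta * r for r in returns]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sampled episode returns (not data from the paper).
returns = [1.0, 2.0, 3.0]
risk_averse_value = entropic_risk(returns, beta=-1.0)
weights = trajectory_weights(returns, beta=-1.0)
```

Under this assumed form, a negative `beta` is risk-averse: the objective falls below the mean return, and the lowest-return trajectories receive the largest weights. A gradient estimate would then be $\frac{1}{\beta}\sum_i w_i \nabla \log p(\tau_i)$.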
LG · Jun 15, 2025
Forecasting Time Series with LLMs via Patch-Based Prompting and Decomposition
Mayank Bumb, Anshul Vemulapalli, Sri Harsha Vardhan Prasad Jella et al.
Recent advances in Large Language Models (LLMs) have demonstrated new possibilities for accurate and efficient time series analysis, but prior work often required heavy fine-tuning and/or ignored inter-series correlations. In this work, we explore simple and flexible prompt-based strategies that enable LLMs to perform time series forecasting without extensive retraining or the use of a complex external architecture. Through the exploration of specialized prompting methods that leverage time series decomposition, patch-based tokenization, and similarity-based neighbor augmentation, we find that it is possible to enhance LLM forecasting quality while maintaining simplicity and requiring minimal preprocessing of data. To this end, we propose our own method, PatchInstruct, which enables LLMs to make precise and effective predictions.
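As a rough illustration of how decomposition and patching might feed a prompt, the sketch below splits a series into a trailing moving-average trend and a residual, cuts each component into fixed-length patches, and formats them as text. It is not PatchInstruct itself: the abstract does not specify the prompt template, patch length, or decomposition method, so `moving_average_trend`, `make_patches`, `build_prompt`, and the prompt wording are all hypothetical. Neighbor augmentation is omitted.

```python
def moving_average_trend(series, window):
    """Trailing moving average; early points average over the values seen so far."""
    trend = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        trend.append(sum(chunk) / len(chunk))
    return trend


def make_patches(series, patch_len, stride):
    """Split a series into fixed-length windows; a trailing remainder is dropped."""
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, stride)]


def build_prompt(series, patch_len, stride, horizon, window=3):
    """Format trend and residual patches as a plain-text forecasting prompt."""
    trend = moving_average_trend(series, window)
    residual = [x - t for x, t in zip(series, trend)]
    lines = [f"Forecast the next {horizon} values of the series."]
    for name, component in (("trend", trend), ("residual", residual)):
        patches = make_patches(component, patch_len, stride)
        for k, patch in enumerate(patches, start=1):
            values = ", ".join(f"{v:.2f}" for v in patch)
            lines.append(f"{name} patch {k}: {values}")
    return "\n".join(lines)


# Hypothetical toy series; the resulting string would be sent to an LLM.
prompt = build_prompt([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], patch_len=3, stride=3, horizon=2)
```

Setting `stride` smaller than `patch_len` would produce overlapping patches, at the cost of a longer prompt.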
RO · Dec 24, 2021
Non Holonomic Collision Avoidance of Dynamic Obstacles under Non-Parametric Uncertainty: A Hilbert Space Approach
Unni Krishnan R Nair, Anish Gupta, D. A. Sasi Kiran et al.
We consider the problem of an agent/robot with non-holonomic kinematics avoiding many dynamic obstacles. State and velocity noise of both the robot and obstacles, as well as the robot's control noise, are modelled as non-parametric distributions because the Gaussian assumptions of noise models are often violated in real-world scenarios. Under these assumptions, we formulate a robust MPC that samples robot controls so as to align the robot with the goal state while avoiding obstacles in the presence of such non-parametric noise. In particular, the MPC incorporates a distribution matching cost that aligns the distribution of the current collision cone with a desired distribution whose samples are collision-free. This cost is posed as a distance function in the Hilbert Space, whose minimization typically results in the collision cone samples becoming collision-free. We compare against, and show tangible performance gains over, methods that model the collision cone distribution by linearizing the Gaussian approximations of the original non-parametric state and obstacle distributions. We also show superior performance over methods that pose a chance constraint formulation of the Gaussian approximations of non-parametric noise without subjecting such approximations to further linearizations. The performance gain is shown in terms of both trajectory length and control costs, which supports the efficacy of the proposed method. To the best of our knowledge, this is the first presentation of non-holonomic collision avoidance of moving obstacles in the presence of non-parametric state, velocity and actuator noise models.
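One standard way to measure a distance between two sample sets in a Hilbert space is the squared maximum mean discrepancy (MMD) with an RBF kernel. The abstract does not name the distance it uses, so treating the distribution matching cost as MMD is an assumption. The function names, the bandwidth, and the scalar "collision-cone" samples below are hypothetical, with non-positive values taken as collision-free by assumed convention.

```python
import math


def rbf_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two 1-D sample sets."""
    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical samples: positive values stand for collision, non-positive for safe.
desired = [-1.0, -0.8, -1.2]        # samples from a collision-free target distribution
current_unsafe = [0.5, 0.8, 1.1]    # collision-cone samples under one control
current_safer = [-0.9, -0.7, -1.1]  # collision-cone samples under another control

cost_unsafe = mmd_squared(current_unsafe, desired)
cost_safer = mmd_squared(current_safer, desired)
```

A controller built on this idea would prefer the control whose samples give the smaller cost, here `cost_safer`.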
RO · Sep 21, 2021
Multi-Modal Model Predictive Control through Batch Non-Holonomic Trajectory Optimization: Application to Highway Driving
Vivek K. Adajania, Aditya Sharma, Anish Gupta et al.
Standard Model Predictive Control (MPC) or trajectory optimization approaches perform only a local search to solve a complex non-convex optimization problem. As a result, they cannot capture the multi-modal characteristic of human driving. A global optimizer can be a potential solution but is computationally intractable in a real-time setting. In this paper, we present a real-time MPC capable of searching over different driving modalities. Our basic idea is simple: we run several goal-directed parallel trajectory optimizations and score the resulting trajectories based on user-defined meta cost functions. This allows us to perform a global search over several locally optimal motion plans. Although conceptually straightforward, realizing this idea in real-time with existing optimizers is highly challenging from technical and computational standpoints. With this motivation, we present a novel batch non-holonomic trajectory optimization whose underlying matrix algebra is easily parallelizable across problem instances and reduces to computing large batch matrix-vector products. This structure, in turn, is achieved by deriving a linearization-free multi-convex reformulation of the non-holonomic kinematics and collision avoidance constraints. We extensively validate our approach using both synthetic and real data sets (NGSIM) of traffic scenarios. We highlight how our algorithm automatically makes lane-change and overtaking decisions based on the defined meta cost function. Our batch optimizer achieves trajectories with lower meta cost, up to 6x faster than competing baselines.
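The "optimize toward several goals, then score with a meta cost" idea can be sketched in a few lines. The version below is a toy under heavy assumptions: `plan_to_goal` is a linear-interpolation stand-in for one goal-directed trajectory optimization, `meta_cost` is a hypothetical user-defined score, and the candidates are computed in a sequential loop rather than as the batched matrix-vector products the abstract describes. Lane offsets of 3.5 are illustrative values.

```python
def plan_to_goal(start_y, goal_y, steps):
    """Stand-in for one goal-directed optimization: linear lateral interpolation."""
    return [start_y + (goal_y - start_y) * k / (steps - 1) for k in range(steps)]


def meta_cost(trajectory, preferred_y):
    """Hypothetical user-defined score: end-point lane error plus lateral effort."""
    lane_error = abs(trajectory[-1] - preferred_y)
    effort = sum(abs(b - a) for a, b in zip(trajectory, trajectory[1:]))
    return lane_error + 0.1 * effort


def best_plan(start_y, goal_ys, preferred_y, steps=5):
    """Generate one candidate per goal and return the one with the lowest meta cost."""
    candidates = [plan_to_goal(start_y, g, steps) for g in goal_ys]
    return min(candidates, key=lambda traj: meta_cost(traj, preferred_y))


# Three hypothetical lateral goals: left lane, current lane, right lane.
chosen = best_plan(start_y=0.0, goal_ys=[-3.5, 0.0, 3.5], preferred_y=3.5)
```

Changing `preferred_y` or the weights inside `meta_cost` changes which locally optimal plan is selected, which is the sense in which the meta cost drives lane-change decisions in this sketch.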