Xibo Li

ROJul 1, 2024

Let Hybrid A* Path Planner Obey Traffic Rules: A Deep Reinforcement Learning-Based Planning Framework

Xibo Li, Shruti Patel, Christof Büskens

Deep reinforcement learning (DRL) allows a system to interact with its environment and take actions by training an efficient policy that maximizes self-defined rewards. In autonomous driving, it can be used as a strategy for high-level decision making, whereas low-level algorithms such as the hybrid A* path planning have proven their ability to solve the local trajectory planning problem. In this work, we combine these two methods where the DRL makes high-level decisions such as lane change commands. After obtaining the lane change command, the hybrid A* planner is able to generate a collision-free trajectory to be executed by a model predictive controller (MPC). In addition, the DRL algorithm is able to keep the lane change command consistent within a chosen time-period. Traffic rules are implemented using linear temporal logic (LTL), which is then utilized as a reward function in DRL. Furthermore, we validate the proposed method on a real system to demonstrate its feasibility from simulation to implementation on real hardware.

45.6IRApr 17

Beyond One-Size-Fits-All: Adaptive Test-Time Augmentation for Sequential Recommendation

Xibo Li, Liang Zhang

Test-time augmentation (TTA) has become a promising approach for mitigating data sparsity in sequential recommendation by improving inference accuracy without requiring costly model retraining. However, existing TTA methods typically rely on uniform, user-agnostic augmentation strategies. We show that this "one-size-fits-all" design is inherently suboptimal, as it neglects substantial behavioral heterogeneity across users, and empirically demonstrate that the optimal augmentation operators vary significantly across user sequences with different characteristics for the first time. To address this limitation, we propose AdaTTA, a plug-and-play reinforcement learning-based adaptive inference framework that learns to select sequence-specific augmentation operators on a per-sequence basis. We formulate augmentation selection as a Markov Decision Process and introduce an Actor-Critic policy network with hybrid state representations and a joint macro-rank reward design to dynamically determine the optimal operator for each input user sequence. Extensive experiments on four real-world datasets and two recommendation backbones demonstrate that AdaTTA consistently outperforms the best fixed-strategy baselines, achieving up to 26.31% relative improvement on the Home dataset while incurring only moderate computational overhead

Xibo Li

2 Papers