LG · Feb 5, 2024 · Code
Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays
Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang et al.
Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions. State-of-the-art (SOTA) state augmentation techniques suffer from either state space explosion or performance degradation in stochastic environments. To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary tasks involving short delays to accelerate RL with long delays, without compromising performance in stochastic environments. Specifically, AD-RL learns a value function for short delays and uses bootstrapping and policy improvement techniques to adjust it for long delays. We theoretically show that this can greatly reduce the sample complexity. On deterministic and stochastic benchmarks, our method significantly outperforms SOTA methods in both sample efficiency and policy performance. Code is available at https://github.com/QingyuanWuNothing/AD-RL.
LG · May 1, 2025 · Code
Directly Forecasting Belief for Reinforcement Learning with Delays
Qingyuan Wu, Yuhui Wang, Simon Sinong Zhan et al.
Reinforcement learning (RL) with delays is challenging because sensory perceptions lag behind the actual events: the RL agent needs to estimate the real state of its environment based on past observations. State-of-the-art (SOTA) methods typically employ recursive, step-by-step forecasting of states, which can cause errors to compound. To tackle this problem, our novel belief estimation method, named Directly Forecasting Belief Transformer (DFBT), directly forecasts states from observations without incrementally estimating intermediate states. We theoretically demonstrate that DFBT greatly reduces the compounding errors of existing recursive forecasting methods, yielding stronger performance guarantees. In experiments with D4RL offline datasets, DFBT reduces compounding errors with remarkable prediction accuracy. DFBT's capability to forecast state sequences also facilitates multi-step bootstrapping, thus greatly improving learning efficiency. On the MuJoCo benchmark, our DFBT-based method substantially outperforms SOTA baselines. Code is available at https://github.com/QingyuanWuNothing/DFBT.
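The compounding effect mentioned in this abstract can be illustrated with a standard textbook-style bound. This is a generic sketch under stated assumptions, not the paper's theorem: the symbols $f$, $\hat f$, $\varepsilon$, $L$ and $e_k$ are introduced here for illustration, the true dynamics $f$ are assumed deterministic and $L$-Lipschitz, and the learned one-step model $\hat f$ is assumed to satisfy $\|\hat f(x) - f(x)\| \le \varepsilon$ for all $x$.

```latex
% Recursive forecasting: \hat{x}_k = \hat{f}(\hat{x}_{k-1}), true rollout x_k = f(x_{k-1}),
% shared starting point \hat{x}_0 = x_0, error e_k = \|\hat{x}_k - x_k\|, so e_0 = 0.
\begin{align*}
e_k &= \|\hat{f}(\hat{x}_{k-1}) - f(x_{k-1})\| \\
    &\le \|\hat{f}(\hat{x}_{k-1}) - f(\hat{x}_{k-1})\| + \|f(\hat{x}_{k-1}) - f(x_{k-1})\| \\
    &\le \varepsilon + L\, e_{k-1}, \\
\text{hence}\quad e_k &\le \varepsilon \sum_{i=0}^{k-1} L^{i}.
\end{align*}
```

Under these assumptions the bound grows geometrically in the horizon $k$ whenever $L > 1$, whereas a predictor that maps observations to the $k$-step-ahead state in one shot incurs only its own single approximation error.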
5.3 · GT · Apr 18
Controlling Traffic without Tolls: A Non-Monetary Framework for Autonomous Intersections
Arda Kosay, Yusuf Saltan, Jyun-Jhe Wang et al.
The increasing complexity of urban transportation systems, driven by connected and automated vehicles, calls for new modeling paradigms and scalable control strategies. We propose a non-monetary control framework that leverages autonomous intersection management to influence routing decisions without tolls. The approach uses timestamp-based scheduling adjustments at roadside units (RSUs) to introduce path-dependent delays or advancements, steering traffic toward socially efficient flows. We develop a hierarchical architecture that separates real-time intersection control from network-level coordination. The resulting model admits a congestion-game formulation with path-dependent node costs. We establish the existence and essential uniqueness of equilibrium flows, eliminating ambiguities due to multiple equilibria and enabling a scalable and tractable bilevel optimization formulation for system-level incentive design. Experiments on the Sioux Falls network show that the proposed approach reduces the efficiency gap between user equilibrium and system-optimal flows by up to 71% under realistic constraints. These results demonstrate the potential of non-monetary, infrastructure-light control for next-generation intelligent transportation and urban mobility systems.
LG · May 23, 2024
Variational Delayed Policy Optimization
Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang et al.
In environments with delayed observations, state augmentation, which appends the actions taken within the delay window to the state, is used to recover the Markov property and enable reinforcement learning (RL). However, state-of-the-art (SOTA) RL techniques built on Temporal-Difference (TD) learning often learn inefficiently because the augmented state space expands significantly with the delay. To improve learning efficiency without sacrificing performance, this work introduces a novel framework called Variational Delayed Policy Optimization (VDPO), which reformulates delayed RL as a variational inference problem. This problem is further modelled as a two-step iterative optimization problem, where the first step is TD learning in the delay-free environment with a small state space, and the second step is behaviour cloning, which can be addressed much more efficiently than TD learning. We not only provide a theoretical analysis of VDPO in terms of sample complexity and performance, but also empirically demonstrate that VDPO achieves performance consistent with SOTA methods while significantly improving sample efficiency (approximately 50% fewer samples) on the MuJoCo benchmark.
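The state augmentation this abstract refers to can be sketched in a few lines. This is a minimal illustration, not code from the VDPO paper: the class name `AugmentedStateTracker`, the scalar states and actions, and the `noop_action` padding are all assumptions made for the example.

```python
from collections import deque
from typing import Deque, Tuple


class AugmentedStateTracker:
    """Keeps the latest delayed observation plus the actions taken since then.

    Hypothetical helper for illustration: with a fixed delay d, the augmented
    state is (s_{t-d}, a_{t-d}, ..., a_{t-1}).
    """

    def __init__(self, delay: int, initial_state: float, noop_action: float = 0.0):
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.delay = delay
        self.last_observed = initial_state
        # Assumption: pad the window with a no-op action before real actions exist.
        self.actions: Deque[float] = deque([noop_action] * delay, maxlen=delay)

    def step(self, newly_observed_state: float, action: float) -> None:
        """Record the newest delayed observation and the action just taken."""
        self.last_observed = newly_observed_state
        self.actions.append(action)  # the oldest action drops out automatically

    def augmented_state(self) -> Tuple[float, ...]:
        """Return the delayed observation followed by the action window."""
        return (self.last_observed, *self.actions)


tracker = AugmentedStateTracker(delay=3, initial_state=0.0)
tracker.step(newly_observed_state=1.0, action=0.5)
print(tracker.augmented_state())
```

The augmented state has length 1 + delay here; with vector-valued states and actions the dimension would be the state dimension plus delay times the action dimension, which is the growth the abstract identifies as the source of learning inefficiency.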
LG · Jan 30, 2019
Reliable Smart Road Signs
Muhammed O. Sayin, Chung-Wei Lin, Eunsuk Kang et al.
In this paper, we propose a game-theoretical adversarial intervention detection mechanism for reliable smart road signs. A future trend in intelligent transportation systems is "smart road signs" that incorporate smart codes (e.g., visible in infrared) on their surface to provide more detailed information to smart vehicles. Such smart codes align the road sign classification problem more closely with communication settings than with conventional classification. This enables us to integrate well-established results in communication theory, e.g., error-correction methods, into the road sign classification problem. Recently, vision-based road sign classification algorithms have been shown to be vulnerable to (even) small-scale adversarial interventions that are imperceptible to humans. On the other hand, smart codes constructed via error-correction methods can provide robustness against small-scale intelligent or random perturbations. In the recognition of smart road signs, however, humans are out of the loop since they cannot see or interpret the codes. Therefore, there is no equivalent concept of imperceptible perturbations against which to achieve performance comparable to that of humans. Robustness against small-scale perturbations would not be sufficient since the attacker can attack more aggressively without such a constraint. Under a game-theoretical solution concept, we seek to ensure a certain measure of guarantee against even worst-case (intelligent) attackers that can perturb the signal at large scale. We provide a randomized detection strategy based on the distance between the decoder output and the received input, i.e., the error rate. Finally, we examine the performance of the proposed scheme over various scenarios.
AI · Feb 22, 2018
Reliable Intersection Control in Non-cooperative Environments
Muhammed O. Sayin, Chung-Wei Lin, Shinichi Shiraishi et al.
We propose a reliable intersection control mechanism for strategic autonomous and connected vehicles (agents) in non-cooperative environments. Each agent knows its earliest possible and desired passing times and reports a passing time to the intersection manager, who allocates the intersection temporally to the agents on a first-come-first-served basis. However, the agents might have conflicting interests and can act strategically. To this end, we analyze the strategic behaviors of the agents and formulate Nash equilibria for all possible scenarios. Furthermore, among all Nash equilibria we identify a socially optimal equilibrium that leads to a fair intersection allocation, and correspondingly we describe a strategy-proof intersection mechanism, which achieves reliable intersection control in the sense that the strategic agents have no incentive to misreport their passing times.
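The first-come-first-served allocation by reported passing time can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's mechanism: the function name `fcfs_schedule`, the fixed `crossing_time` each vehicle occupies the intersection, and the tie-break by agent name are all hypothetical choices for the example.

```python
from typing import Dict, List, Tuple


def fcfs_schedule(
    reports: Dict[str, float], crossing_time: float = 1.0
) -> List[Tuple[str, float]]:
    """Allocate the intersection in order of reported passing time.

    Each agent gets the earliest slot that is no earlier than its report and
    does not overlap the previous agent's slot. `crossing_time` (how long one
    vehicle occupies the intersection) is an assumption of this sketch.
    """
    schedule: List[Tuple[str, float]] = []
    next_free = float("-inf")  # the intersection is initially unoccupied
    # Sort by reported time; ties are broken by agent name for determinism.
    for agent, reported in sorted(reports.items(), key=lambda kv: (kv[1], kv[0])):
        start = max(reported, next_free)
        schedule.append((agent, start))
        next_free = start + crossing_time
    return schedule


print(fcfs_schedule({"a": 0.0, "b": 0.5, "c": 3.0}, crossing_time=1.0))
# [('a', 0.0), ('b', 1.0), ('c', 3.0)]
```

In this sketch an agent's slot depends on what it and earlier agents report, which is why a strategic agent might report a time other than its desired one; the abstract's strategy-proof mechanism addresses that incentive and is not reproduced here.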