LG | Sep 1, 2024
Generalized Multi-hop Traffic Pressure for Heterogeneous Traffic Perimeter Control
Xiaocan Li, Xiaoyu Wang, Ilia Smirnov et al.
Perimeter control (PC) prevents loss of traffic network capacity due to congestion in urban areas. Homogeneous PC assigns the same permitted inflow to every access point of a protected region. However, homogeneous PC performs poorly when congestion in the protected region is heterogeneous (e.g., under imbalanced demand), since it does not consider the specific traffic conditions around each perimeter intersection. When the protected region has spatially heterogeneous congestion, the perimeter inflow rate should be higher near low-density regions and lower near high-density regions. A naïve approach is to use 1-hop traffic pressure to measure traffic conditions around perimeter intersections, but such a metric is too spatially myopic for PC. To address this issue, we formulate multi-hop downstream pressure grounded in Markov chain theory, which "looks deeper" into the protected region beyond perimeter intersections. In addition, we formulate a two-stage hierarchical control scheme that leverages this novel multi-hop pressure to redistribute the total permitted inflow provided by a pre-trained deep reinforcement learning homogeneous control policy. Experimental results show that our heterogeneous PC approaches leveraging multi-hop pressure significantly outperform homogeneous PC in scenarios where the origin-destination flows are highly imbalanced with high spatial heterogeneity. Moreover, a sensitivity analysis shows that our approach is robust to turning ratio uncertainties.
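The abstract does not state the formula for multi-hop downstream pressure, so the following is only a rough illustration of the general idea of looking beyond one hop with a Markov chain. It assumes a hypothetical matrix `P` whose entry `P[i, j]` is the turning ratio from link `i` to link `j`, and it averages the expected density a vehicle would meet over the next `hops` links. The function name, the uniform averaging over hops, and the toy network are all assumptions, not the authors' definition.

```python
import numpy as np

def multi_hop_downstream_density(P, density, hops):
    """Expected downstream density averaged over 1..hops steps.

    P[i, j] is an assumed turning ratio from link i to link j (rows sum to 1),
    so P^h is the h-step transition matrix of the chain over links.
    Illustrative only: the paper's actual pressure definition may differ.
    """
    density = np.asarray(density, dtype=float)
    reach = np.eye(len(density))
    total = np.zeros_like(density)
    for _ in range(hops):
        reach = reach @ P          # h-step transition probabilities
        total += reach @ density   # expected density at the link reached after h hops
    return total / hops

# Toy corridor: link 0 -> link 1 -> link 2, with link 2 treated as absorbing.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
density = [0.1, 0.2, 0.9]  # normalized densities; link 2 is congested

one_hop = multi_hop_downstream_density(P, density, hops=1)
two_hop = multi_hop_downstream_density(P, density, hops=2)
```

In this toy case the 1-hop value seen from link 0 reflects only the lightly loaded link 1, while the 2-hop value also picks up the congested link 2. This is the kind of myopia the abstract attributes to 1-hop pressure.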
LG | May 19
Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor
Xiaocan Li, Shiliang Wu, Zheng Shen
MXFP4 arithmetic can dramatically accelerate reinforcement learning (RL) post-training of large language models (LLMs), yet the quantization error introduces severe accuracy degradation. Existing work treats the quantization error as a monolithic noise term, missing the distinct mechanisms by which it damages training. We prove an exact three-way decomposition of quantization error and show how each component dominates a distinct RL training pathway. Our theoretical and empirical analysis decomposes the MXFP4 quantization error into three additive components: "scale bias" from power-of-two rounding, "deadzone truncation" from zeroing small values, and "grid noise" from rounding to the nearest 4-bit grid. Each component dominates a distinct RL failure mode: scale bias accumulates multiplicatively through the backward pass, affecting gradient accuracy; deadzone truncation degrades rollout quality; and grid noise raises the policy's entropy. We combine corrections that target RL failure modes but are not component-exclusive: Macro-block scaling reduces scale bias; Outlier Fallback recovers deadzone entries and also partially reduces scale-bias-induced error; and Adaptive Quantization Noise (AQN) controls the policy entropy. On the Qwen2.5-3B dense model and the Qwen3-30B-A3B-Base mixture-of-experts model, the targeted corrections recover BF16 accuracy to within 0.7% and 3.0%, respectively.
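To make the three named components concrete, here is a minimal NumPy sketch of one way to split the error of a single MXFP4 block into additive parts. It is not the decomposition proved in the paper, which the abstract does not spell out. The assumptions are: one block with a shared power-of-two scale and E2M1 element magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6}; a hypothetical reference quantizer with a real-valued scale `amax / 6`; and simple nearest-value rounding in which ties go to the smaller magnitude. The function names are illustrative.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 element.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def round_to_grid(y):
    """Round to the nearest E2M1 magnitude (ties go to the smaller one), keeping sign."""
    mag = np.minimum(np.abs(y), FP4_GRID[-1])  # saturate at the largest magnitude
    idx = np.abs(mag[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(y) * FP4_GRID[idx]

def quantize_block(x, power_of_two_scale=True):
    """Quantize one block with a shared scale and dequantize it again."""
    amax = np.abs(x).max()
    if amax == 0:
        return np.zeros_like(x)
    if power_of_two_scale:
        # Shared exponent: floor(log2(amax)) minus the largest E2M1 exponent (2).
        scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    else:
        # Assumed reference: real-valued scale that maps amax exactly to 6.
        scale = amax / FP4_GRID[-1]
    return round_to_grid(x / scale) * scale

def decompose_error(x):
    """Split (quantized - x) into three parts that sum back to the total error."""
    q_pow2 = quantize_block(x, power_of_two_scale=True)
    q_real = quantize_block(x, power_of_two_scale=False)
    scale_bias = q_pow2 - q_real                  # effect of power-of-two scale rounding
    dead = (q_real == 0) & (x != 0)               # small values flushed to zero
    deadzone = np.where(dead, -x, 0.0)
    grid_noise = np.where(dead, 0.0, q_real - x)  # rounding error on surviving entries
    return scale_bias, deadzone, grid_noise

rng = np.random.default_rng(0)
block = rng.standard_normal(32)  # 32 elements per shared scale
scale_bias, deadzone, grid_noise = decompose_error(block)
```

By construction the three arrays telescope to `quantize_block(block) - block`, so the split is additive and exact up to floating-point rounding. Where the boundaries between the components fall is a modeling choice of this sketch.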
LG | Nov 10, 2024
Multi-hop Upstream Anticipatory Traffic Signal Control with Deep Reinforcement Learning
Xiaocan Li, Xiaoyu Wang, Ilia Smirnov et al.
Coordination in traffic signal control is crucial for managing congestion in urban networks. Existing pressure-based control methods focus only on immediate upstream links, leading to suboptimal green time allocation and increased network delays. However, effective signal control inherently requires coordination across a broader spatial scope, since upstream traffic should influence signal control decisions at downstream intersections and its effect extends over a large area of the traffic network. Although agent communication using neural network-based feature extraction can implicitly enhance spatial awareness, it significantly increases the learning complexity, adding a further layer of difficulty to the already challenging task of control in deep reinforcement learning. To address both the learning complexity and the myopic definition of traffic pressure, our work introduces a novel concept based on Markov chain theory, namely multi-hop upstream pressure, which generalizes the conventional pressure to account for traffic conditions beyond the immediate upstream links. This farsighted and compact metric informs the deep reinforcement learning agent to preemptively clear the multi-hop upstream queues, guiding the agent to optimize signal timings with a broader spatial awareness. Simulations on synthetic and realistic (Toronto) scenarios demonstrate that controllers utilizing multi-hop upstream pressure significantly reduce overall network delay by prioritizing traffic movements based on a broader understanding of upstream congestion.
LG | May 30, 2023
Revisiting Random Forests in a Comparative Evaluation of Graph Convolutional Neural Network Variants for Traffic Prediction
Ta Jiun Ting, Xiaocan Li, Scott Sanner et al.
Traffic prediction is a spatiotemporal predictive task that plays an essential role in intelligent transportation systems. Today, graph convolutional neural networks (GCNNs) have become the prevailing models in the traffic prediction literature since they excel at extracting spatial correlations. In this work, we classify the components of successful GCNN prediction models and analyze the effects of matrix factorization, attention mechanisms, and weight sharing on their performance. Furthermore, we compare these variations against random forests, a traditional regression method that predates GCNNs by over 15 years. We evaluated these methods using simulated data of two regions in Toronto as well as real-world sensor data from selected California highways. We found that incorporating matrix factorization, attention, and location-specific model weights into GCNNs, either individually or collectively, can improve overall performance. Moreover, although random forest regression is a less compact model, it matches or exceeds the performance of all variations of GCNNs in our experiments. This suggests that current graph convolutional methods may not be the best approach to traffic prediction and that there is still room for improvement. Finally, our findings also suggest that, for future research on GCNNs for traffic prediction to be credible, researchers must include a performance comparison to random forests.
LG | May 29, 2023
Perimeter Control Using Deep Reinforcement Learning: A Model-free Approach towards Homogeneous Flow Rate Optimization
Xiaocan Li, Ray Coden Mercurius, Ayal Taitler et al.
Perimeter control maintains high traffic efficiency within protected regions by controlling transfer flows among regions to ensure that their traffic densities are below critical values. Existing approaches can be categorized as either model-based or model-free, depending on whether they rely on network transmission models (NTMs) and macroscopic fundamental diagrams (MFDs). Although model-based approaches are more data efficient and have performance guarantees, they are inherently prone to model bias and inaccuracy. For example, NTMs often become imprecise for a large number of protected regions, and MFDs can exhibit scatter and hysteresis that are not captured in existing model-based works. Moreover, no existing studies have employed reinforcement learning for homogeneous flow rate optimization in microscopic simulation, where spatial characteristics, vehicle-level information, and metering realizations (often overlooked in macroscopic simulations) are taken into account. To circumvent the issues of model-based approaches and macroscopic simulation, we propose a model-free deep reinforcement learning approach that optimizes the flow rate homogeneously at the perimeter at the microscopic level. Results demonstrate that our model-free reinforcement learning approach, without any knowledge of NTMs or MFDs, can match the performance of a model-based approach and exhibits enhanced generalizability and scalability.
NA | Jun 28, 2019
Tutorial: Complexity analysis of Singular Value Decomposition and its variants
Xiaocan Li, Shuo Wang, Yinghao Cai
We compared the regular Singular Value Decomposition (SVD), truncated SVD, the Krylov method, and Randomized PCA in terms of time and space complexity. It is well known that the Krylov method and Randomized PCA perform well only when k << n, i.e., when the number of eigenpairs needed is far smaller than the matrix size. We compared them for calculating all the eigenpairs. We also discussed the relationship between Principal Component Analysis and SVD.
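The relationship between PCA and SVD mentioned above can be checked numerically with a short NumPy sketch. It does not reproduce the tutorial's own derivation or its complexity analysis. It relies on two standard facts: the eigenvalues of the sample covariance matrix equal the squared singular values of the centered data divided by n - 1, and keeping the top k singular triplets leaves a Frobenius error equal to the root sum of squares of the discarded singular values. The function names and the random test matrix are illustrative assumptions.

```python
import numpy as np

def pca_via_svd(X):
    """Return (explained variances, principal axes, U, s) from the SVD of centered X."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = s**2 / (X.shape[0] - 1)
    return explained_variance, Vt, U, s

def rank_k_approx(U, s, Vt, k):
    """Truncated SVD: keep only the k largest singular triplets."""
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
explained_variance, Vt, U, s = pca_via_svd(X)

# Eigenvalues of the sample covariance, sorted in descending order for comparison.
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

# Frobenius error of the rank-2 truncation versus the discarded singular values.
Xc = X - X.mean(axis=0)
trunc_error = np.linalg.norm(Xc - rank_k_approx(U, s, Vt, 2))
tail_norm = np.sqrt(np.sum(s[2:] ** 2))
```

Here `explained_variance` agrees with `cov_eigvals`, and `trunc_error` agrees with `tail_norm`, up to floating-point rounding. This sketch computes the full SVD and then discards components, so it illustrates the truncated result but not the cost savings that the Krylov and randomized methods aim for when k << n.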