Cathy Wu

LG
h-index17
52papers
1,334citations
Novelty48%
AI Score58

52 Papers

LGJun 29, 2023Code
RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark

Federico Berto, Chuanbo Hua, Junyoung Park et al. · pku

Combinatorial optimization (CO) is fundamental to several real-world applications, from logistics and scheduling to hardware design and resource allocation. Deep reinforcement learning (RL) has recently shown significant benefits in solving CO problems, reducing reliance on domain expertise and improving computational efficiency. However, the absence of a unified benchmarking framework leads to inconsistent evaluations, limits reproducibility, and increases engineering overhead, raising barriers to adoption for new researchers. To address these challenges, we introduce RL4CO, a unified and extensive benchmark with in-depth library coverage of 27 CO problem environments and 23 state-of-the-art baselines. Built on efficient software libraries and best practices in implementation, RL4CO features modularized implementation and flexible configurations of diverse environments, policy architectures, RL algorithms, and utilities with extensive documentation. RL4CO helps researchers build on existing successes while exploring and developing their own designs, facilitating the entire research process by decoupling science from heavy engineering. We finally provide extensive benchmark studies to inspire new insights and future work. RL4CO has already attracted numerous researchers in the community and is open-sourced at https://github.com/ai4co/rl4co.

AIJul 30, 2022
Unified Automatic Control of Vehicular Systems with Reinforcement Learning

Zhongxia Yan, Abdul Rahman Kreidieh, Eugene Vinitsky et al.

Emerging vehicular systems with increasing proportions of automated components present opportunities for optimal control to mitigate congestion and increase efficiency. There has been a recent interest in applying deep reinforcement learning (DRL) to these nonlinear dynamical systems for the automatic design of effective control strategies. Despite conceptual advantages of DRL being model-free, studies typically nonetheless rely on training setups that are painstakingly specialized to specific vehicular systems. This is a key challenge to efficient analysis of diverse vehicular and mobility systems. To this end, this article contributes a streamlined methodology for vehicular microsimulation and discovers high performance control strategies with minimal manual design. A variable-agent, multi-task approach is presented for optimization of vehicular Partially Observed Markov Decision Processes. The methodology is experimentally validated on mixed autonomy traffic systems, where fractions of vehicles are automated; empirical improvement, typically 15-60% over a human driving baseline, is observed in all configurations of six diverse open or closed traffic systems. The study reveals numerous emergent behaviors resembling wave mitigation, traffic signaling, and ramp metering. Finally, the emergent behaviors are analyzed to produce interpretable control strategies, which are validated against the learned control strategies.

91.0AIMay 24Code
FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization

Minwei Kong, Chonghe Jiang, Ao Qu et al.

Large language models (LLMs) are increasingly used for optimization modeling and solver-code generation, yet practical operations research and optimization problems often require a harder capability: designing scalable algorithms that exploit problem structure and outperform direct formulation-and-solve baselines. Existing benchmarks are limited to small or simplified examples far below real-world scale and complexity. We introduce FrontierOR, among the first benchmarks to systematically evaluate LLM-based efficient algorithm design for realistic large-scale optimization problems. FrontierOR includes 180 tasks derived from methodologically diverse papers published in top-tier operations research venues, each with standardized instances and a hidden, expert-verified evaluation suite. We evaluate seven LLMs spanning frontier, cost-effective, and open-source models both in one-shot and test-time evolution settings. The results reveal that frontier models still struggle to move from executable formulations to efficient optimization algorithms: the strongest one-shot model outperforms Gurobi in only 31% of cases in both solution quality and computational efficiency, and even strong coding agents with test-time evolution achieve only 50% on selected hard tasks. FrontierOR establishes a practical evaluation platform for LLM-based optimization algorithm design, which enables future LLMs and agents to be systematically tested on whether they can move beyond correct formulation toward a feasible, high-quality, and efficient algorithm. Our FrontierOR Benchmark is available at https://anonymous.4open.science/r/efficient-opt-bench-F03D.

99.9AIApr 2Code
CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

Ao Qu, Han Zheng, Zijian Zhou et al.

Large language model (LLM)-based evolution is a promising approach for open-ended discovery, where progress requires sustained search and knowledge accumulation. Existing methods still rely heavily on fixed heuristics and hard-coded exploration rules, which limit the autonomy of LLM agents. We present CORAL, the first framework for autonomous multi-agent evolution on open-ended problems. CORAL replaces rigid control with long-running agents that explore, reflect, and collaborate through shared persistent memory, asynchronous multi-agent execution, and heartbeat-based interventions. It also provides practical safeguards, including isolated workspaces, evaluator separation, resource management, and agent session and health management. Evaluated on diverse mathematical, algorithmic, and systems optimization tasks, CORAL sets new state-of-the-art results on 10 tasks, achieving 3-10 times higher improvement rates with far fewer evaluations than fixed evolutionary search baselines across tasks. On Anthropic's kernel engineering task, four co-evolving agents improve the best known score from 1363 to 1103 cycles. Mechanistic analyses further show how these gains arise from knowledge reuse and multi-agent exploration and communication. Together, these results suggest that greater agent autonomy and multi-agent evolution can substantially improve open-ended discovery. Code is available at https://github.com/Human-Agent-Society/CORAL.

82.8SYMay 21Code
Open-Source METANET Calibration for Reproducible Freeway Traffic Macroscopic Simulation

Monica Chan, Shreyaa Raghavan, Cathy Wu

METANET is a widely used second-order macroscopic traffic flow model for freeway networks, supporting applications across traffic simulation, ramp metering, and variable speed limit control. The predictive accuracy of any traffic model, however, hinges on careful calibration to real-world conditions. Despite its widespread use, there have not been open-source tools for calibrating METANET's parameters. Without open-source calibration, results cannot be easily reproduced or extended to other networks. This work provides an open-source METANET calibration, simulation, and data visualization tool. The calibration is formulated as a nonlinear program (NLP) solved via the interior-point method (IPOPT), with joint ramp flow estimation. We validate our calibration on real-world freeway data from two widely used traffic monitoring systems: Interstate-24 MObility Technology Interstate Observation Network (I-24 MOTION), one of the largest open-road trajectory instruments in the country, and loop detector data from the Caltrans Performance Measurement System (PeMS), which spans nearly 40,000 detectors across California freeways and serves as a standard benchmark in traffic research. Models calibrated using our method are able to reproduce these datasets' observed traffic patterns across diverse network geometries and traffic conditions including complex stop-and-go congestion waves. As large-scale traffic monitoring infrastructure continues to expand, open-source calibration tools are essential for translating growing volumes of sensor data into validated models that can support real-world traffic control. The complete code is publicly available at https://github.com/woxsao/metanet-calibration to support reproducible research in freeway traffic modeling and control.

OCNov 8, 2023Code
Learning to Configure Separators in Branch-and-Cut

Sirui Li, Wenbin Ouyang, Max B. Paulus et al.

Cutting planes are crucial in solving mixed integer linear programs (MILP) as they facilitate bound improvements on the optimal solution. Modern MILP solvers rely on a variety of separators to generate a diverse set of cutting planes by invoking the separators frequently during the solving process. This work identifies that MILP solvers can be drastically accelerated by appropriately selecting separators to activate. As the combinatorial separator selection space imposes challenges for machine learning, we learn to separate by proposing a novel data-driven strategy to restrict the selection space and a learning-guided algorithm on the restricted space. Our method predicts instance-aware separator configurations which can dynamically adapt during the solve, effectively accelerating the open source MILP solver SCIP by improving the relative solve time up to 72% and 37% on synthetic and real-world MILP benchmarks. Our work complements recent work on learning to select cutting planes and highlights the importance of separator management.

LGAug 8, 2024
Model-Based Transfer Learning for Contextual Reinforcement Learning

Jung-Hoon Cho, Vindula Jayawardana, Sirui Li et al. · mit

Deep reinforcement learning (RL) is a powerful approach to complex decision making. However, one issue that limits its practical application is its brittleness, sometimes failing to train in the presence of small changes in the environment. Motivated by the success of zero-shot transfer-where pre-trained models perform well on related tasks-we consider the problem of selecting a good set of training tasks to maximize generalization performance across a range of tasks. Given the high cost of training, it is critical to select training tasks strategically, but not well understood how to do so. We hence introduce Model-Based Transfer Learning (MBTL), which layers on top of existing RL methods to effectively solve contextual RL problems. MBTL models the generalization performance in two parts: 1) the performance set point, modeled using Gaussian processes, and 2) performance loss (generalization gap), modeled as a linear function of contextual similarity. MBTL combines these two pieces of information within a Bayesian optimization (BO) framework to strategically select training tasks. We show theoretically that the method exhibits sublinear regret in the number of training tasks and discuss conditions to further tighten regret bounds. We experimentally validate our methods using urban traffic and standard continuous control benchmarks. The experimental results suggest that MBTL can achieve up to 43x improved sample efficiency compared with canonical independent training and multi-task training. Further experiments demonstrate the efficacy of BO and the insensitivity to the underlying RL algorithm and hyperparameters. This work lays the foundations for investigating explicit modeling of generalization, thereby enabling principled yet effective methods for contextual RL.

RONov 27, 2023
Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy

Jung-Hoon Cho, Sirui Li, Jeongyun Kim et al. · mit

The recent development of connected and automated vehicle (CAV) technologies has spurred investigations to optimize dense urban traffic to maximize vehicle speed and throughput. This paper explores advisory autonomy, in which real-time driving advisories are issued to the human drivers, thus achieving near-term performance of automated vehicles. Due to the complexity of traffic systems, recent studies of coordinating CAVs have resorted to leveraging deep reinforcement learning (RL). Coarse-grained advisory is formalized as zero-order holds, and we consider a range of hold duration from 0.1 to 40 seconds. However, despite the similarity of the higher frequency tasks on CAVs, a direct application of deep RL fails to be generalized to advisory autonomy tasks. To overcome this, we utilize zero-shot transfer, training policies on a set of source tasks--specific traffic scenarios with designated hold durations--and then evaluating the efficacy of these policies on different target tasks. We introduce Temporal Transfer Learning (TTL) algorithms to select source tasks for zero-shot transfer, systematically leveraging the temporal structure to solve the full range of tasks. TTL selects the most suitable source tasks to maximize the performance of the range of tasks. We validate our algorithms on diverse mixed-traffic scenarios, demonstrating that TTL more reliably solves the tasks than baselines. This paper underscores the potential of coarse-grained advisory autonomy with TTL in traffic flow optimization.

SYApr 26, 2022
Learning Eco-Driving Strategies at Signalized Intersections

Vindula Jayawardana, Cathy Wu

Signalized intersections in arterial roads result in persistent vehicle idling and excess accelerations, contributing to fuel consumption and CO2 emissions. There has thus been a line of work studying eco-driving control strategies to reduce fuel consumption and emission levels at intersections. However, methods to devise effective control strategies across a variety of traffic settings remain elusive. In this paper, we propose a reinforcement learning (RL) approach to learn effective eco-driving control strategies. We analyze the potential impact of a learned strategy on fuel consumption, CO2 emission, and travel time and compare with naturalistic driving and model-based baselines. We further demonstrate the generalizability of the learned policies under mixed traffic scenarios. Simulation results indicate that scenarios with 100% penetration of connected autonomous vehicles (CAV) may yield as high as 18% reduction in fuel consumption and 25% reduction in CO2 emission levels while even improving travel speed by 20%. Furthermore, results indicate that even 25% CAV penetration can bring at least 50% of the total fuel and emission reduction benefits.

LGOct 16, 2022
The Impact of Task Underspecification in Evaluating Deep Reinforcement Learning

Vindula Jayawardana, Catherine Tang, Sirui Li et al.

Evaluations of Deep Reinforcement Learning (DRL) methods are an integral part of scientific progress of the field. Beyond designing DRL methods for general intelligence, designing task-specific methods is becoming increasingly prominent for real-world applications. In these settings, the standard evaluation practice involves using a few instances of Markov Decision Processes (MDPs) to represent the task. However, many tasks induce a large family of MDPs owing to variations in the underlying environment, particularly in real-world contexts. For example, in traffic signal control, variations may stem from intersection geometries and traffic flow levels. The select MDP instances may thus inadvertently cause overfitting, lacking the statistical power to draw conclusions about the method's true performance across the family. In this article, we augment DRL evaluations to consider parameterized families of MDPs. We show that in comparison to evaluating DRL methods on select MDP instances, evaluating the MDP family often yields a substantially different relative ranking of methods, casting doubt on what methods should be considered state-of-the-art. We validate this phenomenon in standard control benchmarks and the real-world application of traffic signal control. At the same time, we show that accurately evaluating on an MDP family is nontrivial. Overall, this work identifies new challenges for empirical rigor in reinforcement learning, especially as the outcomes of DRL trickle into downstream decision-making.

LGAug 1, 2023
PeRP: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems

Aamir Hasan, Neeloy Chakraborty, Haonan Chen et al.

Intelligent driving systems can be used to mitigate congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these systems assume precise control over autonomous vehicle fleets, and are hence limited in practice as they fail to account for uncertainty in human behavior. Piecewise Constant (PC) Policies address these issues by structurally modeling the likeness of human driving to reduce traffic congestion in dense scenarios to provide action advice to be followed by human drivers. However, PC policies assume that all drivers behave similarly. To this end, we develop a co-operative advisory system based on PC policies with a novel driver trait conditioned Personalized Residual Policy, PeRP. PeRP advises drivers to behave in ways that mitigate traffic congestion. We first infer the driver's intrinsic traits on how they follow instructions in an unsupervised manner with a variational autoencoder. Then, a policy conditioned on the inferred trait adapts the action of the PC policy to provide the driver with a personalized recommendation. Our system is trained in simulation with novel driver modeling of instruction adherence. We show that our approach successfully mitigates congestion while adapting to different driver behaviors, with 4 to 22% improvement in average speed over baselines.

LGFeb 17, 2023
Towards Co-operative Congestion Mitigation

Aamir Hasan, Neeloy Chakraborty, Cathy Wu et al.

The effects of traffic congestion are widespread and are an impedance to everyday life. Piecewise constant driving policies have shown promise in helping mitigate traffic congestion in simulation environments. However, no works currently test these policies in situations involving real human users. Thus, we propose to evaluate these policies through the use of a shared control framework in a collaborative experiment with the human driver and the driving policy aiming to co-operatively mitigate congestion. We intend to use the CARLA simulator alongside the Flow framework to conduct user studies to evaluate the affect of piecewise constant driving policies. As such, we present our in-progress work in building our framework and discuss our proposed plan on evaluating this framework through a human-in-the-loop simulation user study.

LGAug 15, 2024
The Nah Bandit: Modeling User Non-compliance in Recommendation Systems

Tianyue Zhou, Jung-Hoon Cho, Cathy Wu · mit

Recommendation systems now pervade the digital world, ranging from advertising to entertainment. However, it remains challenging to implement effective recommendation systems in the physical world, such as in mobility or health. This work focuses on a key challenge: in the physical world, it is often easy for the user to opt out of taking any recommendation if they are not to her liking, and to fall back to her baseline behavior. It is thus crucial in cyber-physical recommendation systems to operate with an interaction model that is aware of such user behavior, lest the user abandon the recommendations altogether. This paper thus introduces the Nah Bandit, a tongue-in-cheek reference to describe a Bandit problem where users can say `nah' to the recommendation and opt for their preferred option instead. As such, this problem lies in between a typical bandit setup and supervised learning. We model the user non-compliance by parameterizing an anchoring effect of recommendations on users. We then propose the Expert with Clustering (EWC) algorithm, a hierarchical approach that incorporates feedback from both recommended and non-recommended options to accelerate user preference learning. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ clusters, EWC achieves a regret bound of $O(N\sqrt{T\log K} + NT)$, achieving superior theoretical performance in the short term compared to LinUCB algorithm. Experimental results also highlight that EWC outperforms both supervised learning and traditional contextual bandit approaches. This advancement reveals that effective use of non-compliance feedback can accelerate preference learning and improve recommendation accuracy. This work lays the foundation for future research in Nah Bandit, providing a robust framework for more effective recommendation systems.

29.6AIMar 25
Learning-guided Prioritized Planning for Lifelong Multi-Agent Path Finding in Warehouse Automation

Han Zheng, Yining Ma, Brandon Araki et al.

Lifelong Multi-Agent Path Finding (MAPF) is critical for modern warehouse automation, which requires multiple robots to continuously navigate conflict-free paths to optimize the overall system throughput. However, the complexity of warehouse environments and the long-term dynamics of lifelong MAPF often demand costly adaptations to classical search-based solvers. While machine learning methods have been explored, their superiority over search-based methods remains inconclusive. In this paper, we introduce Reinforcement Learning (RL) guided Rolling Horizon Prioritized Planning (RL-RH-PP), the first framework integrating RL with search-based planning for lifelong MAPF. Specifically, we leverage classical Prioritized Planning (PP) as a backbone for its simplicity and flexibility in integrating with a learning-based priority assignment policy. By formulating dynamic priority assignment as a Partially Observable Markov Decision Process (POMDP), RL-RH-PP exploits the sequential decision-making nature of lifelong planning while delegating complex spatial-temporal interactions among agents to reinforcement learning. An attention-based neural network autoregressively decodes priority orders on-the-fly, enabling efficient sequential single-agent planning by the PP planner. Evaluations in realistic warehouse simulations show that RL-RH-PP achieves the highest total throughput among baselines and generalizes effectively across agent densities, planning horizons, and warehouse layouts. Our interpretive analyses reveal that RL-RH-PP proactively prioritizes congested agents and strategically redirects agents from congestion, easing traffic flow and boosting throughput. These findings highlight the potential of learning-guided approaches to augment traditional heuristics in modern warehouse automation.

AIJan 8
A General Neural Backbone for Mixed-Integer Linear Optimization via Dual Attention

Peixin Huang, Yaoxin Wu, Yining Ma et al.

Mixed-integer linear programming (MILP), a widely used modeling framework for combinatorial optimization, are central to many scientific and engineering applications, yet remains computationally challenging at scale. Recent advances in deep learning address this challenge by representing MILP instances as variable-constraint bipartite graphs and applying graph neural networks (GNNs) to extract latent structural patterns and enhance solver efficiency. However, this architecture is inherently limited by the local-oriented mechanism, leading to restricted representation power and hindering neural approaches for MILP. Here we present an attention-driven neural architecture that learns expressive representations beyond the pure graph view. A dual-attention mechanism is designed to perform parallel self- and cross-attention over variables and constraints, enabling global information exchange and deeper representation learning. We apply this general backbone to various downstream tasks at the instance level, element level, and solving state level. Extensive experiments across widely used benchmarks show consistent improvements of our approach over state-of-the-art baselines, highlighting attention-based neural architectures as a powerful foundation for learning-enhanced mixed-integer linear optimization.

AIFeb 17
Towards Efficient Constraint Handling in Neural Solvers for Routing Problems

Jieyi Bi, Zhiguang Cao, Jianan Zhou et al.

Neural solvers have achieved impressive progress in addressing simple routing problems, particularly excelling in computational efficiency. However, their advantages under complex constraints remain nascent, for which current constraint-handling schemes via feasibility masking or implicit feasibility awareness can be inefficient or inapplicable for hard constraints. In this paper, we present Construct-and-Refine (CaR), the first general and efficient constraint-handling framework for neural routing solvers based on explicit learning-based feasibility refinement. Unlike prior construction-search hybrids that target reducing optimality gaps through heavy improvements yet still struggle with hard constraints, CaR achieves efficient constraint handling by designing a joint training framework that guides the construction module to generate diverse and high-quality solutions well-suited for a lightweight improvement process, e.g., 10 steps versus 5k steps in prior work. Moreover, CaR presents the first use of construction-improvement-shared representation, enabling potential knowledge sharing across paradigms by unifying the encoder, especially in more complex constrained scenarios. We evaluate CaR on typical hard routing constraints to showcase its broader applicability. Results demonstrate that CaR achieves superior feasibility, solution quality, and efficiency compared to both classical and neural state-of-the-art solvers.

SYAug 10, 2024
Mitigating Metropolitan Carbon Emissions with Dynamic Eco-driving at Scale

Vindula Jayawardana, Baptiste Freydt, Ao Qu et al.

The sheer scale and diversity of transportation make it a formidable sector to decarbonize. Here, we consider an emerging opportunity to reduce carbon emissions: the growing adoption of semi-autonomous vehicles, which can be programmed to mitigate stop-and-go traffic through intelligent speed commands and, thus, reduce emissions. But would such dynamic eco-driving move the needle on climate change? A comprehensive impact analysis has been out of reach due to the vast array of traffic scenarios and the complexity of vehicle emissions. We address this challenge with large-scale scenario modeling efforts and by using multi-task deep reinforcement learning with a carefully designed network decomposition strategy. We perform an in-depth prospective impact assessment of dynamic eco-driving at 6,011 signalized intersections across three major US metropolitan cities, simulating a million traffic scenarios. Overall, we find that vehicle trajectories optimized for emissions can cut city-wide intersection carbon emissions by 11-22%, without harming throughput or safety, and with reasonable assumptions, equivalent to the national emissions of Israel and Nigeria, respectively. We find that 10% eco-driving adoption yields 25%-50% of the total reduction, and nearly 70% of the benefits come from 20% of intersections, suggesting near-term implementation pathways. However, the composition of this high-impact subset of intersections varies considerably across different adoption levels, with minimal overlap, calling for careful strategic planning for eco-driving deployments. Moreover, the impact of eco-driving, when considered jointly with projections of vehicle electrification and hybrid vehicle adoption remains significant. More broadly, this work paves the way for large-scale analysis of traffic externalities, such as time, safety, and air quality, and the potential impact of solution strategies.

31.0SYMay 18
Dynamic Gradient-Based Calibration for Robust and Accurate Traffic Macrosimulation

Shreyaa Raghavan, Cameron Hickert, Monica Chan et al.

Robust and accurate calibration of macroscopic traffic flow models such as METANET is critical for reliable prediction and effective control. While gradient-based methods are desirable for high-dimensional parameter spaces, their application to real-world traffic scenarios is hindered by highly nonconvex optimization landscapes. Consequently, standard static calibration frequently yields parameter sets that produce unstable, unrealistic traffic dynamics, undermining confidence in the estimated parameters and compromising the simulation's utility for counterfactual scenario testing. To address this, we propose a dynamic, rolling-horizon calibration framework. By reformulating static one-time estimation as a closed-loop control problem, parameters better maintain stability and accuracy in the presence of measurement noise. Using real-world data from the I-24 MOTION testbed, this work empirically characterizes the instability of standard methods. It then shows that the proposed approach simultaneously enhances robustness to perturbations and achieves a 48% improvement in predictive accuracy over conventional static calibration.

AISep 21, 2024
A Survey on Large Language Model-empowered Autonomous Driving

Yuxuan Zhu, Shiyi Wang, Wenqing Zhong et al.

Artificial intelligence (AI) plays a crucial role in autonomous driving (AD) research, propelling its development towards intelligence and efficiency. Currently, the development of AD technology follows two main technical paths: modularization and end-to-end. Modularization decompose the driving task into modules such as perception, prediction, planning, and control, and train them separately. Due to the inconsistency of training objectives between modules, the integrated effect suffers from bias. End-to-end attempts to address this issue by utilizing a single model that directly maps from sensor data to control signals. This path has limited learning capabilities in a comprehensive set of features and struggles to handle unpredictable long-tail events and complex urban traffic scenarios. In the face of challenges encountered in both paths, many researchers believe that large language models (LLMs) with powerful reasoning capabilities and extensive knowledge understanding may be the solution, expecting LLMs to provide AD systems with deeper levels of understanding and decision-making capabilities. In light of the challenges faced by both paths, many researchers believe that LLMs, with their powerful reasoning abilities and extensive knowledge, could offer a solution. To understand if LLMs could enhance AD, this paper conducts a thorough analysis of the potential applications of LLMs in AD systems, including exploring their optimization strategies in both modular and end-to-end approaches, with a particular focus on how LLMs can tackle the problems and challenges present in current solutions. Furthermore, we discuss an important question: Can LLM-based artificial general intelligence (AGI) be a key to achieve high-level AD? We further analyze the potential limitations and challenges that LLMs may encounter in promoting the development of AD technology.

LGOct 19, 2024Code
IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning

Vindula Jayawardana, Baptiste Freydt, Ao Qu et al.

Despite the popularity of multi-agent reinforcement learning (RL) in simulated and two-player applications, its success in messy real-world applications has been limited. A key challenge lies in its generalizability across problem variations, a common necessity for many real-world problems. Contextual reinforcement learning (CRL) formalizes learning policies that generalize across problem variations. However, the lack of standardized benchmarks for multi-agent CRL has hindered progress in the field. Such benchmarks are desired to be based on real-world applications to naturally capture the many open challenges of real-world problems that affect generalization. To bridge this gap, we propose IntersectionZoo, a comprehensive benchmark suite for multi-agent CRL through the real-world application of cooperative eco-driving in urban road networks. The task of cooperative eco-driving is to control a fleet of vehicles to reduce fleet-level vehicular emissions. By grounding IntersectionZoo in a real-world application, we naturally capture real-world problem characteristics, such as partial observability and multiple competing objectives. IntersectionZoo is built on data-informed simulations of 16,334 signalized intersections derived from 10 major US cities, modeled in an open-source industry-grade microscopic traffic simulator. By modeling factors affecting vehicular exhaust emissions (e.g., temperature, road conditions, travel demand), IntersectionZoo provides one million data-driven traffic scenarios. Using these traffic scenarios, we benchmark popular multi-agent RL and human-like driving algorithms and demonstrate that the popular multi-agent RL algorithms struggle to generalize in CRL settings.

ROSep 5, 2024
Multi-agent Path Finding for Mixed Autonomy Traffic Coordination

Han Zheng, Zhongxia Yan, Cathy Wu

In the evolving landscape of urban mobility, the prospective integration of Connected and Automated Vehicles (CAVs) with Human-Driven Vehicles (HDVs) presents a complex array of challenges and opportunities for autonomous driving systems. While recent advancements in robotics have yielded Multi-Agent Path Finding (MAPF) algorithms tailored for agent coordination task characterized by simplified kinematics and complete control over agent behaviors, these solutions are inapplicable in mixed-traffic environments where uncontrollable HDVs must coexist and interact with CAVs. Addressing this gap, we propose the Behavior Prediction Kinematic Priority Based Search (BK-PBS), which leverages an offline-trained conditional prediction model to forecast HDV responses to CAV maneuvers, integrating these insights into a Priority Based Search (PBS) where the A* search proceeds over motion primitives to accommodate kinematic constraints. We compare BK-PBS with CAV planning algorithms derived by rule-based car-following models, and reinforcement learning. Through comprehensive simulation on a highway merging scenario across diverse scenarios of CAV penetration rate and traffic density, BK-PBS outperforms these baselines in reducing collision rates and enhancing system-level travel delay. Our work is directly applicable to many scenarios of multi-human multi-robot coordination.

LGFeb 6, 2025Code
NeuralMOVES: A lightweight and microscopic vehicle emission estimation model based on reverse engineering and surrogate learning

Edgar Ramirez-Sanchez, Catherine Tang, Yaosheng Xu et al.

The transportation sector significantly contributes to greenhouse gas emissions, necessitating accurate emission models to guide mitigation strategies. Despite its field validation and certification, the industry-standard Motor Vehicle Emission Simulator (MOVES) faces challenges related to complexity in usage, high computational demands, and its unsuitability for microscopic real-time applications. To address these limitations, we present NeuralMOVES, a comprehensive suite of high-performance, lightweight surrogate models for vehicle CO2 emissions. Developed based on reverse engineering and Neural Networks, NeuralMOVES achieves a remarkable 6.013% Mean Average Percentage Error relative to MOVES across extensive tests spanning over two million scenarios with diverse trajectories and the factors regarding environments and vehicles. NeuralMOVES is only 2.4 MB, largely condensing the original MOVES and the reverse engineered MOVES into a compact representation, while maintaining high accuracy. Therefore, NeuralMOVES significantly enhances accessibility while maintaining the accuracy of MOVES, simplifying CO2 evaluation for transportation analyses and enabling real-time, microscopic applications across diverse scenarios without reliance on complex software or extensive computational resources. Moreover, this paper provides, for the first time, a framework for reverse engineering industrial-grade software tailored specifically to transportation scenarios, going beyond MOVES. The surrogate models are available at https://github.com/edgar-rs/neuralMOVES.

37.4AIMay 12
Rethinking Positional Encoding for Neural Vehicle Routing

Chuanbo Hua, Federico Berto, Andre Hottung et al.

Transformer-based models have become the dominant paradigm for neural combinatorial optimization (NCO) of vehicle routing problems (VRPs), yet the role of positional encoding (PE) in these architectures remains largely unexplored. Unlike natural language, where tokens are uniformly spaced on a line, routing solutions exhibit several properties that render standard NLP positional encodings inadequate. In this work, we formalize three such structural properties that a routing-aware PE should respect, namely anisometric node distances, cyclic and direction-aware topology, and hierarchical depot-anchored global multi-route structure, combining them with a unifying design principle of geometric grounding. Guided by these criteria, we analyze and compare PE methods spanning NLP, graph-transformer, and routing-specific families, and propose a hierarchical anisometric PE that combines a distance-indexed, circularly consistent in-route encoding with a depot-anchored angular cross-route encoding. Extensive experiments across diverse VRP variants demonstrate that geometry-grounded PE consistently outperforms index-based alternatives, with gains that transfer across problem variants, model architectures, and distribution shifts.

94.3LGMay 11
Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning

Han Zheng, Yining Ma, Karthick Gunasekaran et al.

In LLM Reinforcement Fine-Tuning (RFT), curriculum learning drives both efficiency and performance. Yet, current methods externalize curriculum judgment via handcrafted heuristics or auxiliary models, risking misalignment with the policy's training dynamics. In this paper, we introduce METIS (METacognitive Internalized Self-judgment), a novel framework that internalizes curriculum judgment as a native capability. Leveraging a critical observation that within-prompt reward variance effectively gauges prompt informativeness, METIS predicts this metric based on recent training outcomes as lightweight in-context learning examples. This intrinsic self-judgment then dynamically dictates the training allocation. Moreover, METIS closes the loop between judgment and optimization by jointly optimizing the standard RFT rewards and a self-judgment reward. This allows the policy to learn what to learn next, as a form of metacognition. Across extensive discrete and continuous RFT benchmarks from mathematical reasoning, code generation, to agentic function-calling, METIS consistently delivers superior performance while accelerating convergence by up to 67%. By bypassing handcrafted heuristics and auxiliary models, our work establishes a simple, closed-loop, and highly efficient curriculum internalization paradigm for LLM reinforcement fine-tuning.

SYJan 2
Probability-Aware Parking Selection

Cameron Hickert, Sirui Li, Zhengbing He et al.

Current navigation systems conflate time-to-drive with the true time-to-arrive by ignoring parking search duration and the final walking leg. Such underestimation can significantly affect user experience, mode choice, congestion, and emissions. To address this issue, this paper introduces the probability-aware parking selection problem, which aims to direct drivers to the best parking location rather than straight to their destination. An adaptable dynamic programming framework is proposed that leverages probabilistic, lot-level availability to minimize the expected time-to-arrive. Closed-form analysis determines when it is optimal to target a specific parking lot or explore alternatives, as well as the expected time cost. Sensitivity analysis and three illustrative cases are examined, demonstrating the model's ability to account for the dynamic nature of parking availability. Given the high cost of permanent sensing infrastructure, we assess the error rates of using stochastic observations to estimate availability. Experiments with real-world data from the US city of Seattle indicate this approach's viability, with mean absolute error decreasing from 7% to below 2% as observation frequency increases. In data-based simulations, probability-aware strategies demonstrate time savings up to 66% relative to probability-unaware baselines, yet still take up to 123% longer than time-to-drive estimates.

AIOct 21, 2025Code
AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

Minwei Kong, Ao Qu, Xiaotong Guo et al.

Optimization modeling enables critical decisions across industries but remains difficult to automate: informal language must be mapped to precise mathematical formulations and executable solver code. Prior LLM approaches either rely on brittle prompting or costly retraining with limited generalization. We present AlphaOPT, a self-improving experience library that enables an LLM to learn from limited demonstrations (even answers alone, without gold-standard programs) and solver feedback - without annotated reasoning traces or parameter updates. AlphaOPT operates in a continual two-phase cycle: (i) a Library Learning phase that reflects on failed attempts, extracting solver-verified, structured insights as {taxonomy, condition, explanation, example}; and (ii) a Library Evolution phase that diagnoses retrieval misalignments and refines the applicability conditions of stored insights, improving transfer across tasks. This design (1) learns efficiently from limited demonstrations without curated rationales, (2) expands continually without costly retraining by updating the library rather than model weights, and (3) makes knowledge explicit and interpretable for human inspection and intervention. Experiments show that AlphaOPT steadily improves with more data (65% to 72% from 100 to 300 training items) and surpasses the strongest baseline by 7.7% on the out-of-distribution OptiBench dataset when trained only on answers. Code and data are available at: https://github.com/Minw913/AlphaOPT.

AINov 8, 2021Code
Reinforcement Learning for Mixed Autonomy Intersections

Zhongxia Yan, Cathy Wu

We propose a model-free reinforcement learning method for controlling mixed autonomy traffic in simulated traffic networks with through-traffic-only two-way and four-way intersections. Our method utilizes multi-agent policy decomposition which allows decentralized control based on local observations for an arbitrary number of controlled vehicles. We demonstrate that, even without reward shaping, reinforcement learning learns to coordinate the vehicles to exhibit traffic signal-like behaviors, achieving near-optimal throughput with 33-50% controlled vehicles. With the help of multi-task learning and transfer learning, we show that this behavior generalizes across inflow rates and size of the traffic network. Our code, models, and videos of results are available at https://github.com/ZhongxiaYan/mixed_autonomy_intersections.

CVMar 9, 2021Code
SMIL: Multimodal Learning with Severely Missing Modality

Mengmeng Ma, Jian Ren, Long Zhao et al.

A common assumption in multimodal learning is the completeness of training data, i.e., full modalities are available in all training examples. Although there exists research endeavor in developing novel methods to tackle the incompleteness of testing data, e.g., modalities are partially missing in testing examples, few of them can handle incomplete training modalities. The problem becomes even more challenging if considering the case of severely missing, e.g., 90% training examples may have incomplete modalities. For the first time in the literature, this paper formally studies multimodal learning with missing modality in terms of flexibility (missing modalities in training, testing, or both) and efficiency (most training data have incomplete modality). Technically, we propose a new method named SMIL that leverages Bayesian meta-learning in uniformly achieving both objectives. To validate our idea, we conduct a series of experiments on three popular benchmarks: MM-IMDb, CMU-MOSI, and avMNIST. The results prove the state-of-the-art performance of SMIL over existing methods and generative baselines including autoencoders and generative adversarial networks. Our code is available at https://github.com/mengmenm/SMIL.

LGJan 13
Structure Detection for Contextual Reinforcement Learning

Tianyue Zhou, Jung-Hoon Cho, Cathy Wu

Contextual Reinforcement Learning (CRL) tackles the problem of solving a set of related Contextual Markov Decision Processes (CMDPs) that vary across different context variables. Traditional approaches--independent training and multi-task learning--struggle with either excessive computational costs or negative transfer. A recently proposed multi-policy approach, Model-Based Transfer Learning (MBTL), has demonstrated effectiveness by strategically selecting a few tasks to train and zero-shot transfer. However, CMDPs encompass a wide range of problems, exhibiting structural properties that vary from problem to problem. As such, different task selection strategies are suitable for different CMDPs. In this work, we introduce Structure Detection MBTL (SD-MBTL), a generic framework that dynamically identifies the underlying generalization structure of CMDP and selects an appropriate MBTL algorithm. For instance, we observe Mountain structure in which generalization performance degrades from the training performance of the target task as the context difference increases. We thus propose M/GP-MBTL, which detects the structure and adaptively switches between a Gaussian Process-based approach and a clustering-based approach. Extensive experiments on synthetic data and CRL benchmarks--covering continuous control, traffic control, and agricultural management--show that M/GP-MBTL surpasses the strongest prior method by 12.49% on the aggregated metric. These results highlight the promise of online structure detection for guiding source task selection in complex CRL environments.

MAFeb 1, 2024
Multi-agent Path Finding for Cooperative Autonomous Driving

Zhongxia Yan, Han Zheng, Cathy Wu

Anticipating possible future deployment of connected and automated vehicles (CAVs), cooperative autonomous driving at intersections has been studied by many works in control theory and intelligent transportation across decades. Simultaneously, recent parallel works in robotics have devised efficient algorithms for multi-agent path finding (MAPF), though often in environments with simplified kinematics. In this work, we hybridize insights and algorithms from MAPF with the structure and heuristics of optimizing the crossing order of CAVs at signal-free intersections. We devise an optimal and complete algorithm, Order-based Search with Kinematics Arrival Time Scheduling (OBS-KATS), which significantly outperforms existing algorithms, fixed heuristics, and prioritized planning with KATS. The performance is maintained under different vehicle arrival rates, lane lengths, crossing speeds, and control horizon. Through ablations and dissections, we offer insight on the contributing factors to OBS-KATS's performance. Our work is directly applicable to many similarly scaled traffic and multi-robot scenarios with directed lanes.

OCFeb 18, 2025
Learning-Guided Rolling Horizon Optimization for Long-Horizon Flexible Job-Shop Scheduling

Sirui Li, Wenbin Ouyang, Yining Ma et al.

Long-horizon combinatorial optimization problems (COPs), such as the Flexible Job-Shop Scheduling Problem (FJSP), often involve complex, interdependent decisions over extended time frames, posing significant challenges for existing solvers. While Rolling Horizon Optimization (RHO) addresses this by decomposing problems into overlapping shorter-horizon subproblems, such overlap often involves redundant computations. In this paper, we present L-RHO, the first learning-guided RHO framework for COPs. L-RHO employs a neural network to intelligently fix variables that in hindsight did not need to be re-optimized, resulting in smaller and thus easier-to-solve subproblems. For FJSP, this means identifying operations with unchanged machine assignments between consecutive subproblems. Applied to FJSP, L-RHO accelerates RHO by up to 54% while significantly improving solution quality, outperforming other heuristic and learning-based baselines. We also provide in-depth discussions and verify the desirable adaptability and generalization of L-RHO across numerous FJSP variates, distributions, online scenarios and benchmark instances. Moreover, we provide a theoretical analysis to elucidate the conditions under which learning is beneficial.

ROMar 7, 2024
Generalizing Cooperative Eco-driving via Multi-residual Task Learning

Vindula Jayawardana, Sirui Li, Cathy Wu et al.

Conventional control, such as model-based control, is commonly utilized in autonomous driving due to its efficiency and reliability. However, real-world autonomous driving contends with a multitude of diverse traffic scenarios that are challenging for these planning algorithms. Model-free Deep Reinforcement Learning (DRL) presents a promising avenue in this direction, but learning DRL control policies that generalize to multiple traffic scenarios is still a challenge. To address this, we introduce Multi-residual Task Learning (MRTL), a generic learning framework based on multi-task learning that, for a set of task scenarios, decomposes the control into nominal components that are effectively solved by conventional control methods and residual terms which are solved using learning. We employ MRTL for fleet-level emission reduction in mixed traffic using autonomous vehicles as a means of system control. By analyzing the performance of MRTL across nearly 600 signalized intersections and 1200 traffic scenarios, we demonstrate that it emerges as a promising approach to synergize the strengths of DRL and conventional methods in generalizable control.

LGJun 22, 2025
Learning to Segment for Vehicle Routing Problems

Wenbin Ouyang, Sirui Li, Yining Ma et al.

Iterative heuristics are widely recognized as state-of-the-art for Vehicle Routing Problems (VRPs). In this work, we exploit a critical observation: a large portion of the solution remains stable, i.e., unchanged across search iterations, causing redundant computations, especially for large-scale VRPs with long subtours. To address this, we pioneer the formal study of the First-Segment-Then-Aggregate (FSTA) decomposition technique to accelerate iterative solvers. FSTA preserves stable solution segments during the search, aggregates nodes within each segment into fixed hypernodes, and focuses the search only on unstable portions. Yet, a key challenge lies in identifying which segments should be aggregated. To this end, we introduce Learning-to-Segment (L2Seg), a novel neural framework to intelligently differentiate potentially stable and unstable portions for FSTA decomposition. We present three L2Seg variants: non-autoregressive (globally comprehensive but locally indiscriminate), autoregressive (locally refined but globally deficient), and their synergy. Empirical results on CVRP and VRPTW show that L2Seg accelerates state-of-the-art solvers by 2x to 7x. We further provide in-depth analysis showing why synergy achieves the best performance. Notably, L2Seg is compatible with traditional, learning-based, and hybrid solvers, while supporting various VRPs.

RODec 16, 2023
Model-free Learning of Corridor Clearance: A Near-term Deployment Perspective

Dajiang Suo, Vindula Jayawardana, Cathy Wu

An emerging public health application of connected and automated vehicle (CAV) technologies is to reduce response times of emergency medical service (EMS) by indirectly coordinating traffic. Therefore, in this work we study the CAV-assisted corridor clearance for EMS vehicles from a short term deployment perspective. Existing research on this topic often overlooks the impact of EMS vehicle disruptions on regular traffic, assumes 100% CAV penetration, relies on real-time traffic signal timing data and queue lengths at intersections, and makes various assumptions about traffic settings when deriving optimal model-based CAV control strategies. However, these assumptions pose significant challenges for near-term deployment and limit the real-world applicability of such methods. To overcome these challenges and enhance real-world applicability in near-term, we propose a model-free approach employing deep reinforcement learning (DRL) for designing CAV control strategies, showing its reduced overhead in designing and greater scalability and performance compared to model-based methods. Our qualitative analysis highlights the complexities of designing scalable EMS corridor clearance controllers for diverse traffic settings in which DRL controller provides ease of design compared to the model-based methods. In numerical evaluations, the model-free DRL controller outperforms the model-based counterpart by improving traffic flow and even improving EMS travel times in scenarios when a single CAV is present. Across 19 considered settings, the learned DRL controller excels by 25% in reducing the travel time in six instances, achieving an average improvement of 9%. These findings underscore the potential and promise of model-free DRL strategies in advancing EMS response and traffic flow coordination, with a focus on practical near-term deployment.

LGDec 5, 2023
Data-Driven Traffic Reconstruction and Kernel Methods for Identifying Stop-and-Go Congestion

Edgar Ramirez Sanchez, Shreyaa Raghavan, Cathy Wu

Identifying stop-and-go events (SAGs) in traffic flow presents an important avenue for advancing data-driven research for climate change mitigation and sustainability, owing to their substantial impact on carbon emissions, travel time, fuel consumption, and roadway safety. In fact, SAGs are estimated to account for 33-50% of highway driving externalities. However, insufficient attention has been paid to precisely quantifying where, when, and how much these SAGs take place -necessary for downstream decision making, such as intervention design and policy analysis. A key challenge is that the data available to researchers and governments are typically sparse and aggregated to a granularity that obscures SAGs. To overcome such data limitations, this study thus explores the use of traffic reconstruction techniques for SAG identification. In particular, we introduce a kernel-based method for identifying spatio-temporal features in traffic and leverage bootstrapping to quantify the uncertainty of the reconstruction process. Experimental results on California highway data demonstrate the promise of the method for capturing SAGs. This work contributes to a foundation for data-driven decision making to advance sustainability of traffic systems.

CLApr 15, 2025
Reimagining Urban Science: Scaling Causal Inference with Large Language Models

Yutong Xia, Ao Qu, Yunhan Zheng et al.

Urban causal research is essential for understanding the complex, dynamic processes that shape cities and for informing evidence-based policies. However, current practices are often constrained by inefficient and biased hypothesis formulation, challenges in integrating multimodal data, and fragile experimental methodologies. Imagine a system that automatically estimates the causal impact of congestion pricing on commute times by income group or measures how new green spaces affect asthma rates across neighborhoods using satellite imagery and health reports, and then generates comprehensive, policy-ready outputs, including causal estimates, subgroup analyses, and actionable recommendations. In this Perspective, we propose UrbanCIA, an LLM-driven conceptual framework composed of four distinct modular agents responsible for hypothesis generation, data engineering, experiment design and execution, and results interpretation with policy insights. We begin by examining the current landscape of urban causal research through a structured taxonomy of research topics, data sources, and methodological approaches, revealing systemic limitations across the workflow. Next, we introduce the design principles and technological roadmap for the four modules in the proposed framework. We also propose evaluation criteria to assess the rigor and transparency of these AI-augmented processes. Finally, we reflect on the broader implications for human-AI collaboration, equity, and accountability. We call for a new research agenda that embraces LLM-driven tools as catalysts for more scalable, reproducible, and inclusive urban research.

ROJul 15, 2025
A Roadmap for Climate-Relevant Robotics Research

Alan Papalia, Charles Dawson, Laurentiu L. Anton et al. · mit

Climate change is one of the defining challenges of the 21st century, and many in the robotics community are looking for ways to contribute. This paper presents a roadmap for climate-relevant robotics research, identifying high-impact opportunities for collaboration between roboticists and experts across climate domains such as energy, the built environment, transportation, industry, land use, and Earth sciences. These applications include problems such as energy systems optimization, construction, precision agriculture, building envelope retrofits, autonomous trucking, and large-scale environmental monitoring. Critically, we include opportunities to apply not only physical robots but also the broader robotics toolkit - including planning, perception, control, and estimation algorithms - to climate-relevant problems. A central goal of this roadmap is to inspire new research directions and collaboration by highlighting specific, actionable problems at the intersection of robotics and climate. This work represents a collaboration between robotics researchers and domain experts in various climate disciplines, and it serves as an invitation to the robotics community to bring their expertise to bear on urgent climate priorities.

APJun 17, 2025
Markov Regime-Switching Intelligent Driver Model for Interpretable Car-Following Behavior

Chengyuan Zhang, Cathy Wu, Lijun Sun

Accurate and interpretable car-following models are essential for traffic simulation and autonomous vehicle development. However, classical models like the Intelligent Driver Model (IDM) are fundamentally limited by their parsimonious and single-regime structure. They fail to capture the multi-modal nature of human driving, where a single driving state (e.g., speed, relative speed, and gap) can elicit many different driver actions. This forces the model to average across distinct behaviors, reducing its fidelity and making its parameters difficult to interpret. To overcome this, we introduce a regime-switching framework that allows driving behavior to be governed by different IDM parameter sets, each corresponding to an interpretable behavioral mode. This design enables the model to dynamically switch between interpretable behavioral modes, rather than averaging across diverse driving contexts. We instantiate the framework using a Factorial Hidden Markov Model with IDM dynamics (FHMM-IDM), which explicitly separates intrinsic driving regimes (e.g., aggressive acceleration, steady-state following) from external traffic scenarios (e.g., free-flow, congestion, stop-and-go) through two independent latent Markov processes. Bayesian inference via Markov chain Monte Carlo (MCMC) is used to jointly estimate the regime-specific parameters, transition dynamics, and latent state trajectories. Experiments on the HighD dataset demonstrate that FHMM-IDM uncovers interpretable structure in human driving, effectively disentangling internal driver actions from contextual traffic conditions and revealing dynamic regime-switching patterns. This framework provides a tractable and principled solution to modeling context-dependent driving behavior under uncertainty, offering improvements in the fidelity of traffic simulations, the efficacy of safety analyses, and the development of more human-centric ADAS.

SYApr 3, 2025
Route Recommendations for Traffic Management Under Learned Partial Driver Compliance

Heeseung Bang, Jung-Hoon Cho, Cathy Wu et al. · mit

In this paper, we aim to mitigate congestion in traffic management systems by guiding travelers along system-optimal (SO) routes. However, we recognize that most theoretical approaches assume perfect driver compliance, which often does not reflect reality, as drivers tend to deviate from recommendations to fulfill their personal objectives. Therefore, we propose a route recommendation framework that explicitly learns partial driver compliance and optimizes traffic flow under realistic adherence. We first compute an SO edge flow through flow optimization techniques. Next, we train a compliance model based on historical driver decisions to capture individual responses to our recommendations. Finally, we formulate a stochastic optimization problem that minimizes the gap between the target SO flow and the realized flow under conditions of imperfect adherence. Our simulations conducted on a grid network reveal that our approach significantly reduces travel time compared to baseline strategies, demonstrating the practical advantage of incorporating learned compliance into traffic management.

LGMar 29, 2025
RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations

Enrico Marchesini, Benjamin Donnot, Constance Crozier et al.

Reinforcement learning (RL) can provide adaptive and scalable controllers essential for power grid decarbonization. However, RL methods struggle with power grids' complex dynamics, long-horizon goals, and hard physical constraints. For these reasons, we present RL2Grid, a benchmark designed in collaboration with power system operators to accelerate progress in grid control and foster RL maturity. Built on RTE France's power simulation framework, RL2Grid standardizes tasks, state and action spaces, and reward structures for a systematic evaluation and comparison of RL algorithms. Moreover, we integrate operational heuristics and design safety constraints based on human expertise to ensure alignment with physical requirements. By establishing reference performance metrics for classic RL baselines on RL2Grid's tasks, we highlight the need for novel methods capable of handling real systems and discuss future directions for RL-based grid control.

DLJan 20
Measuring the State of Open Science in Transportation Using Large Language Models

Junyi Ji, Ruth Lu, Linda Belkessa et al.

Open science initiatives have strengthened scientific integrity and accelerated research progress across many fields, but the state of their practice within transportation research remains under-investigated. Key features of open science, defined here as data and code availability, are difficult to extract due to the inherent complexity of the field. Previous work has either been limited to small-scale studies due to the labor-intensive nature of manual analysis or has relied on large-scale bibliometric approaches that sacrifice contextual richness. This paper introduces an automatic and scalable feature-extraction pipeline to measure data and code availability in transportation research. We employ Large Language Models (LLMs) for this task and validate their performance against a manually curated dataset and through an inter-rater agreement analysis. We applied this pipeline to examine 10,724 research articles published in the Transportation Research Part series of journals between 2019 and 2024. Our analysis found that only 5% of quantitative papers shared a code repository, 4% of quantitative papers shared a data repository, and about 3% of papers shared both, with trends differing across journals, topics, and geographic regions. We found no significant difference in citation counts or review duration between papers that provided data and code and those that did not, suggesting a misalignment between open science efforts and traditional academic metrics. Consequently, encouraging these practices will likely require structural interventions from journals and funding agencies to supplement the lack of direct author incentives. The pipeline developed in this study can be readily scaled to other journals, representing a critical step toward the automated measurement and monitoring of open science practices in transportation research.

SYAug 10, 2025
Noise-Aware Generative Microscopic Traffic Simulation

Vindula Jayawardana, Catherine Tang, Junyi Ji et al.

Accurately modeling individual vehicle behavior in microscopic traffic simulation remains a key challenge in intelligent transportation systems, as it requires vehicles to realistically generate and respond to complex traffic phenomena such as phantom traffic jams. While traditional human driver simulation models offer computational tractability, they do so by abstracting away the very complexity that defines human driving. On the other hand, recent advances in infrastructure-mounted camera-based roadway sensing have enabled the extraction of vehicle trajectory data, presenting an opportunity to shift toward generative, agent-based models. Yet, a major bottleneck remains: most existing datasets are either overly sanitized or lack standardization, failing to reflect the noisy, imperfect nature of real-world sensing. Unlike data from vehicle-mounted sensors-which can mitigate sensing artifacts like occlusion through overlapping fields of view and sensor fusion-infrastructure-based sensors surface a messier, more practical view of challenges that traffic engineers encounter. To this end, we present the I-24 MOTION Scenario Dataset (I24-MSD)-a standardized, curated dataset designed to preserve a realistic level of sensor imperfection, embracing these errors as part of the learning problem rather than an obstacle to overcome purely from preprocessing. Drawing from noise-aware learning strategies in computer vision, we further adapt existing generative models in the autonomous driving community for I24-MSD with noise-aware loss functions. Our results show that such models not only outperform traditional baselines in realism but also benefit from explicitly engaging with, rather than suppressing, data imperfection. We view I24-MSD as a stepping stone toward a new generation of microscopic traffic simulation that embraces the real-world challenges and is better aligned with practical needs.

ROJul 14, 2025
Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems

Vindula Jayawardana, Sirui Li, Yashar Farid et al.

Autonomous vehicles (AVs) are becoming increasingly popular, with their applications now extending beyond just a mode of transportation to serving as mobile actuators of a traffic flow to control flow dynamics. This contrasts with traditional fixed-location actuators, such as traffic signals, and is referred to as Lagrangian traffic control. However, designing effective Lagrangian traffic control policies for AVs that generalize across traffic scenarios introduces a major challenge. Real-world traffic environments are highly diverse, and developing policies that perform robustly across such diverse traffic scenarios is challenging. It is further compounded by the joint complexity of the multi-agent nature of traffic systems, mixed motives among participants, and conflicting optimization objectives subject to strict physical and external constraints. To address these challenges, we introduce Multi-Residual Mixture of Expert Learning (MRMEL), a novel framework for Lagrangian traffic control that augments a given suboptimal nominal policy with a learned residual while explicitly accounting for the structure of the traffic scenario space. In particular, taking inspiration from residual reinforcement learning, MRMEL augments a suboptimal nominal AV control policy by learning a residual correction, but at the same time dynamically selects the most suitable nominal policy from a pool of nominal policies conditioned on the traffic scenarios and modeled as a mixture of experts. We validate MRMEL using a case study in cooperative eco-driving at signalized intersections in Atlanta, Dallas Fort Worth, and Salt Lake City, with real-world data-driven traffic scenarios. The results show that MRMEL consistently yields superior performance-achieving an additional 4%-9% reduction in aggregate vehicle emissions relative to the strongest baseline in each setting.

APJul 9, 2025
When Context Is Not Enough: Modeling Unexplained Variability in Car-Following Behavior

Chengyuan Zhang, Zhengbing He, Cathy Wu et al.

Modeling car-following behavior is fundamental to microscopic traffic simulation, yet traditional deterministic models often fail to capture the full extent of variability and unpredictability in human driving. While many modern approaches incorporate context-aware inputs (e.g., spacing, speed, relative speed), they frequently overlook structured stochasticity that arises from latent driver intentions, perception errors, and memory effects -- factors that are not directly observable from context alone. To fill the gap, this study introduces an interpretable stochastic modeling framework that captures not only context-dependent dynamics but also residual variability beyond what context can explain. Leveraging deep neural networks integrated with nonstationary Gaussian processes (GPs), our model employs a scenario-adaptive Gibbs kernel to learn dynamic temporal correlations in acceleration decisions, where the strength and duration of correlations between acceleration decisions evolve with the driving context. This formulation enables a principled, data-driven quantification of uncertainty in acceleration, speed, and spacing, grounded in both observable context and latent behavioral variability. Comprehensive experiments on the naturalistic vehicle trajectory dataset collected from the German highway, i.e., the HighD dataset, demonstrate that the proposed stochastic simulation method within this framework surpasses conventional methods in both predictive performance and interpretable uncertainty quantification. The integration of interpretability and accuracy makes this framework a promising tool for traffic analysis and safety-critical applications.

LGJun 30, 2024
Cooperative Advisory Residual Policies for Congestion Mitigation

Aamir Hasan, Neeloy Chakraborty, Haonan Chen et al.

Fleets of autonomous vehicles can mitigate traffic congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these approaches are limited in practice as they assume precise control over autonomous vehicle fleets, incur extensive installation costs for a centralized sensor ecosystem, and also fail to account for uncertainty in driver behavior. To this end, we develop a class of learned residual policies that can be used in cooperative advisory systems and only require the use of a single vehicle with a human driver. Our policies advise drivers to behave in ways that mitigate traffic congestion while accounting for diverse driver behaviors, particularly drivers' reactions to instructions, to provide an improved user experience. To realize such policies, we introduce an improved reward function that explicitly addresses congestion mitigation and driver attitudes to advice. We show that our residual policies can be personalized by conditioning them on an inferred driver trait that is learned in an unsupervised manner with a variational autoencoder. Our policies are trained in simulation with our novel instruction adherence driver model, and evaluated in simulation and through a user study (N=16) to capture the sentiments of human drivers. Our results show that our approaches successfully mitigate congestion while adapting to different driver behaviors, with up to 20% and 40% improvement as measured by a combination metric of speed and deviations in speed across time over baselines in our simulation tests and user study, respectively. Our user study further shows that our policies are human-compatible and personalize to drivers.

HCJun 29, 2024
Lessons in Cooperation: A Qualitative Analysis of Driver Sentiments towards Real-Time Advisory Systems from a Driving Simulator User Study

Aamir Hasan, Neeloy Chakraborty, Haonan Chen et al.

Real-time Advisory (RTA) systems, such as navigational and eco-driving assistants, are becoming increasingly ubiquitous in vehicles due to their benefits for users and society. Until autonomous vehicles mature, such advisory systems will continue to expand their ability to cooperate with drivers, enabling safer and more eco-friendly driving practices while improving user experience. However, the interactions between these systems and drivers have not been studied extensively. To this end, we conduct a driving simulator study (N=16) to capture driver reactions to a Cooperative RTA system. Through a case study with a congestion mitigation assistant, we qualitatively analyze the sentiments of drivers towards advisory systems and discuss driver preferences for various aspects of the interaction. We comment on how the advice should be communicated, the effects of the advice on driver trust, and how drivers adapt to the system. We present recommendations to inform the future design of Cooperative RTA systems.

LGJan 26, 2024
Expert with Clustering: Hierarchical Online Preference Learning Framework

Tianyue Zhou, Jung-Hoon Cho, Babak Rahimi Ardabili et al.

Emerging mobility systems are increasingly capable of recommending options to mobility users, to guide them towards personalized yet sustainable system outcomes. Even more so than the typical recommendation system, it is crucial to minimize regret, because 1) the mobility options directly affect the lives of the users, and 2) the system sustainability relies on sufficient user participation. In this study, we consider accelerating user preference learning by exploiting a low-dimensional latent space that captures the mobility preferences of users. We introduce a hierarchical contextual bandit framework named Expert with Clustering (EWC), which integrates clustering techniques and prediction with expert advice. EWC efficiently utilizes hierarchical user information and incorporates a novel Loss-guided Distance metric. This metric is instrumental in generating more representative cluster centroids. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ options, our algorithm achieves a regret bound of $O(N\sqrt{T\log K} + NT)$. This bound consists of two parts: the first term is the regret from the Hedge algorithm, and the second term depends on the average loss from clustering. To the best of the authors knowledge, this is the first work to analyze the regret of an integrated expert algorithm with k-Means clustering. This regret bound underscores the theoretical and experimental efficacy of EWC, particularly in scenarios that demand rapid learning and adaptation. Experimental results highlight that EWC can substantially reduce regret by 27.57% compared to the LinUCB baseline. Our work offers a data-efficient approach to capturing both individual and collective behaviors, making it highly applicable to contexts with hierarchical structures. We expect the algorithm to be applicable to other settings with layered nuances of user preferences and information.

LGDec 14, 2021
Cooperation for Scalable Supervision of Autonomy in Mixed Traffic

Cameron Hickert, Sirui Li, Cathy Wu

Advances in autonomy offer the potential for dramatic positive outcomes in a number of domains, yet enabling their safe deployment remains an open problem. This work's motivating question is: In safety-critical settings, can we avoid the need to have one human supervise one machine at all times? The work formalizes this scalable supervision problem by considering remotely located human supervisors and investigating how autonomous agents can cooperate to achieve safety. This article focuses on the safety-critical context of autonomous vehicles (AVs) merging into traffic consisting of a mixture of AVs and human drivers. The analysis establishes high reliability upper bounds on human supervision requirements. It further shows that AV cooperation can improve supervision reliability by orders of magnitude and counterintuitively requires fewer supervisors (per AV) as more AVs are adopted. These analytical results leverage queuing-theoretic analysis, order statistics, and a conservative, reachability-based approach. A key takeaway is the potential value of cooperation in enabling the deployment of autonomy at scale. While this work focuses on AVs, the scalable supervision framework may be of independent interest to a broader array of autonomous control challenges.

LGAug 16, 2021
Understanding the factors driving the opioid epidemic using machine learning

Sachin Gavali, Chuming Chen, Julie Cowart et al.

In recent years, the US has experienced an opioid epidemic with an unprecedented number of drugs overdose deaths. Research finds such overdose deaths are linked to neighborhood-level traits, thus providing opportunity to identify effective interventions. Typically, techniques such as Ordinary Least Squares (OLS) or Maximum Likelihood Estimation (MLE) are used to document neighborhood-level factors significant in explaining such adverse outcomes. These techniques are, however, less equipped to ascertain non-linear relationships between confounding factors. Hence, in this study we apply machine learning based techniques to identify opioid risks of neighborhoods in Delaware and explore the correlation of these factors using Shapley Additive explanations (SHAP). We discovered that the factors related to neighborhoods environment, followed by education and then crime, were highly correlated with higher opioid risk. We also explored the change in these correlations over the years to understand the changing dynamics of the epidemic. Furthermore, we discovered that, as the epidemic has shifted from legal (i.e., prescription opioids) to illegal (e.g.,heroin and fentanyl) drugs in recent years, the correlation of environment, crime and health related variables with the opioid risk has increased significantly while the correlation of economic and socio-demographic variables has decreased. The correlation of education related factors has been higher from the start and has increased slightly in recent years suggesting a need for increased awareness about the opioid epidemic.

LGJul 8, 2021
Learning to Delegate for Large-scale Vehicle Routing

Sirui Li, Zhongxia Yan, Cathy Wu

Vehicle routing problems (VRPs) form a class of combinatorial problems with wide practical applications. While previous heuristic or learning-based works achieve decent solutions on small problem instances of up to 100 cities, their performance deteriorates in large problems. This article presents a novel learning-augmented local search framework to solve large-scale VRP. The method iteratively improves the solution by identifying appropriate subproblems and $\textit{delegating}$ their improvement to a black box subsolver. At each step, we leverage spatial locality to consider only a linear number of subproblems, rather than exponential. We frame subproblem selection as regression and train a Transformer on a generated training set of problem instances. Our method accelerates state-of-the-art VRP solvers by 10x to 100x while achieving competitive solution qualities for VRPs with sizes ranging from 500 to 3000. Learned subproblem selection offers a 1.5x to 2x speedup over heuristic or random selection. Our results generalize to a variety of VRP distributions, variants, and solvers.