Atomic Proximal Policy Optimization for Electric Robo-Taxi Dispatch and Charger Allocation
This paper addresses scalable fleet management for electric robo-taxi services and offers an incremental advance in deep reinforcement learning for transportation logistics.
The paper tackles the joint optimization of ride matching, repositioning, and charging for electric robo-taxi fleets in a stochastic environment. It introduces Atomic-PPO to reduce action-space complexity and reports superior performance in experiments on real-world NYC data.
Pioneering companies such as Waymo have deployed robo-taxi services in several U.S. cities. These robo-taxis are electric vehicles, and their operations require the joint optimization of ride matching, vehicle repositioning, and charging scheduling in a stochastic environment. We model the operations of the ride-hailing system with robo-taxis as a discrete-time, average-reward Markov Decision Process with an infinite horizon. Dispatching becomes challenging as the fleet grows, because both the system state space and the fleet dispatching action space grow exponentially with the number of vehicles. To address this, we introduce a scalable deep reinforcement learning algorithm, called Atomic Proximal Policy Optimization (Atomic-PPO), that reduces the action space using atomic action decomposition. We evaluate our algorithm on real-world NYC for-hire vehicle trip records and measure its performance by the long-run average reward achieved by the dispatching policy, relative to a fluid-based upper bound. Our experiments show that Atomic-PPO outperforms the benchmark methods. Furthermore, we conduct extensive numerical experiments to analyze the efficient allocation of charging facilities and to assess the impact of vehicle range and charger speed on system performance.
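To illustrate why atomic action decomposition helps, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each vehicle chooses from a small, hypothetical set of atomic actions (`pass`, `serve_trip`, `reposition`, `charge`), and that a fleet-wide action is built by querying a per-vehicle policy once for each vehicle in sequence, conditioned on the assignments made so far. The function names, the action labels, and the toy charger-capacity rule are all hypothetical placeholders for a learned policy network.

```python
import random
from typing import Callable, Dict, Sequence

# Hypothetical atomic actions for a single vehicle (illustrative assumption).
ATOMIC_ACTIONS = ["pass", "serve_trip", "reposition", "charge"]

AtomicPolicy = Callable[[Dict[int, str], int], str]


def joint_action_count(num_vehicles: int, num_atomic: int) -> int:
    """Size of the joint fleet action space if every vehicle picks independently."""
    return num_atomic ** num_vehicles


def dispatch_by_atomic_actions(
    vehicle_ids: Sequence[int], atomic_policy: AtomicPolicy
) -> Dict[int, str]:
    """Build a fleet-wide action one vehicle at a time.

    Each call to the policy chooses among len(ATOMIC_ACTIONS) options,
    conditioned on the partial assignment so far, instead of choosing
    directly from the exponentially large joint action space.
    """
    assignment: Dict[int, str] = {}
    for vid in vehicle_ids:
        assignment[vid] = atomic_policy(assignment, vid)
    return assignment


def make_toy_policy(num_chargers: int, seed: int = 0) -> AtomicPolicy:
    """Hypothetical stand-in for a learned policy: random choice, but it never
    sends more vehicles to charge than there are chargers."""
    rng = random.Random(seed)

    def policy(partial: Dict[int, str], vid: int) -> str:
        charging = sum(1 for a in partial.values() if a == "charge")
        if charging < num_chargers:
            options = ATOMIC_ACTIONS
        else:
            options = [a for a in ATOMIC_ACTIONS if a != "charge"]
        return rng.choice(options)

    return policy


if __name__ == "__main__":
    print(joint_action_count(10, 4))  # 4**10 = 1048576 joint actions for 10 vehicles
    fleet_action = dispatch_by_atomic_actions(range(50), make_toy_policy(num_chargers=3))
    print(sum(a == "charge" for a in fleet_action.values()), "vehicles sent to charge")
```

With 100 vehicles and four atomic actions, the joint space has 4^100 (about 1.6 × 10^60) elements, whereas the sequential scheme makes 100 decisions over four options each. Conditioning each decision on the partial assignment is what lets fleet-level constraints, such as limited charger capacity, be respected without enumerating joint actions.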