Farzan Moosavi

QUANT-PHAug 7, 2025

Quantum-Efficient Reinforcement Learning Solutions for Last-Mile On-Demand Delivery

Farzan Moosavi, Bilal Farooq

Quantum computation has demonstrated a promising alternative to solving the NP-hard combinatorial problems. Specifically, when it comes to optimization, classical approaches become intractable to account for large-scale solutions. Specifically, we investigate quantum computing to solve the large-scale Capacitated Pickup and Delivery Problem with Time Windows (CPDPTW). In this regard, a Reinforcement Learning (RL) framework augmented with a Parametrized Quantum Circuit (PQC) is designed to minimize the travel time in a realistic last-mile on-demand delivery. A novel problem-specific encoding quantum circuit with an entangling and variational layer is proposed. Moreover, Proximal Policy Optimization (PPO) and Quantum Singular Value Transformation (QSVT) are designed for comparison through numerical experiments, highlighting the superiority of the proposed method in terms of the scale of the solution and training complexity while incorporating the real-world constraints.

LGDec 23, 2024

A Coalition Game for On-demand Multi-modal 3D Automated Delivery System

Farzan Moosavi, Bilal Farooq

We introduce a multi-modal autonomous delivery optimization framework as a coalition game for a fleet of UAVs and ADRs operating in two overlaying networks to address last-mile delivery in urban environments, including high-density areas and time-critical applications. The problem is defined as multiple depot pickup and delivery with time windows constrained over operational restrictions, such as vehicle battery limitation, precedence time window, and building obstruction. Utilizing the coalition game theory, we investigate cooperation structures among the modes to capture how strategic collaboration can improve overall routing efficiency. To do so, a generalized reinforcement learning model is designed to evaluate the cost-sharing and allocation to different modes to learn the cooperative behaviour with respect to various realistic scenarios. Our methodology leverages an end-to-end deep multi-agent policy gradient method augmented by a novel spatio-temporal adjacency neighbourhood graph attention network using a heterogeneous edge-enhanced attention model and transformer architecture. Several numerical experiments on last-mile delivery applications have been conducted, showing the results from the case study in the city of Mississauga, which shows that despite the incorporation of an extensive network in the graph for two modes and a complex training structure, the model addresses realistic operational constraints and achieves high-quality solutions compared with the existing transformer-based and classical methods. It can perform well on non-homogeneous data distribution, generalizes well on different scales and configurations, and demonstrates a robust cooperative performance under stochastic scenarios across various tasks, which is effectively reflected by coalition analysis and cost allocation to signify the advantage of cooperation.

Farzan Moosavi

2 Papers