Xinwu Qian

h-index1
2papers

2 Papers

LGAug 3, 2025
Unsupervised Learning for the Elementary Shortest Path Problem

Jingyi Chen, Xinyuan Zhang, Xinwu Qian

The Elementary Shortest-Path Problem(ESPP) seeks a minimum cost path from s to t that visits each vertex at most once. The presence of negative-cost cycles renders the problem NP-hard. We present a probabilistic method for finding near-optimal ESPP, enabled by an unsupervised graph neural network that jointly learns node value estimates and edge-selection probabilities via a surrogate loss function. The loss provides a high probability certificate of finding near-optimal ESPP solutions by simultaneously reducing negative-cost cycles and embedding the desired algorithmic alignment. At inference time, a decoding algorithm transforms the learned edge probabilities into an elementary path. Experiments on graphs of up to 100 nodes show that the proposed method surpasses both unsupervised baselines and classical heuristics, while exhibiting high performance in cross-size and cross-topology generalization on unseen synthetic graphs.

MASep 9, 2021
DROP: Deep relocating option policy for optimal ride-hailing vehicle repositioning

Xinwu Qian, Shuocheng Guo, Vaneet Aggarwal

In a ride-hailing system, an optimal relocation of vacant vehicles can significantly reduce fleet idling time and balance the supply-demand distribution, enhancing system efficiency and promoting driver satisfaction and retention. Model-free deep reinforcement learning (DRL) has been shown to dynamically learn the relocating policy by actively interacting with the intrinsic dynamics in large-scale ride-hailing systems. However, the issues of sparse reward signals and unbalanced demand and supply distribution place critical barriers in developing effective DRL models. Conventional exploration strategy (e.g., the $ε$-greedy) may barely work under such an environment because of dithering in low-demand regions distant from high-revenue regions. This study proposes the deep relocating option policy (DROP) that supervises vehicle agents to escape from oversupply areas and effectively relocate to potentially underserved areas. We propose to learn the Laplacian embedding of a time-expanded relocation graph, as an approximation representation of the system relocation policy. The embedding generates task-agnostic signals, which in combination with task-dependent signals, constitute the pseudo-reward function for generating DROPs. We present a hierarchical learning framework that trains a high-level relocation policy and a set of low-level DROPs. The effectiveness of our approach is demonstrated using a custom-built high-fidelity simulator with real-world trip record data. We report that DROP significantly improves baseline models with 15.7% more hourly revenue and can effectively resolve the dithering issue in low-demand areas.