Xiao Mao

DS
h-index17
5papers
75citations
Novelty62%
AI Score52

5 Papers

LGAug 4, 2022
DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV

Xiao Mao, Zhiguang Cao, Mingfeng Fan et al.

Exploiting unmanned aerial vehicles (UAVs) to execute tasks is gaining growing popularity recently. To solve the underlying task scheduling problem, the deep reinforcement learning (DRL) based methods demonstrate notable advantage over the conventional heuristics as they rely less on hand-engineered rules. However, their decision space will become prohibitively huge as the problem scales up, thus deteriorating the computation efficiency. To alleviate this issue, we propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide and conquer framework (DCF), where we decompose the task scheduling of multi-UAV into task allocation and route planning. Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs, and we exploit another attention based policy network in our lower-level DRL model to construct the route for each UAV, with the objective to maximize the number of executed tasks given the maximum flight distance of the UAV. To effectively train the two models, we design an interactive training strategy (ITS), which includes pre-training, intensive training and alternate training. Experimental results show that our DL-DRL performs favorably against the learning-based and conventional baselines including the OR-Tools, in terms of solution quality and computation efficiency. We also verify the generalization performance of our approach by applying it to larger sizes of up to 1000 tasks. Moreover, we also show via an ablation study that our ITS can help achieve a balance between the performance and training efficiency.

2.5DSMar 30
Classifying Identities: Subcubic Distributivity Checking and Hardness from Arithmetic Progression Detection

Bartłomiej Dudek, Nick Fischer, Geri Gokaj et al.

We revisit the complexity of verifying basic identities, such as associativity and distributivity, on a given finite algebraic structure. In particular, while Rajagopalan and Schulman (FOCS'96, SICOMP'00) gave a surprising randomized algorithm to verify associativity of an operation $\odot: S\times S\to S$ in optimal time $O(|S|^2)$, they left the open problem of finding any subcubic algorithm for verifying distributivity of given operations $\odot,\oplus: S\times S\to S$. Our results are as follows: * We resolve the open problem by Rajagopalan and Schulman by devising an algorithm verifying distributivity in strongly subcubic time $O(|S|^ω)$, together with a matching conditional lower bound based on the Triangle Detection Hypothesis. * We propose arithmetic progression detection in small universes as a consequential algorithmic challenge: We show that unless we can detect $4$-term arithmetic progressions in a set $X\subseteq\{1,\dots, N\}$ in time $O(N^{2-ε})$, then (a) the 3-uniform 4-hyperclique hypothesis is true, and (b) verifying certain identities requires running time~$|S|^{3-o(1)}$. * A careful combination of our algorithmic and hardness ideas allows us to \emph{fully classify} a natural subclass of identities: Specifically, any 3-variable identity over binary operations in which no side is a subexpression of the other is either: (1) verifiable in randomized time $O(|S|^2)$, (2) verifiable in randomized time $O(|S|^ω)$ with a matching lower bound from triangle detection, or (3) trivially verifiable in time $O(|S|^3)$ with a matching lower bound from hardness of 4-term arithmetic progression detection. * We obtain near-optimal algorithms for verifying whether a given algebraic structure forms a field or ring, and show that \emph{counting} the number of distributive triples is conditionally harder than verifying distributivity.

28.4ROMar 19
CAMO: A Conditional Neural Solver for the Multi-objective Multiple Traveling Salesman Problem

Fengxiaoxiao Li, Xiao Mao, Mingfeng Fan et al.

Robotic systems often require a team of robots to collectively visit multiple targets while optimizing competing objectives, such as total travel cost and makespan. This setting can be formulated as the Multi-Objective Multiple Traveling Salesman Problem (MOMTSP). Although learning-based methods have shown strong performance on the single-agent TSP and multi-objective TSP variants, they rarely address the combined challenges of multi-agent coordination and multi-objective trade-offs, which introduce dual sources of complexity. To bridge this gap, we propose CAMO, a conditional neural solver for MOMTSP that generalizes across varying numbers of targets, agents, and preference vectors, and yields high-quality approximations to the Pareto front (PF). Specifically, CAMO consists of a conditional encoder to fuse preferences into instance representations, enabling explicit control over multi-objective trade-offs, and a collaborative decoder that coordinates all agents by alternating agent selection and node selection to construct multi-agent tours autoregressively. To further improve generalization, we train CAMO with a REINFORCE-based objective over a mixed distribution of problem sizes. Extensive experiments show that CAMO outperforms both neural and conventional heuristics, achieving a closer approximation of PFs. In addition, ablation results validate the contributions of CAMO's key components, and real-world tests on a mobile robot platform demonstrate its practical applicability.

20.5DSMar 31
Approximation Schemes for Edit Distance and LCS in Quasi-Strongly Subquadratic Time

Xiao Mao, Aviad Rubinstein

We present novel randomized approximation schemes for the Edit Distance (ED) problem and the Longest Common Subsequence (LCS) problem that, for any constant $ε>0$, compute a $(1+ε)$-approximation for ED and a $(1-ε)$-approximation for LCS in time $n^2 / 2^{\log^{Ω(1)}(n)}$ for two strings of total length at most $n$. This running time improves upon the classical quadratic-time dynamic programming algorithms by a quasi-polynomial factor. Our results yield significant insights into fine-grained complexity: Firstly, for ED, prior work indicates that any exact algorithm cannot be improved beyond a few logarithmic factors without refuting established complexity assumptions [Abboud, Hansen, Vassilevska Williams, Williams, 2016]; our quasi-polynomial speed-up shows a separation the complexity of approximate ED from that of exact ED, even for approximation factor arbitrarily close to $1$. Secondly, for LCS, obtaining similar approximation-time tradeoffs via deterministic algorithms would imply breakthrough circuit lower bounds [Chen, Goldwasser, Lyu, Rothblum, Rubinstein, 2019]; our randomized algorithm demonstrates derandomization hardness for LCS approximation.

LGSep 22, 2025
Improving After-sales Service: Deep Reinforcement Learning for Dynamic Time Slot Assignment with Commitments and Customer Preferences

Xiao Mao, Albert H. Schrotenboer, Guohua Wu et al.

Problem definition: For original equipment manufacturers (OEMs), high-tech maintenance is a strategic component in after-sales services, involving close coordination between customers and service engineers. Each customer suggests several time slots for their maintenance task, from which the OEM must select one. This decision needs to be made promptly to support customers' planning. At the end of each day, routes for service engineers are planned to fulfill the tasks scheduled for the following day. We study this hierarchical and sequential decision-making problem-the Dynamic Time Slot Assignment Problem with Commitments and Customer Preferences (DTSAP-CCP)-in this paper. Methodology/results: Two distinct approaches are proposed: 1) an attention-based deep reinforcement learning with rollout execution (ADRL-RE) and 2) a scenario-based planning approach (SBP). The ADRL-RE combines a well-trained attention-based neural network with a rollout framework for online trajectory simulation. To support the training, we develop a neural heuristic solver that provides rapid route planning solutions, enabling efficient learning in complex combinatorial settings. The SBP approach samples several scenarios to guide the time slot assignment. Numerical experiments demonstrate the superiority of ADRL-RE and the stability of SBP compared to both rule-based and rollout-based approaches. Furthermore, the strong practicality of ADRL-RE is verified in a case study of after-sales service for large medical equipment. Implications: This study provides OEMs with practical decision-support tools for dynamic maintenance scheduling, balancing customer preferences and operational efficiency. In particular, our ADRL-RE shows strong real-world potential, supporting timely and customer-aligned maintenance scheduling.