LGJun 29, 2023Code
RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization BenchmarkFederico Berto, Chuanbo Hua, Junyoung Park et al. · pku
Combinatorial optimization (CO) is fundamental to several real-world applications, from logistics and scheduling to hardware design and resource allocation. Deep reinforcement learning (RL) has recently shown significant benefits in solving CO problems, reducing reliance on domain expertise and improving computational efficiency. However, the absence of a unified benchmarking framework leads to inconsistent evaluations, limits reproducibility, and increases engineering overhead, raising barriers to adoption for new researchers. To address these challenges, we introduce RL4CO, a unified and extensive benchmark with in-depth library coverage of 27 CO problem environments and 23 state-of-the-art baselines. Built on efficient software libraries and best practices in implementation, RL4CO features modularized implementation and flexible configurations of diverse environments, policy architectures, RL algorithms, and utilities with extensive documentation. RL4CO helps researchers build on existing successes while exploring and developing their own designs, facilitating the entire research process by decoupling science from heavy engineering. We finally provide extensive benchmark studies to inspire new insights and future work. RL4CO has already attracted numerous researchers in the community and is open-sourced at https://github.com/ai4co/rl4co.
MASep 5, 2024Code
PARCO: Parallel AutoRegressive Models for Multi-Agent Combinatorial OptimizationFederico Berto, Chuanbo Hua, Laurin Luttmann et al.
Combinatorial optimization problems involving multiple agents are notoriously challenging due to their NP-hard nature and the necessity for effective agent coordination. Despite advancements in learning-based methods, existing approaches often face critical limitations, including suboptimal agent coordination, poor generalization, and high computational latency. To address these issues, we propose PARCO (Parallel AutoRegressive Combinatorial Optimization), a general reinforcement learning framework designed to construct high-quality solutions for multi-agent combinatorial tasks efficiently. To this end, PARCO integrates three key novel components: (1) transformer-based communication layers to enable effective agent collaboration during parallel solution construction, (2) a multiple pointer mechanism for low-latency, parallel agent decision-making, and (3) priority-based conflict handlers to resolve decision conflicts via learned priorities. We evaluate PARCO in multi-agent vehicle routing and scheduling problems, where our approach outperforms state-of-the-art learning methods, demonstrating strong generalization ability and remarkable computational efficiency. We make our source code publicly available to foster future research: https://github.com/ai4co/parco.
MAFeb 14, 2025Code
Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel DecodingLaurin Luttmann, Lin Xie
The Mixed-Shelves Picker Routing Problem (MSPRP) is a fundamental challenge in warehouse logistics, where pickers must navigate a mixed-shelves environment to retrieve SKUs efficiently. Traditional heuristics and optimization-based approaches struggle with scalability, while recent machine learning methods often rely on sequential decision-making, leading to high solution latency and suboptimal agent coordination. In this work, we propose a novel hierarchical and parallel decoding approach for solving the min-max variant of the MSPRP via multi-agent reinforcement learning. While our approach generates a joint distribution over agent actions, allowing for fast decoding and effective picker coordination, our method introduces a sequential action selection to avoid conflicts in the multi-dimensional action space. Experiments show state-of-the-art performance in both solution quality and inference speed, particularly for large-scale and out-of-distribution instances. Our code is publicly available at http://github.com/LTluttmann/marl4msprp.
AIJan 31, 2019Code
Efficient order picking methods in robotic mobile fulfillment systemsLin Xie, Nils Thieme, Ruslan Krenzler et al.
Robotic mobile fulfillment systems (RMFSs) are a new type of warehousing system, which has received more attention recently, due to increasing growth in the e-commerce sector. Instead of sending pickers to the inventory area to search for and pick the ordered items, robots carry shelves (called "pods") including ordered items from the inventory area to picking stations. In the picking stations, human pickers put ordered items into totes; then these items are transported by a conveyor to the packing stations. This type of warehousing system relieves the human pickers and improves the picking process. In this paper, we concentrate on decisions about the assignment of pods to stations and orders to stations to fulfill picking for each incoming customer's order. In previous research for an RMFS with multiple picking stations, these decisions are made sequentially. Instead, we present a new integrated model. To improve the system performance even more, we extend our model by splitting orders. This means parts of an order are allowed to be picked at different stations. To the best of the authors' knowledge, this is the first publication on split orders in an RMFS. We analyze different performance metrics, such as pile-on, pod-station visits, robot moving distance and order turn-over time. We compare the results of our models in different instances with the sequential method in our open-source simulation framework RAWSim-O.
ROOct 8, 2018Code
From Simulation to Real-World Robotic Mobile Fulfillment SystemsLin Xie, Hanyi Li, Nils Thieme
In a new type of automated parts-to-picker warehouse system - a Robotic Mobile Fulfillment System (RMFS) - robots are sent to transport pods (movable shelves) to human operators at stations to pick/put items from/to pods. There are many operational decision problems in such a system, and some of them are interdependent and influence each other. In order to analyze the decision problems and the relationships between them, there are two open-source simulation frameworks in the literature, Alphabet Soup and RAWSim-O. However, the steps between simulation and real-world RMFS are not clear in the literature. Therefore, this paper aims to bridge this gap. The simulator is firstly transferred as core software. The core software is connected with an open-source ERP system, called Odoo, while it is also connected with real robots and stations through an XOR-bench. The XOR-bench enables the RMFS to be integrated with several mini-robots and mobile industrial robots in (removed) experiments for the purpose of research and education.
MEJul 10, 2024
Identification and Estimation of the Bi-Directional MR with Some Invalid InstrumentsFeng Xie, Zhen Yao, Lin Xie et al.
We consider the challenging problem of estimating causal effects from purely observational data in the bi-directional Mendelian randomization (MR), where some invalid instruments, as well as unmeasured confounding, usually exist. To address this problem, most existing methods attempt to find proper valid instrumental variables (IVs) for the target causal effect by expert knowledge or by assuming that the causal model is a one-directional MR model. As such, in this paper, we first theoretically investigate the identification of the bi-directional MR from observational data. In particular, we provide necessary and sufficient conditions under which valid IV sets are correctly identified such that the bi-directional MR model is identifiable, including the causal directions of a pair of phenotypes (i.e., the treatment and outcome). Moreover, based on the identification theory, we develop a cluster fusion-like method to discover valid IV sets and estimate the causal effects of interest. We theoretically demonstrate the correctness of the proposed algorithm. Experimental results show the effectiveness of our method for estimating causal effects in bi-directional MR.
AIDec 17, 2025
A Clustering-Based Variable Ordering Framework for Relaxed Decision Diagrams for Maximum Weighted Independent Set ProblemMohsen Nafar, Michael Römer, Lin Xie
Efficient exact algorithms for Discrete Optimization (DO) rely heavily on strong primal and dual bounds. Relaxed Decision Diagrams (DDs) provide a versatile mechanism for deriving such dual bounds by compactly over-approximating the solution space through node merging. However, the quality of these relaxed diagrams, i.e. the tightness of the resulting dual bounds, depends critically on the variable ordering and the merging decisions executed during compilation. While dynamic variable ordering heuristics effectively tighten bounds, they often incur computational overhead when evaluated globally across the entire variable set. To mitigate this trade-off, this work introduces a novel clustering-based framework for variable ordering. Instead of applying dynamic ordering heuristics to the full set of unfixed variables, we first partition variables into clusters. We then leverage this structural decomposition to guide the ordering process, significantly reducing the heuristic's search space. Within this framework, we investigate two distinct strategies: Cluster-to-Cluster, which processes clusters sequentially using problem-specific aggregate criteria (such as cumulative vertex weights in the Maximum Weighted Independent Set Problem (MWISP)), and Pick-and-Sort, which iteratively selects and sorts representative variables from each cluster to balance local diversity with heuristic guidance. Later on, developing some theoretical results on the growth of the size of DDs for MWISP we propose two different policies for setting the number of clusters within the proposed framework. We embed these strategies into a DD-based branch-and-bound algorithm and evaluate them on the MWISP. Across benchmark instances, the proposed methodology consistently reduces computational costs compared to standard dynamic variable ordering baseline.
LGOct 14, 2025
Multi-Action Self-Improvement for Neural Combinatorial OptimizationLaurin Luttmann, Lin Xie
Self-improvement has emerged as a state-of-the-art paradigm in Neural Combinatorial Optimization (NCO), where models iteratively refine their policies by generating and imitating high-quality solutions. Despite strong empirical performance, existing methods face key limitations. Training is computationally expensive, as policy updates require sampling numerous candidate solutions per instance to extract a single expert trajectory. More fundamentally, these approaches fail to exploit the structure of combinatorial problems involving the coordination of multiple agents, such as vehicles in min-max routing or machines in scheduling. By supervising on single-action trajectories, they fail to exploit agent-permutation symmetries, where distinct sequences of actions yield identical solutions, hindering generalization and the ability to learn coordinated behavior. We address these challenges by extending self-improvement to operate over joint multi-agent actions. Our model architecture predicts complete agent-task assignments jointly at each decision step. To explicitly leverage symmetries, we employ a set-prediction loss, which supervises the policy on multiple expert assignments for any given state. This approach enhances sample efficiency and the model's ability to learn coordinated behavior. Furthermore, by generating multi-agent actions in parallel, it drastically accelerates the solution generation phase of the self-improvement loop. Empirically, we validate our method on several combinatorial problems, demonstrating consistent improvements in the quality of the final solution and a reduced generation latency compared to standard self-improvement.
ROJun 3, 2025
Solving the Pod Repositioning Problem with Deep Reinforced Adaptive Large Neighborhood SearchLin Xie, Hanyi Li
The Pod Repositioning Problem (PRP) in Robotic Mobile Fulfillment Systems (RMFS) involves selecting optimal storage locations for pods returning from pick stations. This work presents an improved solution method that integrates Adaptive Large Neighborhood Search (ALNS) with Deep Reinforcement Learning (DRL). A DRL agent dynamically selects destroy and repair operators and adjusts key parameters such as destruction degree and acceptance thresholds during the search. Specialized heuristics for both operators are designed to reflect PRP-specific characteristics, including pod usage frequency and movement costs. Computational results show that this DRL-guided ALNS outperforms traditional approaches such as cheapest-place, fixed-place, binary integer programming, and static heuristics. The method demonstrates strong solution quality and illustrating the benefit of learning-driven control within combinatorial optimization for warehouse systems.
MMMar 5, 2021
Extend the FFmpeg Framework to Analyze Media ContentXintian Wu, Pengfei Qu, Shaofei Wang et al.
This paper introduces a new set of video analytics plugins developed for the FFmpeg framework. Multimedia applications that increasingly utilize the FFmpeg media features for its comprehensive media encoding, decoding, muxing, and demuxing capabilities can now additionally analyze the video content based on AI models. The plugins are thread optimized for best performance overcoming certain FFmpeg threading limitations. The plugins utilize the Intel OpenVINO Toolkit inference engine as the backend. The analytics workloads are accelerated on different platforms such as CPU, GPU, FPGA or specialized analytics accelerators. With our reference implementation, the feature of OpenVINO as inference backend has been pushed into FFmpeg mainstream repository. We plan to submit more patches later.
OCJan 27, 2021
Formulating and solving integrated order batching and routing in multi-depot AGV-assisted mixed-shelves warehousesLin Xie, Hanyi Li, Laurin Luttmann
Different retail and e-commerce companies are facing the challenge of assembling large numbers of time-critical picking orders that include both small-line and multi-line orders. To reduce unproductive picker working time as in traditional picker-to-parts warehousing systems, different solutions are proposed in the literature and in practice. For example, in a mixed-shelves storage policy, items of the same stock keeping unit are spread over several shelves in a warehouse; or automated guided vehicles (AGVs) are used to transport the picked items from the storage area to packing stations instead of human pickers. This is the first paper to combine both solutions, creating what we call AGV-assisted mixed-shelves picking systems. We model the new integrated order batching and routing problem in such systems as an extended multi-depot vehicle routing problem with both three-index and two-commodity network flow formulations. Due to the complexity of the integrated problem, we develop a novel variable neighborhood search algorithm to solve the integrated problem more efficiently. We test our methods with different sizes of instances, and conclude that the mixed-shelves storage policy is more suitable than the usual storage policy in AGV-assisted mixed-shelves systems for orders with different sizes of order lines (saving up to 62% on driving distances for AGVs). Our variable neighborhood search algorithm provides optimal solutions within an acceptable computational time.
LGAug 28, 2020
An Intelligent CNN-VAE Text Representation Technology Based on Text Semantics for Comprehensive Big DataGenggeng Liu, Canyang Guo, Lin Xie et al.
In the era of big data, a large number of text data generated by the Internet has given birth to a variety of text representation methods. In natural language processing (NLP), text representation transforms text into vectors that can be processed by computer without losing the original semantic information. However, these methods are difficult to effectively extract the semantic features among words and distinguish polysemy in language. Therefore, a text feature representation model based on convolutional neural network (CNN) and variational autoencoder (VAE) is proposed to extract the text features and apply the obtained text feature representation on the text classification tasks. CNN is used to extract the features of text vector to get the semantics among words and VAE is introduced to make the text feature space more consistent with Gaussian distribution. In addition, the output of the improved word2vec model is employed as the input of the proposed model to distinguish different meanings of the same word in different contexts. The experimental results show that the proposed model outperforms in k-nearest neighbor (KNN), random forest (RF) and support vector machine (SVM) classification algorithms.
AIOct 9, 2018
Deterministic Pod Repositioning Problem in Robotic Mobile Fulfillment SystemsRuslan Krenzler, Lin Xie, Hanyi Li
In a robotic mobile fulfillment system, robots bring shelves, called pods, with storage items from the storage area to pick stations. At every pick station there is a person -- the picker -- who takes parts from the pod and packs them into boxes according to orders. Usually there are multiple shelves at the pick station. In this case, they build a queue with the picker at its head. When the picker does not need the pod any more, a robot transports the pod back to the storage area. At that time, we need to answer a question: "Where is the optimal place in the inventory to put this pod back?". It is a tough question, because there are many uncertainties to consider before answering it. Moreover, each decision made to answer the question influences the subsequent ones. The goal of this paper is to answer the question properly. We call this problem the Pod Repositioning Problem and formulate a deterministic model. This model is tested with different algorithms, including binary integer programming, cheapest place, fixed place, random place, genetic algorithms, and a novel algorithm called tetris.
MAOct 12, 2017
RAWSim-O: A Simulation Framework for Robotic Mobile Fulfillment SystemsMarius Merschformann, Lin Xie, Hanyi Li
This paper deals with a new type of warehousing system, Robotic Mobile Fulfillment Systems (RMFS). In such systems, robots are sent to carry storage units, so-called "pods", from the inventory and bring them to human operators working at stations. At the stations, the items are picked according to customers' orders. There exist new decision problems in such systems, for example, the reallocation of pods after their visits at work stations or the selection of pods to fulfill orders. In order to analyze decision strategies for these decision problems and relations between them, we develop a simulation framework called "RAWSim-O" in this paper. Moreover, we show a real-world application of our simulation framework by integrating simple robot prototypes based on vacuum cleaning robots.
AIJun 28, 2017
Path planning for Robotic Mobile Fulfillment SystemsMarius Merschformann, Lin Xie, Daniel Erdmann
This paper presents a collection of path planning algorithms for real-time movement of multiple robots across a Robotic Mobile Fulfillment System (RMFS). Robots are assigned to move storage units to pickers at working stations instead of requiring pickers to go to the storage area. Path planning algorithms aim to find paths for the robots to fulfill the requests without collisions or deadlocks. The state-of-the-art path planning algorithms, including WHCA*, FAR, BCP, OD&ID and CBS, were adapted to suit path planning in RMFS and integrated within a simulation tool to guide the robots from their starting points to their destinations during the storage and retrieval processes. Ten different layouts with a variety of numbers of robots, floors, pods, stations and the sizes of storage areas were considered in the simulation study. Performance metrics of throughput, path length and search time were monitored. Simulation results demonstrate the best algorithm based on each performance metric.