Anton Andreychuk

h-index9

24papers

416citations

Novelty45%

AI Score54

Ranked #10,971 of 194,257 authors (top 6%)#414 in AI (top 3%)

24 Papers

8.7LGJun 22, 2022Code

POGEMA: Partially Observable Grid Environment for Multiple Agents

Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev et al.

We introduce POGEMA (https://github.com/AIRI-Institute/pogema) a sandbox for challenging partially observable multi-agent pathfinding (PO-MAPF) problems . This is a grid-based environment that was specifically designed to be a flexible, tunable and scalable benchmark. It can be tailored to a variety of PO-MAPF, which can serve as an excellent testing ground for planning and learning methods, and their combination, which will allow us to move towards filling the gap between AI planning and learning.

21.0AIOct 2, 2023

Learn to Follow: Decentralized Lifelong Multi-agent Pathfinding via Planning and Learning

Alexey Skrynnik, Anton Andreychuk, Maria Nesterova et al.

Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph and is typically solved in a centralized fashion. Conversely, in this work, we investigate the decentralized MAPF setting, when the central controller that posses all the information on the agents' locations and goals is absent and the agents have to sequientially decide the actions on their own without having access to a full state of the environment. We focus on the practically important lifelong variant of MAPF, which involves continuously assigning new goals to the agents upon arrival to the previous ones. To address this complex problem, we propose a method that integrates two complementary approaches: planning with heuristic search and reinforcement learning through policy optimization. Planning is utilized to construct and re-plan individual paths. We enhance our planning algorithm with a dedicated technique tailored to avoid congestion and increase the throughput of the system. We employ reinforcement learning to discover the collision avoidance policies that effectively guide the agents along the paths. The policy is implemented as a neural network and is effectively trained without any reward-shaping or external guidance. We evaluate our method on a wide range of setups comparing it to the state-of-the-art solvers. The results show that our method consistently outperforms the learnable competitors, showing higher throughput and better ability to generalize to the maps that were unseen at the training stage. Moreover our solver outperforms a rule-based one in terms of throughput and is an order of magnitude faster than a state-of-the-art search-based solver.

17.9AIDec 22, 2022Code

TransPath: Learning Heuristics For Grid-Based Pathfinding via Transformers

Daniil Kirilenko, Anton Andreychuk, Aleksandr Panov et al.

Heuristic search algorithms, e.g. A*, are the commonly used tools for pathfinding on grids, i.e. graphs of regular structure that are widely employed to represent environments in robotics, video games etc. Instance-independent heuristics for grid graphs, e.g. Manhattan distance, do not take the obstacles into account and, thus, the search led by such heuristics performs poorly in the obstacle-rich environments. To this end, we suggest learning the instance-dependent heuristic proxies that are supposed to notably increase the efficiency of the search. The first heuristic proxy we suggest to learn is the correction factor, i.e. the ratio between the instance independent cost-to-go estimate and the perfect one (computed offline at the training phase). Unlike learning the absolute values of the cost-to-go heuristic function, which was known before, when learning the correction factor the knowledge of the instance-independent heuristic is utilized. The second heuristic proxy is the path probability, which indicates how likely the grid cell is lying on the shortest path. This heuristic can be utilized in the Focal Search framework as the secondary heuristic, allowing us to preserve the guarantees on the bounded sub-optimality of the solution. We learn both suggested heuristics in a supervised fashion with the state-of-the-art neural networks containing attention blocks (transformers). We conduct a thorough empirical evaluation on a comprehensive dataset of planning tasks, showing that the suggested techniques i) reduce the computational effort of the A* up to a factor of $4$x while producing the solutions, which costs exceed the costs of the optimal solutions by less than $0.3$% on average; ii) outperform the competitors, which include the conventional techniques from the heuristic search, i.e. weighted A*, as well as the state-of-the-art learnable planners.

10.1AIApr 7Code

MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning

Maria Nesterova, Mikhail Kolosov, Anton Andreychuk et al.

Recent advances in multi-agent reinforcement learning (MARL) have demonstrated success in numerous challenging domains and environments, but typically require specialized models for each task. In this work, we propose a coherent methodology that makes it possible for a single GPT-based model to learn and perform well across diverse MARL environments and tasks, including StarCraft Multi-Agent Challenge, Google Research Football and POGEMA. Our method, MARL-GPT, applies offline reinforcement learning to train at scale on the expert trajectories (400M for SMACv2, 100M for GRF, and 1B for POGEMA) combined with a single transformer-based observation encoder that requires no task-specific tuning. Experiments show that MARL-GPT achieves competitive performance compared to specialized baselines in all tested environments. Thus, our findings suggest that it is, indeed, possible to build a multi-task transformer-based model for a wide variety of (significantly different) multi-agent problems paving the way to the fundamental MARL model (akin to ChatGPT, Llama, Mistral etc. in natural language modeling).

11.7AIJul 25, 2023

Monte-Carlo Tree Search for Multi-Agent Pathfinding: Preliminary Results

Yelisey Pitanov, Alexey Skrynnik, Anton Andreychuk et al.

In this work we study a well-known and challenging problem of Multi-agent Pathfinding, when a set of agents is confined to a graph, each agent is assigned a unique start and goal vertices and the task is to find a set of collision-free paths (one for each agent) such that each agent reaches its respective goal. We investigate how to utilize Monte-Carlo Tree Search (MCTS) to solve the problem. Although MCTS was shown to demonstrate superior performance in a wide range of problems like playing antagonistic games (e.g. Go, Chess etc.), discovering faster matrix multiplication algorithms etc., its application to the problem at hand was not well studied before. To this end we introduce an original variant of MCTS, tailored to multi-agent pathfinding. The crux of our approach is how the reward, that guides MCTS, is computed. Specifically, we use individual paths to assist the agents with the the goal-reaching behavior, while leaving them freedom to get off the track if it is needed to avoid collisions. We also use a dedicated decomposition technique to reduce the branching factor of the tree search procedure. Empirically we show that the suggested method outperforms the baseline planning algorithm that invokes heuristic search, e.g. A*, at each re-planning step.

12.2MAAug 29, 2024Code

MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov et al.

Multi-agent pathfinding (MAPF) is a problem that generally requires finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally, even under restrictive assumptions, is NP-hard, yet efficient solutions for this problem are critical for numerous applications, such as automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have gained attention, particularly those leveraging deep reinforcement learning. Typically, such learning-based MAPF solvers are augmented with additional components like single-agent planning or communication. Orthogonally, in this work we rely solely on imitation learning that leverages a large dataset of expert MAPF solutions and transformer-based neural network to create a foundation model for MAPF called MAPF-GPT. The latter is capable of generating actions without additional heuristics or communication. MAPF-GPT demonstrates zero-shot learning abilities when solving the MAPF problems that are not present in the training dataset. We show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers on a diverse range of problem instances and is computationally efficient during inference.

20.3LGJul 20, 2024Code

POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding

Alexey Skrynnik, Anton Andreychuk, Anatolii Borzilov et al.

Multi-agent reinforcement learning (MARL) has recently excelled in solving challenging cooperative and competitive multi-agent problems in various environments, typically involving a small number of agents and full observability. Moreover, a range of crucial robotics-related tasks, such as multi-robot pathfinding, which have traditionally been approached with classical non-learnable methods (e.g., heuristic search), are now being suggested for solution using learning-based or hybrid methods. However, in this domain, it remains difficult, if not impossible, to conduct a fair comparison between classical, learning-based, and hybrid approaches due to the lack of a unified framework that supports both learning and evaluation. To address this, we introduce POGEMA, a comprehensive set of tools that includes a fast environment for learning, a problem instance generator, a collection of predefined problem instances, a visualization toolkit, and a benchmarking tool for automated evaluation. We also introduce and define an evaluation protocol that specifies a range of domain-related metrics, computed based on primary evaluation indicators (such as success rate and path length), enabling a fair multi-fold comparison. The results of this comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.

6.2AISep 20, 2022

Analysis Of The Anytime MAPF Solvers Based On The Combination Of Conflict-Based Search (CBS) and Focal Search (FS)

Ilya Ivanashev, Anton Andreychuk, Konstantin Yakovlev

Conflict-Based Search (CBS) is a widely used algorithm for solving multi-agent pathfinding (MAPF) problems optimally. The core idea of CBS is to run hierarchical search, when, on the high level the tree of solutions candidates is explored, and on the low-level an individual planning for a specific agent (subject to certain constraints) is carried out. To trade-off optimality for running time different variants of bounded sub-optimal CBS were designed, which alter both high- and low-level search routines of CBS. Moreover, anytime variant of CBS does exist that applies Focal Search (FS) to the high-level of CBS - Anytime BCBS. However, no comprehensive analysis of how well this algorithm performs compared to the naive one, when we simply re-invoke CBS with the decreased sub-optimality bound, was present. This work aims at filling this gap. Moreover, we present and evaluate another anytime version of CBS that uses FS on both levels of CBS. Empirically, we show that its behavior is principally different from the one demonstrated by Anytime BCBS. Finally, we compare both algorithms head-to-head and show that using Focal Search on both levels of CBS can be beneficial in a wide range of setups.

12.5AIDec 26, 2023Code

Decentralized Monte Carlo Tree Search for Partially Observable Multi-agent Pathfinding

Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev et al.

The Multi-Agent Pathfinding (MAPF) problem involves finding a set of conflict-free paths for a group of agents confined to a graph. In typical MAPF scenarios, the graph and the agents' starting and ending vertices are known beforehand, allowing the use of centralized planning algorithms. However, in this study, we focus on the decentralized MAPF setting, where the agents may observe the other agents only locally and are restricted in communications with each other. Specifically, we investigate the lifelong variant of MAPF, where new goals are continually assigned to the agents upon completion of previous ones. Drawing inspiration from the successful AlphaZero approach, we propose a decentralized multi-agent Monte Carlo Tree Search (MCTS) method for MAPF tasks. Our approach utilizes the agent's observations to recreate the intrinsic Markov decision process, which is then used for planning with a tailored for multi-agent tasks version of neural MCTS. The experimental results show that our approach outperforms state-of-the-art learnable MAPF solvers. The source code is available at https://github.com/AIRI-Institute/mats-lp.

8.2AIMay 8

Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding

Valeriy Vyaltsev, Alsu Sagirova, Anton Andreychuk et al.

Multi-agent pathfinding (MAPF) is a widely used abstraction for multi-robot trajectory planning problems, where multiple homogeneous agents move simultaneously within a shared environment. Although solving MAPF optimally is NP-hard, scalable and efficient solvers are critical for real-world applications such as logistics and search-and-rescue. To this end, the research community has proposed various decentralized suboptimal MAPF solvers that leverage machine learning. Such methods frame MAPF (from a single agent perspective) as a Dec-POMDP where at each time step an agent has to decide an action based on the local observation and typically solve the problem via reinforcement learning or imitation learning. We follow the same approach but additionally introduce a learnable communication module tailored to enhance cooperation between agents via efficient feature sharing. We present the Local Communication for Multi-agent Pathfinding (LC-MAPF), a generalizable pre-trained model that applies multi-round communication between neighboring agents to exchange information and improve their coordination. Our experiments show that the introduced method outperforms the existing learning-based MAPF solvers, including IL and RL-based approaches, across diverse metrics in a diverse range of (unseen) test scenarios. Remarkably, the introduced communication mechanism does not compromise LC-MAPF's scalability, a common bottleneck for communication-based MAPF solvers.

6.6MANov 12, 2025

Enhancing PIBT via Multi-Action Operations

Egor Yukhnevich, Anton Andreychuk

PIBT is a rule-based Multi-Agent Path Finding (MAPF) solver, widely used as a low-level planner or action sampler in many state-of-the-art approaches. Its primary advantage lies in its exceptional speed, enabling action selection for thousands of agents within milliseconds by considering only the immediate next timestep. However, this short-horizon design leads to poor performance in scenarios where agents have orientation and must perform time-consuming rotation actions. In this work, we present an enhanced version of PIBT that addresses this limitation by incorporating multi-action operations. We detail the modifications introduced to improve PIBT's performance while preserving its hallmark efficiency. Furthermore, we demonstrate how our method, when combined with graph-guidance technique and large neighborhood search optimization, achieves state-of-the-art performance in the online LMAPF-T setting.

2.3AIApr 25, 2024

Optimal and Bounded Suboptimal Any-Angle Multi-agent Pathfinding

Konstantin Yakovlev, Anton Andreychuk, Roni Stern

Multi-agent pathfinding (MAPF) is the problem of finding a set of conflict-free paths for a set of agents. Typically, the agents' moves are limited to a pre-defined graph of possible locations and allowed transitions between them, e.g. a 4-neighborhood grid. We explore how to solve MAPF problems when each agent can move between any pair of possible locations as long as traversing the line segment connecting them does not lead to a collision with the obstacles. This is known as any-angle pathfinding. We present the first optimal any-angle multi-agent pathfinding algorithm. Our planner is based on the Continuous Conflict-based Search (CCBS) algorithm and an optimal any-angle variant of the Safe Interval Path Planning (TO-AA-SIPP). The straightforward combination of those, however, scales poorly since any-angle path finding induces search trees with a very large branching factor. To mitigate this, we adapt two techniques from classical MAPF to the any-angle setting, namely Disjoint Splitting and Multi-Constraints. Experimental results on different combinations of these techniques show they enable solving over 30% more problems than the vanilla combination of CCBS and TO-AA-SIPP. In addition, we present a bounded-suboptimal variant of our algorithm, that enables trading runtime for solution cost in a controlled manner.

12.4AIJun 30, 2025Code

Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning

Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov et al.

Multi-agent pathfinding (MAPF) is a common abstraction of multi-robot trajectory planning problems, where multiple homogeneous robots simultaneously move in the shared environment. While solving MAPF optimally has been proven to be NP-hard, scalable, and efficient, solvers are vital for real-world applications like logistics, search-and-rescue, etc. To this end, decentralized suboptimal MAPF solvers that leverage machine learning have come on stage. Building on the success of the recently introduced MAPF-GPT, a pure imitation learning solver, we introduce MAPF-GPT-DDG. This novel approach effectively fine-tunes the pre-trained MAPF model using centralized expert data. Leveraging a novel delta-data generation mechanism, MAPF-GPT-DDG accelerates training while significantly improving performance at test time. Our experiments demonstrate that MAPF-GPT-DDG surpasses all existing learning-based MAPF solvers, including the original MAPF-GPT, regarding solution quality across many testing scenarios. Remarkably, it can work with MAPF instances involving up to 1 million agents in a single environment, setting a new milestone for scalability in MAPF domains.

7.6AIApr 14, 2021

Towards Time-Optimal Any-Angle Path Planning With Dynamic Obstacles

Konstantin Yakovlev, Anton Andreychuk

Path finding is a well-studied problem in AI, which is often framed as graph search. Any-angle path finding is a technique that augments the initial graph with additional edges to build shorter paths to the goal. Indeed, optimal algorithms for any-angle path finding in static environments exist. However, when dynamic obstacles are present and time is the objective to be minimized, these algorithms can no longer guarantee optimality. In this work, we elaborate on why this is the case and what techniques can be used to solve the problem optimally. We present two algorithms, grounded in the same idea, that can obtain provably optimal solutions to the considered problem. One of them is a naive algorithm and the other one is much more involved. We conduct a thorough empirical evaluation showing that, in certain setups, the latter algorithm might be as fast as the previously-known greedy non-optimal solver while providing solutions of better quality. In some (rare) cases, the difference in cost is up to 76%, while on average it is lower than one percent (the same cost difference is typically observed between optimal and greedy any-angle solvers in static environments).

16.0AIJan 24, 2021Code

Improving Continuous-time Conflict Based Search

Anton Andreychuk, Konstantin Yakovlev, Eli Boyarski et al.

Conflict-Based Search (CBS) is a powerful algorithmic framework for optimally solving classical multi-agent path finding (MAPF) problems, where time is discretized into the time steps. Continuous-time CBS (CCBS) is a recently proposed version of CBS that guarantees optimal solutions without the need to discretize time. However, the scalability of CCBS is limited because it does not include any known improvements of CBS. In this paper, we begin to close this gap and explore how to adapt successful CBS improvements, namely, prioritizing conflicts (PC), disjoint splitting (DS), and high-level heuristics, to the continuous time setting of CCBS. These adaptions are not trivial, and require careful handling of different types of constraints, applying a generalized version of the Safe interval path planning (SIPP) algorithm, and extending the notion of cardinal conflicts. We evaluate the effect of the suggested enhancements by running experiments both on general graphs and $2^k$-neighborhood grids. CCBS with these improvements significantly outperforms vanilla CCBS, solving problems with almost twice as many agents in some cases and pushing the limits of multiagent path finding in continuous-time domains.

8.4AIJun 1, 2020Code

Revisiting Bounded-Suboptimal Safe Interval Path Planning

Konstantin Yakovlev, Anton Andreychuk, Roni Stern

Safe-interval path planning (SIPP) is a powerful algorithm for finding a path in the presence of dynamic obstacles. SIPP returns provably optimal solutions. However, in many practical applications of SIPP such as path planning for robots, one would like to trade-off optimality for shorter planning time. In this paper we explore different ways to build a bounded-suboptimal SIPP and discuss their pros and cons. We compare the different bounded-suboptimal versions of SIPP experimentally. While there is no universal winner, the results provide insights into when each method should be used.

1.9ROJun 17, 2019

Combining Safe Interval Path Planning and Constrained Path Following Control: Preliminary Results

Konstantin Yakovlev, Anton Andreychuk, Juliya Belinskaya et al.

We study the navigation problem for a robot moving amidst static and dynamic obstacles and rely on a hierarchical approach to solve it. First, the reference trajectory is planned by the safe interval path planning algorithm that is capable of handling any-angle translations and rotations. Second, the path following problem is treated as the constrained control problem and the original flatness-based approach is proposed to generate control. We suggest a few enhancements for the path planning algorithm aimed at finding trajectories that are more likely to be followed by a robot without collisions. Results of the conducted experimental evaluation show that the number of successfully solved navigation instances significantly increases when using the suggested techniques.

28.6AIJan 16, 2019

Multi-Agent Pathfinding with Continuous Time

Anton Andreychuk, Konstantin Yakovlev, Dor Atzmon et al.

Multi-Agent Pathfinding (MAPF) is the problem of finding paths for multiple agents such that every agent reaches its goal and the agents do not collide. Most prior work on MAPF was on grids, assumed agents' actions have uniform duration, and that time is discretized into timesteps. We propose a MAPF algorithm that does not rely on these assumptions, is complete, and provides provably optimal solutions. This algorithm is based on a novel adaptation of Safe interval path planning (SIPP), a continuous time single-agent planning algorithm, and a modified version of Conflict-based search (CBS), a state of the art multi-agent pathfinding algorithm. We analyze this algorithm, discuss its pros and cons, and evaluate it experimentally on several standard benchmarks.

1.7AIJul 5, 2018

Multi-robot Path Planning in Well-formed Infrastructures: Prioritized Planning vs. Prioritized Wait Adjustment (Preliminary Results)

Anton Andreychuk, Konstantin Yakovlev

We study the problem of planning collision-free paths for a group of homogeneous robots. We propose a novel approach for turning the paths that were planned egocentrically by the robots, e.g. without taking other robots' moves into account, into collision-free trajectories and evaluate it empirically. Suggested algorithm is much faster (up to one order of magnitude) than state-of-the-art but this comes at the price of notable drop-down of the solution cost.

1.7AIJul 2, 2018

Path Finding for the Coalition of Co-operative Agents Acting in the Environment with Destructible Obstacles

Anton Andreychuk, Konstantin Yakovlev

The problem of planning a set of paths for the coalition of robots (agents) with different capabilities is considered in the paper. Some agents can modify the environment by destructing the obstacles thus allowing the other ones to shorten their paths to the goal. As a result the mutual solution of lower cost, e.g. time to completion, may be acquired. We suggest an original procedure to identify the obstacles for further removal that can be embedded into almost any heuristic search planner (we use Theta*) and evaluate it empirically. Results of the evaluation show that time-to-complete the mission can be decreased up to 9-12 % by utilizing the proposed technique.

6.6AIMay 3, 2018

Two Techniques That Enhance the Performance of Multi-robot Prioritized Path Planning

Anton Andreychuk, Konstantin Yakovlev

We introduce and empirically evaluate two techniques aimed at enhancing the performance of multi-robot prioritized path planning. The first technique is the deterministic procedure for re-scheduling (as opposed to well-known approach based on random restarts), the second one is the heuristic procedure that modifies the search-space of the individual planner involved in the prioritized path finding.

3.1AIJul 20, 2017

Applying MAPP Algorithm for Cooperative Path Finding in Urban Environments

Anton Andreychuk, Konstantin Yakovlev

The paper considers the problem of planning a set of non-conflict trajectories for the coalition of intelligent agents (mobile robots). Two divergent approaches, e.g. centralized and decentralized, are surveyed and analyzed. Decentralized planner - MAPP is described and applied to the task of finding trajectories for dozens UAVs performing nap-of-the-earth flight in urban environments. Results of the experimental studies provide an opportunity to claim that MAPP is a highly efficient planner for solving considered types of tasks.

16.1AIMar 12, 2017

Any-Angle Pathfinding for Multiple Agents Based on SIPP Algorithm

Konstantin Yakovlev, Anton Andreychuk

The problem of finding conflict-free trajectories for multiple agents of identical circular shape, operating in shared 2D workspace, is addressed in the paper and decoupled, e.g., prioritized, approach is used to solve this problem. Agents' workspace is tessellated into the square grid on which any-angle moves are allowed, e.g. each agent can move into an arbitrary direction as long as this move follows the straight line segment whose endpoints are tied to the distinct grid elements. A novel any-angle planner based on Safe Interval Path Planning (SIPP) algorithm is proposed to find trajectories for an agent moving amidst dynamic obstacles (other agents) on a grid. This algorithm is then used as part of a prioritized multi-agent planner AA-SIPP(m). On the theoretical, side we show that AA-SIPP(m) is complete under well-defined conditions. On the experimental side, in simulation tests with up to 200 agents involved, we show that our planner finds much better solutions in terms of cost (up to 20%) compared to the planners relying on cardinal moves only.

2.5AIAug 9, 2016

Resolving Spatial-Time Conflicts In A Set Of Any-angle Or Angle-constrained Grid Paths

Konstantin Yakovlev, Anton Andreychuk

We study the multi-agent path finding problem (MAPF) for a group of agents which are allowed to move into arbitrary directions on a 2D square grid. We focus on centralized conflict resolution for independently computed plans. We propose an algorithm that eliminates conflicts by using local re-planning and introducing time offsets to the execution of paths by different agents. Experimental results show that the algorithm can find high quality conflict-free solutions at low computational cost.