Oleg Bulichev

AI
h-index7
3papers
3citations
Novelty52%
AI Score43

3 Papers

CVMay 6, 2025Code
DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes

Sergey Linok, Vadim Semenov, Anastasia Trunova et al.

The analysis of events in dynamic environments poses a fundamental challenge in the development of intelligent agents and robots capable of interacting with humans. Current approaches predominantly utilize visual models. However, these methods often capture information implicitly from images, lacking interpretable spatial-temporal object representations. To address this issue we introduce DyGEnc - a novel method for Encoding a Dynamic Graph. This method integrates compressed spatial-temporal structural observation representation with the cognitive capabilities of large language models. The purpose of this integration is to enable advanced question answering based on a sequence of textual scene graphs. Extended evaluations on the STAR and AGQA datasets indicate that DyGEnc outperforms existing visual methods by a large margin of 15-25% in addressing queries regarding the history of human-to-object interactions. Furthermore, the proposed method can be seamlessly extended to process raw input images utilizing foundational models for extracting explicit textual scene graphs, as substantiated by the results of a robotic experiment conducted with a wheeled manipulator platform. We hope that these findings will contribute to the implementation of robust and compressed graph-based robotic memory for long-horizon reasoning. Code is available at github.com/linukc/DyGEnc.

51.9AIApr 7
MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning

Maria Nesterova, Mikhail Kolosov, Anton Andreychuk et al.

Recent advances in multi-agent reinforcement learning (MARL) have demonstrated success in numerous challenging domains and environments, but typically require specialized models for each task. In this work, we propose a coherent methodology that makes it possible for a single GPT-based model to learn and perform well across diverse MARL environments and tasks, including StarCraft Multi-Agent Challenge, Google Research Football and POGEMA. Our method, MARL-GPT, applies offline reinforcement learning to train at scale on the expert trajectories (400M for SMACv2, 100M for GRF, and 1B for POGEMA) combined with a single transformer-based observation encoder that requires no task-specific tuning. Experiments show that MARL-GPT achieves competitive performance compared to specialized baselines in all tested environments. Thus, our findings suggest that it is, indeed, possible to build a multi-task transformer-based model for a wide variety of (significantly different) multi-agent problems paving the way to the fundamental MARL model (akin to ChatGPT, Llama, Mistral etc. in natural language modeling).

ROJun 18, 2025
PRISM-Loc: a Lightweight Long-range LiDAR Localization in Urban Environments with Topological Maps

Kirill Muravyev, Vasily Yuryev, Oleg Bulichev et al.

Localization in the environment is one of the crucial tasks of navigation of a mobile robot or a self-driving vehicle. For long-range routes, performing localization within a dense global lidar map in real time may be difficult, and the creation of such a map may require much memory. To this end, leveraging topological maps may be useful. In this work, we propose PRISM-Loc -- a topological map-based approach for localization in large environments. The proposed approach leverages a twofold localization pipeline, which consists of global place recognition and estimation of the local pose inside the found location. For local pose estimation, we introduce an original lidar scan matching algorithm, which is based on 2D features and point-based optimization. We evaluate the proposed method on the ITLP-Campus dataset on a 3 km route, and compare it against the state-of-the-art metric map-based and place recognition-based competitors. The results of the experiments show that the proposed method outperforms its competitors both quality-wise and computationally-wise.