Aleksey Fedoseev

RO
h-index: 24
17 papers
116 citations
Novelty: 53%
AI Score: 47

17 Papers

RO · Sep 16, 2024 · Code
Industry 6.0: New Generation of Industry driven by Generative AI and Swarm of Heterogeneous Robots

Artem Lykov, Miguel Altamirano Cabrera, Mikhail Konenkov et al.

This paper presents the concept of Industry 6.0, introducing the world's first fully automated production system that autonomously handles the entire product design and manufacturing process based on user-provided natural language descriptions. By leveraging generative AI, the system automates critical aspects of production, including product blueprint design, component manufacturing, logistics, and assembly. A heterogeneous swarm of robots, each equipped with individual AI through integration with Large Language Models (LLMs), orchestrates the production process. The robotic system includes manipulator arms, delivery drones, and 3D printers capable of generating assembly blueprints. The system was evaluated using commercial and open-source LLMs, functioning through APIs and local deployment. A user study demonstrated that the system reduces the average production time to 119.10 minutes, significantly outperforming a team of expert human developers, who averaged 528.64 minutes (an improvement factor of 4.4). Furthermore, in the product blueprinting stage, the system surpassed human CAD operators by an unprecedented factor of 47, completing the task in 0.5 minutes compared to 23.5 minutes. This breakthrough represents a major leap towards fully autonomous manufacturing.
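The two improvement factors quoted above follow directly from the reported timings. A minimal arithmetic check, using only the four figures from the abstract (the function and variable names are illustrative, not from the paper):

```python
# Timings reported in the abstract, in minutes.
system_total = 119.10      # full production run, automated system
human_total = 528.64       # full production run, expert human team
system_blueprint = 0.5     # blueprinting stage, automated system
human_blueprint = 23.5     # blueprinting stage, human CAD operators


def speedup(baseline: float, improved: float) -> float:
    """Return how many times faster `improved` is than `baseline`."""
    return baseline / improved


total_factor = speedup(human_total, system_total)
blueprint_factor = speedup(human_blueprint, system_blueprint)

print(f"end-to-end: {total_factor:.1f}x")       # end-to-end: 4.4x
print(f"blueprinting: {blueprint_factor:.0f}x")  # blueprinting: 47x
```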

RO · Mar 10
ImpedanceDiffusion: Diffusion-Based Global Path Planning for UAV Swarm Navigation with Generative Impedance Control

Faryal Batool, Yasheerah Yaqoot, Muhammad Ahsan Mustafa et al.

Safe swarm navigation in cluttered indoor environments requires long-horizon planning, reactive obstacle avoidance, and adaptive compliance. We propose ImpedanceDiffusion, a hierarchical framework that leverages image-conditioned diffusion-based global path planning with Artificial Potential Field (APF) tracking and semantic-aware variable impedance control for aerial drone swarms. The diffusion model generates geometric global trajectories directly from RGB images without explicit map construction. These trajectories are tracked by an APF-based reactive layer, while a VLM-RAG module performs semantic obstacle classification with 90% retrieval accuracy to adapt impedance parameters for mixed obstacle environments during execution. Two diffusion planners are evaluated: (i) a top-view long-horizon planner using single-pass inference and (ii) a first-person-view (FPV) short-horizon planner deployed via a two-stage inference pipeline. Both planners achieve a 100% trajectory generation rate across 20 static and dynamic experimental configurations and are validated via zero-shot sim-to-real deployment on Crazyflie 2.1 drones through the hierarchical APF-impedance control stack. The top-view planner produces smoother trajectories that yield conservative tracking speeds of 1.0-1.2 m/s near hard obstacles and 0.6-1.0 m/s near soft obstacles. In contrast, the FPV planner generates trajectories with greater local clearance and typically higher speeds, reaching 1.4-2.0 m/s near hard obstacles and up to 1.6 m/s near soft obstacles. Across the 20 experimental configurations (100 total runs), the framework achieved a 92% success rate while maintaining stable impedance-based formation control with bounded oscillations and no in-flight collisions, demonstrating reliable and adaptive swarm navigation in cluttered indoor environments.
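To illustrate what an APF-based reactive tracking layer does, here is a minimal 2D sketch of one textbook artificial potential field step: attraction toward the next waypoint plus repulsion from obstacles inside an influence radius. This is not the authors' controller. The gains `k_att` and `k_rep`, the `influence` radius, and the `v_max` clamp are all hypothetical values chosen for illustration, and the semantic impedance adaptation described in the abstract is not modeled.

```python
import math


def apf_velocity(pos, goal, obstacles,
                 k_att=1.0, k_rep=0.5, influence=1.0, v_max=1.0):
    """Return a 2D velocity command from a textbook potential field.

    pos, goal: (x, y) tuples; obstacles: list of (x, y) points.
    All gains and limits are illustrative assumptions.
    """
    # Attractive term: proportional to the offset toward the waypoint.
    vx = k_att * (goal[0] - pos[0])
    vy = k_att * (goal[1] - pos[1])

    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < influence:
            # Gradient of the classic repulsive potential
            # 0.5 * k_rep * (1/d - 1/influence)^2, pointing away from the obstacle.
            mag = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            vx += mag * dx / d
            vy += mag * dy / d

    # Clamp the commanded speed to v_max.
    speed = math.hypot(vx, vy)
    if speed > v_max:
        vx, vy = vx * v_max / speed, vy * v_max / speed
    return vx, vy


# A drone at the origin heading to (2, 0) with an obstacle slightly above its path
# is pushed downward (negative y) while still moving forward.
print(apf_velocity((0.0, 0.0), (2.0, 0.0), [(0.5, 0.1)]))
```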

RO · Sep 24, 2024
TiltXter: CNN-based Electro-tactile Rendering of Tilt Angle for Telemanipulation of Pasteur Pipettes

Miguel Altamirano Cabrera, Jonathan Tirado, Aleksey Fedoseev et al.

The shape of deformable objects can change drastically during grasping by robotic grippers, causing an ambiguous perception of their alignment and hence resulting in errors in robot positioning and telemanipulation. Rendering clear tactile patterns is fundamental to increasing users' precision and dexterity through tactile haptic feedback during telemanipulation. Therefore, different methods have to be studied to decode the sensors' data into haptic stimuli. This work presents a telemanipulation system for plastic pipettes that consists of a Force Dimension Omega.7 haptic interface endowed with two electro-stimulation arrays and two tactile sensor arrays embedded in the 2-finger Robotiq gripper. We propose a novel approach based on convolutional neural networks (CNN) to detect the tilt of deformable objects. The CNN generates a tactile pattern based on recognized tilt data to render further electro-tactile stimuli provided to the user during the telemanipulation. The study has shown that using the CNN algorithm, tilt recognition by users increased from 23.13% with the downsized data to 57.9%, and the success rate during teleoperation increased from 53.12% using the downsized data to 92.18% using the tactile patterns generated by the CNN.

RO · Mar 18
GoalVLM: VLM-driven Object Goal Navigation for Multi-Agent System

MoniJesu James, Amir Atef Habel, Aleksey Fedoseev et al.

Object-goal navigation has traditionally been limited to ground robots with closed-set object vocabularies. Existing multi-agent approaches depend on precomputed probabilistic graphs tied to fixed category sets, precluding generalization to novel goals at test time. We present GoalVLM, a cooperative multi-agent framework for zero-shot, open-vocabulary object navigation. GoalVLM integrates a Vision-Language Model (VLM) directly into the decision loop, SAM3 for text-prompted detection and segmentation, and SpaceOM for spatial reasoning, enabling agents to interpret free-form language goals and score frontiers via zero-shot semantic priors without retraining. Each agent builds a BEV semantic map from depth-projected voxel splatting, while a Goal Projector back-projects detections through calibrated depth into the map for reliable goal localization. A constraint-guided reasoning layer evaluates frontiers through a structured prompt chain (scene captioning, room-type classification, perception gating, multi-frontier ranking), injecting commonsense priors into exploration. We evaluate GoalVLM on GOAT-Bench val_unseen (360 multi-subtask episodes, 1032 sequential object-goal subtasks, HM3D scenes), where each episode requires navigating to a chain of 5-7 open-vocabulary targets. GoalVLM with N=2 agents achieves 55.8% subtask SR and 18.3% SPL, competitive with state-of-the-art methods while requiring no task-specific training. Ablation studies confirm the contributions of VLM-guided frontier reasoning and depth-projected goal localization.

RO · Mar 16
GoalSwarm: Multi-UAV Semantic Coordination for Open-Vocabulary Object Navigation

MoniJesu Wonders James, Amir Atef Habel, Aleksey Fedoseev et al.

Cooperative visual semantic navigation is a foundational capability for aerial robot teams operating in unknown environments. However, achieving robust open-vocabulary object-goal navigation remains challenging due to the computational constraints of deploying heavy perception models onboard and the complexity of decentralized multi-agent coordination. We present GoalSwarm, a fully decentralized multi-UAV framework for zero-shot semantic object-goal navigation. The UAVs collaboratively construct a shared, lightweight 2D top-down semantic occupancy map by projecting depth observations from aerial vantage points, eliminating the computational burden of full 3D representations while preserving essential geometric and semantic structure. The core contributions of GoalSwarm are threefold: (1) integration of a zero-shot foundation model, SAM3, for open-vocabulary detection and pixel-level segmentation, enabling target identification without task-specific training; (2) a Bayesian Value Map that fuses multi-viewpoint detection confidences into a per-pixel goal-relevance distribution, enabling informed frontier scoring via Upper Confidence Bound (UCB) exploration; and (3) a decentralized coordination strategy combining semantic frontier extraction, cost-utility bidding with geodesic path costs, and spatial separation penalties to minimize redundant exploration across the swarm.
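Frontier scoring via UCB can be sketched with the standard UCB1 rule: a frontier's score is its estimated goal relevance plus an exploration bonus that shrinks as the frontier is observed more often. The abstract does not give GoalSwarm's exact formula, so the bonus term, the constant `c`, and the `value`/`visits` fields below are assumptions for illustration only.

```python
import math


def ucb_score(mean_value, visits, total_visits, c=1.0):
    """Standard UCB1 score: exploitation term plus exploration bonus.

    mean_value: estimated goal relevance of a frontier (hypothetical field).
    visits: how many observations support that estimate.
    total_visits: observations summed over all frontiers.
    """
    if visits == 0:
        return math.inf  # never-observed frontiers are tried first
    return mean_value + c * math.sqrt(math.log(total_visits) / visits)


def pick_frontier(frontiers, c=1.0):
    """Return the frontier dict with the highest UCB score."""
    total = sum(f["visits"] for f in frontiers)
    return max(frontiers,
               key=lambda f: ucb_score(f["value"], f["visits"], total, c))


frontiers = [
    {"id": "a", "value": 0.6, "visits": 10},  # well observed, slightly better mean
    {"id": "b", "value": 0.5, "visits": 1},   # barely observed
]
print(pick_frontier(frontiers)["id"])         # b (exploration bonus dominates)
print(pick_frontier(frontiers, c=0.0)["id"])  # a (pure exploitation)
```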

RO · Feb 4, 2025
MAGNNET: Multi-Agent Graph Neural Network-based Efficient Task Allocation for Autonomous Vehicles with Deep Reinforcement Learning

Lavanya Ratnabala, Aleksey Fedoseev, Robinroy Peter et al.

This paper addresses the challenge of decentralized task allocation within heterogeneous multi-agent systems operating under communication constraints. We introduce a novel framework that integrates graph neural networks (GNNs) with a centralized training and decentralized execution (CTDE) paradigm, further enhanced by a tailored Proximal Policy Optimization (PPO) algorithm for multi-agent deep reinforcement learning (MARL). Our approach enables unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) to dynamically allocate tasks efficiently without necessitating central coordination in a 3D grid environment. The framework minimizes total travel time while simultaneously avoiding conflicts in task assignments. For the cost calculation and routing, we employ reservation-based A* and R* path planners. Experimental results revealed that our method achieves a high 92.5% conflict-free success rate, with only a 7.49% performance gap compared to the centralized Hungarian method, while outperforming the heuristic decentralized baseline based on a greedy approach. Additionally, the framework scales to 20 agents with an allocation processing time of 2.8 s and responds robustly to dynamically generated tasks, underscoring its potential for real-world applications in complex multi-agent scenarios.
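The comparison above pits a learned allocator against an optimal centralized assignment and a greedy baseline. The sketch below shows what those two reference points mean on a tiny cost matrix. It finds the optimum by brute force (the same result the Hungarian method would return, only slower) and defines the "performance gap" as relative extra cost over the optimum, which is an assumption since the abstract does not state the metric. The cost values are made up.

```python
from itertools import permutations


def optimal_assignment(cost):
    """Brute-force the minimum-cost one-to-one assignment (small n only)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best), sum(cost[i][best[i]] for i in range(n))


def greedy_assignment(cost):
    """Each agent, in index order, grabs its cheapest still-free task."""
    n = len(cost)
    taken, assignment = set(), []
    for i in range(n):
        j = min((j for j in range(n) if j not in taken),
                key=lambda j: cost[i][j])
        taken.add(j)
        assignment.append(j)
    return assignment, sum(cost[i][assignment[i]] for i in range(n))


def gap_percent(value, optimal_value):
    """Relative extra cost over the optimum, in percent (assumed metric)."""
    return 100.0 * (value - optimal_value) / optimal_value


# cost[i][j]: hypothetical travel time of agent i to task j.
cost = [[10, 12],
        [10, 20]]
opt_assign, opt_total = optimal_assignment(cost)
greedy_assign, greedy_total = greedy_assignment(cost)
print(opt_assign, opt_total)        # [1, 0] 22
print(greedy_assign, greedy_total)  # [0, 1] 30
print(f"{gap_percent(greedy_total, opt_total):.2f}%")  # 36.36%
```

Greedy allocation is conflict-free by construction here but can be far from optimal, which is why the optimal centralized solution serves as the reference point.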

RO · Oct 24, 2021
GraspLook: a VR-based Telemanipulation System with R-CNN-driven Augmentation of Virtual Environment

Polina Ponomareva, Daria Trinitatova, Aleksey Fedoseev et al.

The teleoperation of robotic systems in medical applications requires stable and convenient visual feedback for the operator. The most accessible approach to delivering visual information from the remote area is using cameras to transmit a video stream from the environment. However, such systems are sensitive to camera resolution, limited viewpoints, and cluttered environments, which place additional mental demands on the human operator. The paper proposes a novel teleoperation system based on an augmented virtual environment (VE). A region-based convolutional neural network (R-CNN) is applied to detect the laboratory instrument and estimate its position in the remote environment, so that its digital twin can then be displayed in the VE, which is necessary for dexterous telemanipulation. The experimental results revealed that the developed system allows users to operate the robot more smoothly, which leads to a decrease in task execution time when manipulating test tubes. In addition, the participants evaluated the developed system as less mentally demanding (by 11%) and requiring less effort (by 16%) to accomplish the task than the camera-based teleoperation approach, and rated their performance in the augmented VE highly. The proposed technology can potentially be applied to laboratory tests in remote areas involving infectious and poisonous reagents.

HC · Oct 18, 2021
DroneStick: Flying Joystick as a Novel Type of Interface

Evgeny Tsykunov, Aleksey Fedoseev, Ekaterina Dorzhieva et al.

DroneStick is a novel hands-free method for smooth interaction between a human and a robotic system via one of its agents, without training or any additional handheld or wearable device or infrastructure. A flying joystick (DroneStick), being a part of a multi-robot system, is composed of a flying drone and a coiled wire with a vibration motor. By pulling on the coiled wire, the operator commands certain motions of the follower robotic system. The DroneStick system does not require the user to carry any equipment before or after performing the required interaction. DroneStick provides useful feedback to the operator in the form of force transferred through the wire, translation/rotation of the flying joystick, and motor vibrations at the fingertips. This feedback allows users to interact intuitively with different forms of robotic systems. A potential application is enhancing automated 'last mile' delivery, where a recipient needs to gently guide a delivery drone/robot to the spot where a parcel has to be dropped.

HC · Aug 3, 2021
SwarmPlay: Interactive Tic-tac-toe Board Game with Swarm of Nano-UAVs driven by Reinforcement Learning

Ekaterina Karmanova, Valerii Serpiva, Stepan Perminov et al.

Reinforcement learning (RL) methods have been actively applied in the field of robotics, allowing the system itself to find a solution for a task otherwise requiring a complex decision-making algorithm. In this paper, we present a novel RL-based Tic-tac-toe scenario, i.e. SwarmPlay, where each playing component is represented by an individual drone that has its own mobility and swarm intelligence to win against a human player. Thus, the combination of challenging swarm strategy and human-drone collaboration aims to make games with machines tangible and interactive. Although some research on AI for board games already exists, e.g., chess, the SwarmPlay technology has the potential to offer much more engagement and interaction with the user as it proposes a multi-agent swarm instead of a single interactive robot. We explore users' evaluation of RL-based swarm behavior in comparison with game theory-based behavior. The preliminary user study revealed that participants were highly engaged in the game with drones (70% gave the maximum score on the Likert scale) and found it less artificial compared to regular computer-based systems (80%). The effect of the game outcome on the user's perception of the game was analyzed and discussed. The user study revealed that SwarmPlay has the potential to be implemented in a wider range of games, significantly improving human-drone interactivity.

HC · Aug 1, 2021
SwarmPlay: A Swarm of Nano-Quadcopters Playing Tic-tac-toe Board Game against a Human

Ekaterina Karmanova, Valerii Serpiva, Stepan Perminov et al.

We present a new paradigm of games, i.e. SwarmPlay, where each playing component is represented by an individual drone that has its own mobility and swarm intelligence to win against a human player. The motivation behind the research is to make games with machines tangible and interactive. Although some research on robotic players for board games already exists, e.g., chess, the SwarmPlay technology has the potential to offer much more engagement and interaction with a human as it proposes a multi-agent swarm instead of a single interactive robot. The proposed system consists of a robotic swarm, a workstation, a computer vision (CV) system, and game theory-based algorithms. A novel game algorithm was developed to provide a natural game experience to the user. The preliminary user study revealed that participants were highly engaged in the game with drones (69% gave the maximum score on the Likert scale) and found it less artificial compared to regular computer-based systems (77% gave the maximum score). The effect of the game outcome on the user's perception of the game was analyzed and discussed. The user study revealed that SwarmPlay has the potential to be implemented in a wider range of games, significantly improving human-drone interactivity.

RO · Jul 23, 2021
DronePaint: Swarm Light Painting with DNN-based Gesture Recognition

Valerii Serpiva, Ekaterina Karmanova, Aleksey Fedoseev et al.

We propose a novel human-swarm interaction system, allowing the user to directly control a swarm of drones in a complex environment through trajectory drawing with a hand gesture interface based on DNN gesture recognition. The developed CV-based system allows the user to control the swarm behavior without additional devices through human gestures and motions in real-time, providing convenient tools to change the swarm's shape and formation. Two types of interaction were proposed and implemented to adjust the swarm hierarchy: trajectory drawing and free-form trajectory generation control. The experimental results revealed a high accuracy of the gesture recognition system (99.75%), allowing the user to achieve relatively high precision of the trajectory drawing (mean error of 5.6 cm in comparison to 3.1 cm by mouse drawing) over the three evaluated trajectory patterns. The proposed system can be potentially applied in complex environment exploration, spray painting using drones, and interactive drone shows, allowing users to create their own art objects with drone swarms.

RO · Jun 28, 2021
SwarmPaint: Human-Swarm Interaction for Trajectory Generation and Formation Control by DNN-based Gesture Interface

Valerii Serpiva, Ekaterina Karmanova, Aleksey Fedoseev et al.

Teleoperation tasks with multi-agent systems have a high potential in supporting human-swarm collaborative teams in exploration and rescue operations. However, such teleoperation requires an intuitive and adaptive control approach to ensure swarm stability in a cluttered and dynamically shifting environment. We propose a novel human-swarm interaction system, allowing the user to control swarm position and formation by either direct hand motion or by trajectory drawing with a hand gesture interface based on DNN gesture recognition. The key technology of SwarmPaint is the user's ability to perform various tasks with the swarm without additional devices by switching between interaction modes. Two types of interaction were proposed and developed to adjust the swarm behavior: free-form trajectory generation control and shaped formation control. Two preliminary user studies were conducted to explore users' performance and subjective experience of human-swarm interaction through the developed control modes. The experimental results revealed sufficient accuracy in the trajectory tracing task (mean error of 5.6 cm by gesture drawing and 3.1 cm by mouse drawing with a pattern of dimension 1 m by 1 m) over three evaluated trajectory patterns and up to 7.3 cm accuracy in the targeting task with two target patterns of 1 m achieved by the SwarmPaint interface. Moreover, the participants evaluated the trajectory drawing interface as more intuitive (by 12.9%) and requiring less effort to utilize (by 22.7%) than direct shape and position control by gestures, although its physical workload and failure in performance were rated as more significant (by 9.1% and 16.3%, respectively).

RO · Feb 7, 2021
DroneTrap: Drone Catching in Midair by Soft Robotic Hand with Color-Based Force Detection and Hand Gesture Recognition

Aleksey Fedoseev, Valerii Serpiva, Ekaterina Karmanova et al.

The paper proposes a novel concept of docking drones to make this process as safe and fast as possible. The idea behind the project is that a robot with a soft gripper grasps the drone in midair. The human operator navigates the robotic arm with an ML-based gesture recognition interface. The 3-finger robot hand with soft fingers is equipped with touch sensors, making it possible to achieve safe drone catching and avoid inadvertent damage to the drone's propellers and motors. Additionally, the soft hand features a unique color-based force estimation technology based on a computer vision (CV) system. Moreover, the visual color-changing system makes it easier for the human operator to interpret the applied forces. Without any additional programming, the operator has full real-time control of the robot's motion and task execution by wearing a mocap glove with gesture recognition, which was developed and applied for the high-level control of DroneTrap. The experimental results revealed that the developed color-based force estimation can be applied for rigid object capturing with high precision (95.3%). The proposed technology can potentially revolutionize the landing and deployment of drones for parcel delivery on uneven ground, structure maintenance and inspection, risky operations, etc.

RO · Nov 7, 2020
MaskBot: Real-time Robotic Projection Mapping with Head Motion Tracking

Miguel Altamirano-Cabrera, Igor Usachev, Juan Heredia et al.

Projection mapping systems for the human face are limited by latency and by the movement of the users. The projection area is restricted by the position of the projectors and the cameras. We introduce MaskBot, a real-time projection mapping system operated by a 6 Degrees of Freedom (DoF) collaborative robot. The collaborative robot positions the projector and camera normal to the user's face to increase the projection area and to reduce the latency of the system. A webcam is used to detect the face and to sense the robot-user distance in order to modify the projection size and orientation. MaskBot projects different images on the face of the user, such as face modifications, make-up, and logos. In contrast to existing methods, the presented system is the first to introduce robotic projection mapping. One of the prospective applications is to acquire a dataset of adversarial images to challenge face detection DNN systems, such as Face ID.

HC · Jun 23, 2020
TeslaMirror: Multistimulus Encounter-Type Haptic Display for Shape and Texture Rendering in VR

Aleksey Fedoseev, Akerke Tleugazy, Luiza Labazanova et al.

This paper proposes a novel concept of a hybrid tactile display with multistimulus feedback, allowing the real-time experience of the position, shape, and texture of the virtual object. The key technology of the TeslaMirror is that we can deliver the sensation of object parameters (pressure, vibration, and electrotactile feedback) without any wearable haptic devices. We developed the full digital twin of the 6 DOF UR robot in the virtual reality (VR) environment, allowing the adaptive surface simulation and control of the hybrid display in real-time. The preliminary user study was conducted to evaluate the ability of TeslaMirror to reproduce shape sensations with the under-actuated end-effector. The results revealed that potentially this approach can be used in the virtual systems for rendering versatile VR shapes with high fidelity haptic experience.

RO · Apr 1, 2020
Coupling of localization and depth data for mapping using Intel RealSense T265 and D435i cameras

Evgeny Tsykunov, Valery Ilin, Stepan Perminov et al.

We propose to couple two types of Intel RealSense sensors (tracking T265 and depth D435i) in order to obtain localization and a 3D occupancy map of the indoor environment. We implemented a Python-based observer pattern with a multi-threaded approach for camera data synchronization. We compared different point cloud (PC) alignment methods (using transformations obtained from the tracking camera and from ICP family methods). Tracking camera and PC alignment allow us to generate a set of transformations between frames. Based on these transformations, we obtained different trajectories and analyzed them. Finally, having poses for all frames, we combined the depth data. First, we obtained a joint PC representing the whole scene. Then we used the Octomap representation to build a map.
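A multi-threaded observer pattern for pairing pose and depth frames might look like the sketch below. It uses only the Python standard library and fake camera threads in place of the sensors. No RealSense SDK calls are shown, and the class names (`FrameBus`, `PairSynchronizer`) and the "pair each depth frame with the latest pose" policy are assumptions for illustration, not the authors' implementation.

```python
import queue
import threading


class FrameBus:
    """Subject: camera threads publish frames; observers receive callbacks."""

    def __init__(self):
        self._observers = []
        self._lock = threading.Lock()

    def subscribe(self, callback):
        with self._lock:
            self._observers.append(callback)

    def publish(self, source, timestamp, data):
        # Copy the list under the lock so callbacks run without holding it.
        with self._lock:
            observers = list(self._observers)
        for callback in observers:
            callback(source, timestamp, data)


class PairSynchronizer:
    """Observer: pairs every depth frame with the most recent pose."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest_pose = None
        self.pairs = queue.Queue()  # thread-safe output for a mapping stage

    def __call__(self, source, timestamp, data):
        with self._lock:
            if source == "pose":
                self._latest_pose = (timestamp, data)
            elif source == "depth" and self._latest_pose is not None:
                self.pairs.put((self._latest_pose, (timestamp, data)))


def fake_camera(bus, source, n_frames):
    """Stand-in for a sensor thread: publishes n_frames dummy frames."""
    for t in range(n_frames):
        bus.publish(source, t, f"{source}-{t}")


bus = FrameBus()
sync = PairSynchronizer()
bus.subscribe(sync)
threads = [threading.Thread(target=fake_camera, args=(bus, name, 3))
           for name in ("pose", "depth")]
for th in threads:
    th.start()
for th in threads:
    th.join()
# The count depends on thread interleaving: depth frames that arrive
# before any pose are dropped.
print("paired frames:", sync.pairs.qsize())
```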

RO · Nov 18, 2019
Development of MirrorShape: High Fidelity Large-Scale Shape Rendering Framework for Virtual Reality

Aleksey Fedoseev, Nikita Chernyadev, Dzmitry Tsetserukou

Today there is a wide variety of haptic devices capable of providing tactile feedback. Although most existing designs are aimed at realistic simulation of surface properties, their capabilities are limited when it comes to displaying the shape and position of virtual objects. This paper suggests a new concept of a distributed haptic display for realistic interaction with virtual objects of complex shape by means of a collaborative robot with a shape display end-effector. MirrorShape renders the 3D object in a virtual reality (VR) system by contacting the user's hands with the robot end-effector at the calculated point in real-time. Our proposed system makes it possible to synchronously merge the position of the contact point in VR and the end-effector in the real world. This feature allows different shapes to be presented and at the same time expands the working area compared to desktop solutions. The preliminary user study revealed that MirrorShape was effective at reducing positional error in VR interactions. Potentially, this approach can be used in virtual systems for rendering versatile VR objects in a wide range of sizes with a high-fidelity large-scale shape experience.