RO · Sep 24, 2024
TiltXter: CNN-based Electro-tactile Rendering of Tilt Angle for Telemanipulation of Pasteur Pipettes
Miguel Altamirano Cabrera, Jonathan Tirado, Aleksey Fedoseev et al.
The shape of deformable objects can change drastically when they are grasped by robotic grippers, which makes their alignment ambiguous to perceive and results in errors in robot positioning and telemanipulation. Rendering clear tactile patterns is fundamental to increasing users' precision and dexterity through tactile haptic feedback during telemanipulation. Therefore, methods for decoding sensor data into haptic stimuli need to be studied. This work presents a telemanipulation system for plastic pipettes that consists of a Force Dimension Omega.7 haptic interface equipped with two electro-stimulation arrays, and two tactile sensor arrays embedded in a two-finger Robotiq gripper. We propose a novel approach based on convolutional neural networks (CNN) to detect the tilt of deformable objects. From the recognized tilt, the CNN generates a tactile pattern that is rendered as electro-tactile stimuli to the user during telemanipulation. The study showed that, with the CNN-generated patterns, users' tilt recognition rate increased from 23.13% (with the downsized data) to 57.9%, and the teleoperation success rate increased from 53.12% (with the downsized data) to 92.18%.
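The abstract compares CNN-generated patterns against a "downsized data" baseline without saying how the downsizing was done. The sketch below assumes one plausible version: block-averaging a tactile sensor frame down to the resolution of the electrode array, then thresholding to decide which electrodes fire. The grid sizes, threshold, and function names are hypothetical, and this is not the authors' CNN.

```python
def block_average(frame, out_rows, out_cols):
    """Downsample a 2-D tactile frame (list of lists) to out_rows x out_cols
    by averaging equal-sized, non-overlapping blocks (hypothetical baseline)."""
    rows, cols = len(frame), len(frame[0])
    if rows % out_rows or cols % out_cols:
        raise ValueError("frame size must be divisible by the output grid size")
    br, bc = rows // out_rows, cols // out_cols
    out = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            block = [frame[r][c]
                     for r in range(i * br, (i + 1) * br)
                     for c in range(j * bc, (j + 1) * bc)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def to_electrode_pattern(grid, threshold):
    """Binarize: an electrode is active (1) where the averaged pressure reaches the threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in grid]


# Hypothetical 4x4 frame with a diagonal contact line (a tilted object).
frame = [[8, 0, 0, 0],
         [0, 8, 0, 0],
         [0, 0, 8, 0],
         [0, 0, 0, 8]]
grid = block_average(frame, 2, 2)          # [[4.0, 0.0], [0.0, 4.0]]
pattern = to_electrode_pattern(grid, 2.0)  # [[1, 0], [0, 1]]
print(grid, pattern)
```

Averaging discards most of the spatial detail of a thin contact line, which is one way a naive downsizing could make tilt harder to read than a pattern rendered from an explicitly recognized tilt angle.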
RO · Jan 9, 2025
UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation
Oleg Sautenkov, Yasheerah Yaqoot, Artem Lykov et al.
The UAV-VLA (Visual-Language-Action) system is a tool designed to facilitate communication with aerial robots. By integrating satellite imagery processing with a Visual Language Model (VLM) and the capabilities of GPT, UAV-VLA enables users to generate general flight path-and-action plans through simple text requests. The system leverages the rich contextual information in satellite images to support decision-making and mission planning. Combining visual analysis by the VLM with natural language processing by GPT provides the user with a path-and-action set, making aerial operations more efficient and accessible. The new method showed a 22% difference in the length of the generated trajectories and a mean error of 34.22 m (Euclidean distance) in locating objects of interest on the map using the K-Nearest Neighbors (KNN) approach.
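The two reported metrics can be made concrete with a small sketch. It assumes that the localization error matches each predicted object to its nearest ground-truth object (KNN with k = 1) and averages the Euclidean distances, and that the length difference is measured relative to a reference path; the abstract does not state what the reference trajectory is, and the function names and sample coordinates are hypothetical.

```python
import math


def mean_nn_error(predicted, ground_truth):
    """Match each predicted point to its nearest ground-truth point (1-nearest
    neighbour) and return the mean Euclidean distance, in the inputs' units."""
    if not predicted or not ground_truth:
        raise ValueError("both point lists must be non-empty")
    total = 0.0
    for p in predicted:
        total += min(math.dist(p, g) for g in ground_truth)
    return total / len(predicted)


def path_length(waypoints):
    """Total length of a polyline given as a list of (x, y) waypoints."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))


def relative_length_difference(generated, reference):
    """|L_generated - L_reference| / L_reference, as a fraction."""
    ref = path_length(reference)
    if ref == 0:
        raise ValueError("reference path has zero length")
    return abs(path_length(generated) - ref) / ref


# Hypothetical coordinates in metres.
print(mean_nn_error([(0, 0), (10, 0)], [(3, 4), (10, 1)]))                      # 3.0
print(relative_length_difference([(0, 0), (60, 0), (60, 80)], [(0, 0), (100, 0)]))  # 0.4
```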
RO · Mar 4, 2025
RaceVLA: VLA-based Racing Drone Navigation with Human-like Behaviour
Valerii Serpiva, Artem Lykov, Artyom Myshlyaev et al.
RaceVLA presents an innovative approach to autonomous racing drone navigation that leverages Visual-Language-Action (VLA) models to emulate human-like behavior. This research explores the integration of algorithms that enable drones to adapt their navigation strategies based on real-time environmental feedback, mimicking the decision-making processes of human pilots. The model, fine-tuned on a collected racing drone dataset, demonstrates strong generalization despite the complexity of drone racing environments. RaceVLA outperforms OpenVLA in motion (75.0 vs 60.0) and semantic generalization (45.5 vs 36.3), benefiting from the dynamic camera and simplified motion tasks. However, visual (79.6 vs 87.0) and physical (50.0 vs 76.7) generalization were lower, due to the challenges of maneuvering in dynamic environments with varying object sizes. RaceVLA also outperforms RT-2 across all axes: visual (79.6 vs 52.0), motion (75.0 vs 55.0), physical (50.0 vs 26.7), and semantic (45.5 vs 38.8), demonstrating its robustness for real-time adjustments in complex environments. Experiments revealed an average velocity of 1.04 m/s, a maximum speed of 2.02 m/s, and consistent maneuverability, demonstrating RaceVLA's ability to handle high-speed scenarios effectively. These findings highlight the potential of RaceVLA for high-performance navigation in competitive racing contexts. The RaceVLA codebase, pretrained weights, and dataset are available at https://racevla.github.io/
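To make the mixed comparison easier to read, the short script below computes per-axis score differences using only the numbers quoted in the abstract. It shows that, against OpenVLA, the physical axis has the largest gap (-26.7), while against RT-2 every difference is positive. The variable and function names are illustrative.

```python
# Generalization scores quoted in the abstract, per axis.
racevla = {"visual": 79.6, "motion": 75.0, "physical": 50.0, "semantic": 45.5}
openvla = {"visual": 87.0, "motion": 60.0, "physical": 76.7, "semantic": 36.3}
rt2 = {"visual": 52.0, "motion": 55.0, "physical": 26.7, "semantic": 38.8}


def deltas(model, baseline):
    """Per-axis score difference (model - baseline), rounded to one decimal."""
    return {axis: round(model[axis] - baseline[axis], 1) for axis in model}


vs_openvla = deltas(racevla, openvla)
vs_rt2 = deltas(racevla, rt2)
print(vs_openvla)  # visual -7.4, motion 15.0, physical -26.7, semantic 9.2
print(vs_rt2)      # visual 27.6, motion 20.0, physical 23.3, semantic 6.7
```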
RO · May 12, 2025
UAV-CodeAgents: Scalable UAV Mission Planning via Multi-Agent ReAct and Vision-Language Reasoning
Oleg Sautenkov, Yasheerah Yaqoot, Muhammad Ahsan Mustafa et al.
We present UAV-CodeAgents, a scalable multi-agent framework for autonomous UAV mission generation, built on large language and vision-language models (LLMs/VLMs). The system leverages the ReAct (Reason + Act) paradigm to interpret satellite imagery, ground high-level natural language instructions, and collaboratively generate UAV trajectories with minimal human supervision. A core component is a vision-grounded, pixel-pointing mechanism that enables precise localization of semantic targets on aerial maps. To support real-time adaptability, we introduce a reactive thinking loop, allowing agents to iteratively reflect on observations, revise mission goals, and coordinate dynamically in evolving environments. UAV-CodeAgents is evaluated on large-scale mission scenarios involving industrial and environmental fire detection. Our results show that a lower decoding temperature (0.5) yields higher planning reliability and reduced execution time, with an average mission creation time of 96.96 seconds and a success rate of 93%. We further fine-tune Qwen2.5VL-7B on 9,000 annotated satellite images, achieving strong spatial grounding across diverse visual categories. To foster reproducibility and future research, we will release the full codebase and a novel benchmark dataset for vision-language-based UAV planning.
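The abstract reports that a lower decoding temperature (0.5) gave more reliable plans. The sketch below shows the standard mechanism behind temperature sampling: logits are divided by the temperature before the softmax, so a temperature below 1 concentrates probability on the highest-scoring token. The logits are hypothetical, and this is a generic illustration rather than the UAV-CodeAgents code; linking sharper distributions to planning reliability is an interpretation, not a claim from the abstract.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature.
    Temperatures below 1 sharpen the distribution; above 1 flatten it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate next tokens
p_05 = softmax_with_temperature(logits, 0.5)
p_10 = softmax_with_temperature(logits, 1.0)
print([round(p, 3) for p in p_05])  # top token gets roughly 0.87
print([round(p, 3) for p in p_10])  # top token gets roughly 0.67
```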
RO · Oct 25, 2021
CoboGuider: Haptic Potential Fields for Safe Human-Robot Interaction
Viktor Rakhmatulin, Miguel Altamirano Cabrera, Fikre Hagos et al.
Modern industry still relies on manual manufacturing operations, so safe human-robot interaction is of great interest. Speed and Separation Monitoring (SSM) enables close and efficient collaboration by maintaining a protective separation distance during robot operation. This paper presents a novel approach that strengthens SSM safety by providing haptic feedback to a worker in the robotic cell. Tactile stimuli give early warning of dangerous movements and proximity to the robot, based on the human reaction time and the instantaneous velocities of the robot and the operator. A preliminary experiment measured participants' reaction time to tactile stimuli in a collaborative environment under controlled conditions. In a second experiment, we evaluated the approach in a case study in which a human worker and a cobot performed a collaborative planetary gear assembly. Results show that the approach increased the average minimum distance between the robot's end-effector and the operator's hand by 44% compared with the operator relying on visual feedback alone. Moreover, participants without haptic support failed several times to maintain the protective separation distance.
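The protective separation distance in SSM is commonly computed with the ISO/TS 15066 formulation S_p = S_h + S_r + S_s + C + Z_d + Z_r. Under a constant-speed simplification, S_h = v_h(T_r + T_s) and S_r = v_r·T_r. The sketch below implements that simplification and adds a hypothetical early-warning rule: it warns whenever the current gap is smaller than S_p plus the distance the gap closes during the human's reaction time to a tactile cue. The warning rule, parameter names, and sample values are assumptions for illustration, not CoboGuider's published method.

```python
def protective_separation_distance(v_h, v_r, t_r, t_s, s_s, c=0.0, z_d=0.0, z_r=0.0):
    """Constant-speed simplification of the SSM protective separation distance [m].
    v_h: human speed toward the robot [m/s]; v_r: robot speed toward the human [m/s]
    t_r: robot system reaction time [s]; t_s: robot stopping time [s]
    s_s: robot stopping distance [m]; c: intrusion distance [m]
    z_d, z_r: position uncertainty of the human and of the robot [m]."""
    s_h = v_h * (t_r + t_s)  # distance the human covers until the robot has stopped
    s_r = v_r * t_r          # distance the robot covers before it starts braking
    return s_h + s_r + s_s + c + z_d + z_r


def should_warn(separation, s_p, v_h, v_r, t_human):
    """Hypothetical rule: warn if the gap would fall below s_p before the worker,
    who needs t_human seconds to react to the tactile cue, can respond."""
    return separation < s_p + (v_h + v_r) * t_human


# Hypothetical values: hand at 1.6 m/s, robot at 0.5 m/s.
s_p = protective_separation_distance(v_h=1.6, v_r=0.5, t_r=0.1, t_s=0.3, s_s=0.05)
print(round(s_p, 3))                                # 0.74
print(should_warn(1.0, s_p, 1.6, 0.5, t_human=0.25))  # True: 1.0 m < 1.265 m
print(should_warn(1.5, s_p, 1.6, 0.5, t_human=0.25))  # False
```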