Alexey Vinel

h-index 2

6 papers

34 citations

Novelty 30%

AI Score 41

Ranked #68,465 of 194,257 authors (top 35%); #2,022 in RO (top 30%)

6 Papers

5.4 | CV | May 12 | Code

TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion

Mohammad Khoshkdahan, Alexey Vinel

Safe autonomous agents and mobile robots need fast, real-time 3D perception, especially for vulnerable road users (VRUs) such as pedestrians. We introduce a new bird's-eye-view (BEV) encoding, which maps the full 3D LiDAR point cloud into a lightweight 2D BEV tensor with three height bands. We explicitly reformulate 3D detection as a 2D detection problem and then reconstruct 3D boxes from the BEV outputs. A single network detects cars, pedestrians, and cyclists in one pass. The backbone uses area attention at deep stages, a hierarchical bidirectional neck over P1 to P4 fuses context and detail, and the head predicts oriented boxes with distribution focal learning for side offsets and a rotated IoU loss. Training applies a small vertical re-binning and a mild reflectance jitter in channel space to resist memorization. We use an interquartile range (IQR) filter to remove noisy and outlier LiDAR points during 3D reconstruction. On the KITTI dataset, TriBand-BEV attains 58.7/52.6/47.2 pedestrian BEV AP (%) for the easy, moderate, and hard settings at 49 FPS on a single consumer GPU, surpassing Complex-YOLO, with gains of +12.6%, +7.5%, and +3.1%. Qualitative scenes show stable detection under occlusion. The pipeline is compact and ready for real-time robotic deployment. Our source code is publicly available on GitHub.
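The two preprocessing ideas in this abstract, a three-band height-aware BEV grid and an IQR outlier filter, can be illustrated with a minimal sketch. This is not the authors' implementation: the grid extents, cell size, band edges, the use of point counts as cell values, and the function names `encode_bev` and `iqr_filter` are all assumptions chosen for illustration.

```python
from statistics import quantiles


def encode_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
               z_edges=(-2.0, -1.0, 0.0, 1.0), cell=0.5):
    """Map (x, y, z) points to a 3-band BEV grid of point counts.

    All ranges, band edges, and the cell size are illustrative assumptions.
    Returns grid[band][row][col].
    """
    cols = int((x_range[1] - x_range[0]) / cell)
    rows = int((y_range[1] - y_range[0]) / cell)
    grid = [[[0] * cols for _ in range(rows)] for _ in range(3)]
    for x, y, z in points:
        # Discard points outside the horizontal extent or the height bands.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        if not (z_edges[0] <= z < z_edges[3]):
            continue
        band = 0 if z < z_edges[1] else (1 if z < z_edges[2] else 2)
        col = min(int((x - x_range[0]) / cell), cols - 1)
        row = min(int((y - y_range[0]) / cell), rows - 1)
        grid[band][row][col] += 1
    return grid


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    if len(values) < 4:
        return list(values)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]
```

In a pipeline of this kind, `iqr_filter` might be applied to the heights of the points that fall inside a detected 2D box before estimating the box's vertical extent, so that a few stray returns do not stretch the reconstructed 3D box.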

8.0 | NI | Jun 29

Scalable Intention Sharing for ETSI VAMs

Felipe E. Valle, Oscar Amador, Johan Thunberg et al.

Efficient maneuver coordination in dense V2X environments requires accurate short-term prediction while maintaining low communication and computational overhead. Current European Telecommunications Standards Institute (ETSI)-compliant approaches rely on intention detection and trajectory vector transmission, which scale poorly with neighborhood size and prediction horizon. This paper revisits maneuver coordination from an intention-sharing perspective and investigates geometric encodings that enable scalable communication. First, we analyze three ETSI-compliant encodings (trajectory vectors, N-polygons, and uncertainty ellipses) through complexity analysis and simulation-based CPU measurements. Results show that uncertainty ellipses reduce computational complexity by an order of magnitude compared with trajectory vectors while maintaining a constant message size. Building on this, an Extended Kalman Filter is used to generate short-horizon predictions, which are encoded as uncertainty ellipses to represent the intended maneuver. The prediction pipeline is evaluated using real-world GNSS trajectories collected from cyclist maneuvers on a controlled test track, demonstrating that the approach achieves reliable multi-second prediction horizons while maintaining scalability for dense V2X environments.
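As a rough illustration of why an ellipse encoding has constant size, the sketch below converts a 2x2 position covariance (such as the position block of a Kalman filter's predicted covariance) into three numbers: semi-major axis, semi-minor axis, and orientation. The function name `covariance_to_ellipse`, the 95% confidence default, and the choice of radians are assumptions; the paper's actual ETSI field encoding is not reproduced here.

```python
import math


def covariance_to_ellipse(sxx, syy, sxy, confidence=0.95):
    """Return (semi_major, semi_minor, angle_rad) for a 2x2 covariance.

    sxx, syy: variances along x and y; sxy: covariance.
    For a 2D Gaussian the chi-square quantile has the closed form
    -2 * ln(1 - confidence).
    """
    scale = -2.0 * math.log(1.0 - confidence)
    mean = 0.5 * (sxx + syy)
    root = math.hypot(0.5 * (sxx - syy), sxy)
    lam_major = mean + root
    lam_minor = max(mean - root, 0.0)  # guard against tiny negative round-off
    # Orientation of the major axis, measured from the x axis.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.sqrt(scale * lam_major), math.sqrt(scale * lam_minor), angle
```

Whatever the prediction horizon, the output is the same three values per predicted position, whereas a trajectory vector grows with the number of sampled points.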

6.6 | RO | May 12

Cooperative Robotics Reinforced by Collective Perception for Traffic Moderation

Mohammad Khoshkdahan, John Pravin Arockiasamy, Andy Flores Comeca et al.

Collisions at non-line-of-sight (NLOS) intersections remain a major safety concern because drivers have limited visibility of approaching traffic. V2X-based warnings can reduce these risks, yet many vehicles are not equipped with V2X and drivers may ignore in-vehicle alerts. Collective perception (CP) can compensate for low V2X penetration by extending the awareness of connected vehicles, but it cannot influence unconnected vehicles. To fill this gap, our work introduces a complementary concept that adds a cooperative humanoid robot as an active traffic moderator capable of physically stopping a vehicle that attempts to merge into an unseen traffic stream. The system operates on two parallel perception pathways. A dual-camera infrastructure unit detects the position, speed, and motion of approaching vehicles and transmits this information to the robot as a collective perception message (CPM). The robot also receives cooperative awareness messages (CAM) from connected vehicles through its onboard V2X unit and can act as a relay for decentralized environmental notification messages (DENM) when safety events originate elsewhere along the road. A fusion module combines these streams to maintain a robust real-time view of the main road. A Zone of Danger (ZoD) is defined and used to predict whether an approaching vehicle creates a collision risk for a merging road user. When such a risk is detected, the robot issues a human-like STOP gesture and blocks the merging path until the hazard disappears. The full system was deployed at the Future Mobility Park (FMP) in Rotterdam. Experiments show that the combined vision and V2X perception allows the robot to detect approaching vehicles early, predict hazards reliably, and prevent unsafe merges in real-world NLOS conditions.
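The abstract does not say how the Zone of Danger is computed. One plausible reading, sketched below purely as an assumption, is a time-to-arrival test: a fused track is dangerous if the vehicle would reach the conflict point before a merging road user could clear it. The `Track` fields, the thresholds, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """A fused main-road vehicle track (hypothetical structure)."""
    distance_m: float  # remaining distance to the conflict point
    speed_mps: float   # speed toward the conflict point (positive = approaching)


def in_zone_of_danger(track, merge_time_s=4.0, margin_s=1.0):
    """True if the vehicle arrives before the merge could safely finish.

    merge_time_s and margin_s are assumed values, not taken from the paper.
    """
    if track.speed_mps <= 0.0 or track.distance_m < 0.0:
        return False  # stationary, receding, or already past the conflict point
    time_to_arrival_s = track.distance_m / track.speed_mps
    return time_to_arrival_s <= merge_time_s + margin_s


def robot_should_stop(tracks, **thresholds):
    """The robot holds its STOP gesture while any track is in the zone."""
    return any(in_zone_of_danger(t, **thresholds) for t in tracks)
```

Under this reading, the robot would re-evaluate `robot_should_stop` on every fusion update and release the merging path once it returns `False`.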

6.1 | RO | May 7

Multi-Robot Coordination in V2X Environments

John Pravin Arockiasamy, Alexey Vinel

This paper presents a Vehicle-to-Everything (V2X) communication framework that enables decentralized cooperation among social robots operating in complex urban traffic environments. Building on ETSI Cooperative Awareness and Maneuver Coordination services, the framework introduces two robot-centric facility-layer services: the Robot Awareness Service (RAS) and the Robot Maneuver Coordination Service (RMCS), realized through the Robot Awareness Message (RAM) and the Robot Maneuver Coordination Message (RMCM), respectively. RAS enables role-aware, task-oriented robot awareness while integrating externally detected Vulnerable Road Users (VRUs), including non-V2X pedestrians, into cooperative awareness. RMCS supports event-driven, low-latency coordination of robot maneuvers under explicitly established roles, without centralized infrastructure or prior pairing. A real-world proof of concept demonstrates deterministic multi-robot coordination between a humanoid robot and a quadrupedal robot assisting a pedestrian during a road-crossing scenario, governed by a formally specified finite-state coordination model. Complementary simulations evaluate robot-mediated VRU clustering in mixed V2X environments, showing that RAS-based clustering integrates non-V2X VRUs in safety-critical areas while reducing redundant transmissions from V2X-enabled VRUs, thereby lowering channel load. Together, the proposed services provide a scalable and standards-aligned foundation for integrating cooperative robots into future Connected, Cooperative, and Automated Mobility ecosystems.
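A deterministic finite-state coordination model of the kind mentioned above can be sketched as a transition table. The states and event names below are hypothetical placeholders for a road-crossing assist; the paper's formally specified model and its RMCM contents are not given in the abstract and are not reproduced here.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical states for one robot in a crossing-assist scenario.
    IDLE = auto()
    REQUESTED = auto()
    BLOCKING = auto()
    ESCORTING = auto()
    DONE = auto()


# (current state, received event) -> next state; event names are assumptions.
TRANSITIONS = {
    (State.IDLE, "pedestrian_detected"): State.REQUESTED,
    (State.REQUESTED, "role_accepted"): State.BLOCKING,
    (State.BLOCKING, "traffic_stopped"): State.ESCORTING,
    (State.ESCORTING, "crossing_complete"): State.DONE,
}


def step(state, event):
    """Apply one event; events with no matching transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.IDLE):
    """Fold a sequence of events through the transition table."""
    for event in events:
        state = step(state, event)
    return state
```

Because each (state, event) pair maps to at most one successor, the same message sequence always yields the same final state, which is what makes coordination of this style deterministic.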

8.4 | CV | Apr 6, 2025

Systematic Literature Review on Vehicular Collaborative Perception -- A Computer Vision Perspective

Lei Wan, Jianxin Zhao, Andreas Wiedholz et al.

The effectiveness of autonomous vehicles relies on reliable perception capabilities. Despite significant advancements in artificial intelligence and sensor fusion technologies, current single-vehicle perception systems continue to encounter limitations, notably visual occlusions and limited long-range detection capabilities. Collaborative Perception (CP), enabled by Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) communication, has emerged as a promising solution to mitigate these issues and enhance the reliability of autonomous systems. Beyond advancements in communication, the computer vision community is increasingly focusing on improving vehicular perception through collaborative approaches. However, a systematic literature review that thoroughly examines existing work and reduces subjective bias is still lacking. Such a systematic approach helps identify research gaps, recognize common trends across studies, and inform future research directions. In response, this study follows the PRISMA 2020 guidelines and includes 106 peer-reviewed articles. These publications are analyzed based on modalities, collaboration schemes, and key perception tasks. Through a comparative analysis, this review illustrates how different methods address practical issues such as pose errors, temporal latency, communication constraints, domain shifts, heterogeneity, and adversarial attacks. Furthermore, it critically examines evaluation methodologies, highlighting a misalignment between current metrics and CP's fundamental objectives. By examining all relevant topics in depth, this review offers valuable insights into challenges, opportunities, and risks, serving as a reference for advancing research in vehicular collaborative perception.

3.0 | RO | Aug 2, 2021

"Robot Steganography"?: Opportunities and Challenges

Martin Cooney, Eric Järpe, Alexey Vinel

Robots are being designed to communicate with people in various public and domestic venues in a helpful, discreet way. Here, we use a speculative approach to shed light on a new concept of robot steganography (RS): that a robot could seek to help vulnerable populations by discreetly warning of potential threats. We first identify some potentially useful scenarios for RS related to safety and security -- concerns that are estimated to cost the world trillions of dollars each year -- with a focus on two kinds of robots, an autonomous vehicle (AV) and a socially assistive humanoid robot (SAR). Next, we propose that existing, powerful, computer-based steganography (CS) approaches can be adopted with little effort in new contexts (SARs), while also pointing out potential benefits of human-like steganography (HS): although less efficient and robust than CS, HS represents a currently unused form of RS that could also avoid the need for computers and evade detection by more technically advanced adversaries. This analysis also introduces some unique challenges of RS that arise from message generation, indirect perception, and effects of perspective. For this, we explore some related theoretical and practical concerns for selecting carrier signals and generating messages, also making available some code and a video demo. Finally, we report on checking the current feasibility of the RS concept via a simplified user study, confirming that messages can be hidden in a robot's behaviors. The immediate implication is that RS could help to improve people's lives and mitigate some costly problems -- suggesting the usefulness of further discussion, ideation, and consideration by designers.