CV | Jan 20 | Code
Correcting and Quantifying Systematic Errors in 3D Box Annotations for Autonomous Driving
Alexandre Justo Miro, Ludvig af Klinteberg, Bogdan Timus et al.
Accurate ground truth annotations are critical to supervised learning and evaluating the performance of autonomous vehicle systems. These vehicles are typically equipped with active sensors, such as LiDAR, which scan the environment in predefined patterns. 3D box annotation based on data from such sensors is challenging in dynamic scenarios, where objects are observed at different timestamps, hence different positions. Without proper handling of this phenomenon, systematic errors are likely to be introduced in the box annotations. Our work is the first to discover such annotation errors in widely used, publicly available datasets. Through our novel offline estimation method, we correct the annotations so that they follow physically feasible trajectories and achieve spatial and temporal consistency with the sensor data. For the first time, we define metrics for this problem, and we evaluate our method on Argoverse 2, MAN TruckScenes, and our proprietary datasets. Our approach increases the quality of box annotations by more than 17% in these datasets. Furthermore, we quantify the annotation errors in them and find that the original annotations are misplaced by up to 2.5 m, with highly dynamic objects being the most affected. Finally, we test the impact of the errors in benchmarking and find that the impact is larger than the improvements that state-of-the-art methods typically achieve over the previous state-of-the-art methods, showing that accurate annotations are essential for correct interpretation of performance. Our code is available at https://github.com/alexandre-justo-miro/annotation-correction-3D-boxes.
CV | Oct 7, 2023
Towards Long-Range 3D Object Detection for Autonomous Vehicles
Ajinkya Khoche, Laura Pereira Sánchez, Nazre Batool et al.
3D object detection at long range is crucial for ensuring the safety and efficiency of self-driving vehicles, allowing them to accurately perceive and react to objects, obstacles, and potential hazards from a distance. However, most current state-of-the-art LiDAR-based methods are range-limited due to sparsity at long range, which generates a form of domain gap between points closer to and farther away from the ego vehicle. Another related problem is the label imbalance for faraway objects, which inhibits the performance of Deep Neural Networks at long range. To address the above limitations, we investigate two ways to improve the long-range performance of current LiDAR-based 3D detectors. First, we combine two 3D detection networks, referred to as range experts, one specializing in near- to mid-range objects, and one in long-range 3D detection. To train a detector at long range under a scarce label regime, we further weight the loss according to the labelled point's distance from the ego vehicle. Second, we augment LiDAR scans with virtual points generated using Multimodal Virtual Points (MVP), a readily available image-based depth completion algorithm. Our experiments on the long-range Argoverse2 (AV2) dataset indicate that MVP is more effective in improving long-range performance, while maintaining a straightforward implementation. On the other hand, the range experts offer a computationally efficient and simpler alternative, avoiding dependency on image-based segmentation networks and perfect camera-LiDAR calibration.
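The distance-dependent loss weighting mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the weighting function, so the linear ramp, the function names (`distance_weights`, `weighted_mean_loss`) and the parameters `max_range`, `min_weight` and `max_weight` below are all assumptions made for illustration, not the authors' formulation.

```python
import math

def distance_weights(points_xy, max_range=200.0, min_weight=1.0, max_weight=3.0):
    """Assumed scheme: weight grows linearly with distance from the ego vehicle.

    The ego vehicle is taken to sit at the origin; distances beyond
    max_range are clamped so the weight never exceeds max_weight.
    """
    weights = []
    for x, y in points_xy:
        d = min(math.hypot(x, y), max_range)
        weights.append(min_weight + (max_weight - min_weight) * d / max_range)
    return weights

def weighted_mean_loss(per_object_losses, weights):
    """Weighted average of per-object losses (normalised by the weight sum)."""
    total = sum(w * l for w, l in zip(weights, per_object_losses))
    return total / sum(weights)

if __name__ == "__main__":
    centers = [(5.0, 0.0), (60.0, 20.0), (180.0, -10.0)]
    losses = [0.2, 0.5, 1.1]
    w = distance_weights(centers)
    print(w, weighted_mean_loss(losses, w))
```

Normalising by the weight sum keeps the loss magnitude comparable across batches with different range distributions; whether the original work normalises this way is not stated.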
CV | Jan 29, 2025 | Code
SSF: Sparse Long-Range Scene Flow for Autonomous Driving
Ajinkya Khoche, Qingwen Zhang, Laura Pereira Sanchez et al.
Scene flow enables an understanding of the motion characteristics of the environment in the 3D world. It gains particular significance at long range, where object-based perception methods might fail due to sparse observations far away. Although significant advancements have been made in scene flow pipelines to handle large-scale point clouds, a gap remains in scalability with respect to long range. We attribute this limitation to the common design choice of using dense feature grids, which scale quadratically with range. In this paper, we propose Sparse Scene Flow (SSF), a general pipeline for long-range scene flow, adopting a sparse-convolution-based backbone for feature extraction. This approach introduces a new challenge: a mismatch in size and ordering of sparse feature maps between time-sequential point scans. To address this, we propose a sparse feature fusion scheme that augments the feature maps with virtual voxels at missing locations. Additionally, we propose a range-wise metric that implicitly gives greater importance to faraway points. Our method, SSF, achieves state-of-the-art results on the Argoverse2 dataset, demonstrating strong performance in long-range scene flow estimation. Our code will be released at https://github.com/KTH-RPL/SSF.git.
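The claim that dense feature grids scale quadratically with range can be checked with a small calculation. The sketch below assumes a square bird's-eye-view grid centred on the ego vehicle; the function names and the voxel size are illustrative choices, not taken from the SSF code.

```python
import math

def dense_bev_cells(max_range_m, voxel_size_m):
    """Number of cells in a square dense grid covering [-R, R] x [-R, R]."""
    side = int(round(2.0 * max_range_m / voxel_size_m))
    return side * side

def occupied_voxels(points_xy, voxel_size_m):
    """Set of voxel indices that actually contain at least one point."""
    return {
        (math.floor(x / voxel_size_m), math.floor(y / voxel_size_m))
        for x, y in points_xy
    }

if __name__ == "__main__":
    # Quadrupling the range multiplies the dense cell count by 16.
    print(dense_bev_cells(50.0, 0.5), dense_bev_cells(200.0, 0.5))
    # A sparse representation only stores the occupied voxels.
    print(len(occupied_voxels([(0.1, 0.2), (0.3, 0.4), (10.0, -3.0)], 0.5)))
```

A sparse representation stores only the occupied set, so its size follows the number of returns rather than the covered area, which is the motivation the abstract gives for a sparse backbone.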
CV | Mar 2, 2025
HiMo: High-Speed Objects Motion Compensation in Point Clouds
Qingwen Zhang, Ajinkya Khoche, Yi Yang et al.
LiDAR point clouds are essential for autonomous vehicles, but motion distortions from dynamic objects degrade the data quality. While previous work has considered distortions caused by ego motion, distortions caused by other moving objects remain largely overlooked, leading to errors in object shape and position. This distortion is particularly pronounced in high-speed environments such as highways and in multi-LiDAR configurations, a common setup for heavy vehicles. To address this challenge, we introduce HiMo, a pipeline that repurposes scene flow estimation for non-ego motion compensation, correcting the representation of dynamic objects in point clouds. During the development of HiMo, we observed that existing self-supervised scene flow estimators often produce degenerate or inconsistent estimates under high-speed distortion. We further propose SeFlow++, a real-time scene flow estimator that achieves state-of-the-art performance on both scene flow and motion compensation. Since well-established motion distortion metrics are absent in the literature, we introduce two evaluation metrics: compensation accuracy at a point level and shape similarity of objects. We validate HiMo through extensive experiments on Argoverse 2, ZOD, and a newly collected real-world dataset featuring highway driving and multi-LiDAR-equipped heavy vehicles. Our findings show that HiMo improves the geometric consistency and visual fidelity of dynamic objects in LiDAR point clouds, benefiting downstream tasks such as semantic segmentation and 3D detection. See https://kin-zhang.github.io/HiMo for more details.
CV | Aug 25, 2025
DoGFlow: Self-Supervised LiDAR Scene Flow via Cross-Modal Doppler Guidance
Ajinkya Khoche, Qingwen Zhang, Yixi Cai et al.
Accurate 3D scene flow estimation is critical for autonomous systems to navigate dynamic environments safely, but creating the necessary large-scale, manually annotated datasets remains a significant bottleneck for developing robust perception models. Current self-supervised methods struggle to match the performance of fully supervised approaches, especially in challenging long-range and adverse weather scenarios, while supervised methods are not scalable due to their reliance on expensive human labeling. We introduce DoGFlow, a novel self-supervised framework that recovers full 3D object motions for LiDAR scene flow estimation without requiring any manual ground truth annotations. This paper presents our cross-modal label transfer approach, where DoGFlow computes motion pseudo-labels in real-time directly from 4D radar Doppler measurements and transfers them to the LiDAR domain using dynamic-aware association and ambiguity-resolved propagation. On the challenging MAN TruckScenes dataset, DoGFlow substantially outperforms existing self-supervised methods and improves label efficiency by enabling LiDAR backbones to achieve over 90% of fully supervised performance with only 10% of the ground truth data. For more details, please visit https://ajinkyakhoche.github.io/DogFlow/
LG | Aug 19, 2025
AutoScale: Linear Scalarization Guided by Multi-Task Optimization Metrics
Yi Yang, Kei Ikemura, Qingwen Zhang et al.
Recent multi-task learning studies suggest that linear scalarization, when using well-chosen fixed task weights, can achieve performance comparable to or even better than complex multi-task optimization (MTO) methods. It remains unclear why certain weights yield optimal performance and how to determine these weights without relying on exhaustive hyperparameter search. This paper establishes a direct connection between linear scalarization and MTO methods, revealing through extensive experiments that well-performing scalarization weights exhibit specific trends in key MTO metrics, such as high gradient magnitude similarity. Building on this insight, we introduce AutoScale, a simple yet effective two-phase framework that uses these MTO metrics to guide weight selection for linear scalarization, without expensive weight search. AutoScale consistently shows superior performance with high efficiency across diverse datasets, including a new large-scale benchmark.
CV | Mar 27, 2024
Addressing Data Annotation Challenges in Multiple Sensors: A Solution for Scania Collected Datasets
Ajinkya Khoche, Aron Asefaw, Alejandro Gonzalez et al.
Data annotation in autonomous vehicles is a critical step in the development of Deep Neural Network (DNN) based models or the performance evaluation of the perception system. This often takes the form of adding 3D bounding boxes on time-sequential and registered series of point sets captured from active sensors like Light Detection and Ranging (LiDAR) and Radio Detection and Ranging (RADAR). When annotating multiple active sensors, there is a need to motion-compensate the points and translate them to a consistent coordinate frame and timestamp, respectively. However, highly dynamic objects pose a unique challenge, as they can appear at different timestamps in each sensor's data. Without knowing the speed of the objects, their position appears to be different in different sensor outputs. Thus, even after motion compensation, highly dynamic objects are not matched from multiple sensors in the same frame, and human annotators struggle to add unique bounding boxes that capture all objects. This article focuses on addressing this challenge, primarily within the context of Scania collected datasets. The proposed solution takes a track of an annotated object as input and uses Moving Horizon Estimation (MHE) to robustly estimate its speed. The estimated speed profile is utilized to correct the position of the annotated box and add boxes to object clusters missed by the original annotation.
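The effect described here, where an object observed at a different timestamp appears at a different position, can be illustrated with a constant-velocity shift. This is only a sketch of the correction step: it assumes the velocity is already known, whereas the article estimates the speed profile with MHE, which is not reproduced here. The function name and arguments are hypothetical.

```python
def compensate_position(center_xy, velocity_xy, t_observed, t_reference):
    """Shift a box center from its observation time to a common reference time.

    Assumes constant velocity over the (short) interval between the two
    timestamps; velocity_xy would come from an external speed estimate.
    """
    dt = t_reference - t_observed
    return (
        center_xy[0] + velocity_xy[0] * dt,
        center_xy[1] + velocity_xy[1] * dt,
    )

if __name__ == "__main__":
    # An object moving at 30 m/s, observed 50 ms before the reference time,
    # has travelled 1.5 m by the time the frame is referenced.
    print(compensate_position((10.0, 0.0), (30.0, 0.0), 0.05, 0.10))
```

The example numbers show why highly dynamic objects are the most affected: at highway speeds, a timestamp gap of a few tens of milliseconds already produces an offset of more than a metre.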
RO | Sep 2, 2021
Collision avoidance for multiple MAVs using fast centralized NMPC
Björn Lindqvist, Sina Sharif Mansouri, Pantelis Sopasakis et al.
This article proposes a novel control architecture using a centralized nonlinear model predictive control (CNMPC) scheme for controlling multiple micro aerial vehicles (MAVs). The control architecture uses an augmented state system to control multiple agents and performs both obstacle and collision avoidance. The optimization algorithm used is OpEn, based on the proximal averaged Newton-type method for optimal control (PANOC), which provides fast convergence for non-convex optimization problems. The objective is to perform position reference tracking for each individual agent, while nonlinear constraints guarantee collision avoidance and smooth control signals. To produce a trajectory that satisfies all constraints, a penalty method is applied to the nonlinear constraints. The efficacy of the proposed control scheme is demonstrated through simulation results, with comparisons in terms of computation time and constraint violations provided with respect to the number of agents.
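As a rough illustration of how a penalty method can handle a collision-avoidance constraint between agents, the sketch below adds a quadratic penalty on the violation of a minimum-distance condition. The constraint form `max(0, d_min^2 - ||p_i - p_j||^2)`, the function names and the weight `rho` are assumptions chosen for illustration; the article's exact constraint and cost formulation are not given in this abstract.

```python
def collision_violation(p_i, p_j, d_min):
    """Positive only when two agents are closer than d_min (squared form)."""
    dist2 = sum((a - b) ** 2 for a, b in zip(p_i, p_j))
    return max(0.0, d_min ** 2 - dist2)

def penalized_cost(base_cost, positions, d_min, rho):
    """Base tracking cost plus a quadratic penalty over all agent pairs."""
    penalty = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            penalty += collision_violation(positions[i], positions[j], d_min) ** 2
    return base_cost + rho * penalty

if __name__ == "__main__":
    agents = [(0.0, 0.0, 1.0), (0.4, 0.0, 1.0), (3.0, 0.0, 1.0)]
    # In a penalty method, rho is typically increased over outer iterations
    # until the remaining violation falls below a tolerance.
    for rho in (1.0, 10.0, 100.0):
        print(rho, penalized_cost(0.5, agents, d_min=1.0, rho=rho))
```

The number of pairwise terms grows quadratically with the number of agents, which is one reason computation time is naturally reported against agent count in a centralized scheme.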
RO | Aug 30, 2021
COMPRA: A COMPact Reactive Autonomy framework for subterranean MAV based search-and-rescue operations
Björn Lindqvist, Christoforos Kanellakis, Sina Sharif Mansouri et al.
This work establishes COMPRA, a compact and reactive autonomy framework for fast deployment of Micro Aerial Vehicles (MAVs) in subterranean Search-and-Rescue (SAR) missions. A COMPRA-enabled MAV is able to autonomously explore previously unknown areas while considering specific mission criteria, e.g., the identification and localization of an object of interest, the remaining useful battery life, and the overall desired exploration mission duration. The proposed architecture follows a low-complexity algorithmic design to facilitate fully on-board computations, including nonlinear control, state estimation, navigation, exploration behavior and object localization capabilities. The framework is mainly structured around a reactive local avoidance planner, based on enhanced Potential Field concepts and using instantaneous 3D point clouds, as well as a computationally efficient heading regulation technique, based on depth images from an instantaneous camera stream. These techniques decouple the collision-free path generation from the dependency on a global map and are capable of handling occasions of imprecise localization. Field experimental verification of the overall architecture is performed in relevant unknown Global Positioning System (GPS)-denied environments.
RO | Jan 8, 2021
Geometry Aware NMPC Scheme for Morphing Quadrotor Navigation in Restricted Entrances
Andreas Papadimitriou, Sina Sharif Mansouri, Christoforos Kanellakis et al.
Geometry-morphing Micro Aerial Vehicles (MAVs) are gaining increasing attention, since their ability to modify their geometric morphology in flight increases their versatility while expanding their application range. In this novel research field, most works focus on the platform design and on the low-level control part for maintaining stability after the deformation. Nevertheless, another aspect of geometry-morphing MAVs is the association of the deformation with the shape and structure of the environment. In this article, we propose a novel Nonlinear Model Predictive Control (NMPC) structure that modifies the morphology of a quadrotor based on the geometrical shape of environmental entrances. The proposed method considers restricted entrances as a constraint in the NMPC and modifies the arm configuration of the MAV to provide a collision-free path from the initial position to the desired goal, while passing through the entrance. To the authors' best knowledge, this work is the first to connect the in-flight morphology with the characteristics of environmental shapes. Multiple simulation results depict the performance and efficiency of the proposed scheme in scenarios where the quadrotor is commanded to pass through restricted areas.
RO | Aug 3, 2020
Nonlinear MPC for Collision Avoidance and Control of UAVs With Dynamic Obstacles
Björn Lindqvist, Sina Sharif Mansouri, Ali-akbar Agha-mohammadi et al.
This article proposes a novel Nonlinear Model Predictive Control (NMPC) scheme for navigation and obstacle avoidance of an Unmanned Aerial Vehicle (UAV). The proposed NMPC formulation allows for a fully parametric obstacle trajectory, and in this article we apply a classification scheme to differentiate between different kinds of trajectories to predict future obstacle positions. The trajectory calculation is done from an initial condition and fed to the NMPC as an additional input. The solver used is the nonlinear, non-convex solver Proximal Averaged Newton for Optimal Control (PANOC) and its associated software OpEn (Optimization Engine), in which we apply a penalty method to properly consider the obstacles and other constraints during navigation. The proposed NMPC scheme allows for real-time solutions using a sampling time of 50 ms and a two-second prediction of both the obstacle trajectory and the NMPC problem, which implies that the scheme can be considered a local path planner. This paper presents the NMPC cost function and constraint formulation, as well as the methodology for dealing with the dynamic obstacles. We include multiple laboratory experiments to demonstrate the efficacy of the proposed control architecture, and to show that the proposed method delivers fast and computationally stable solutions to the dynamic obstacle avoidance scenarios.
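The stated timing (50 ms sampling, two-second prediction) corresponds to a 40-step horizon. The sketch below rolls out an obstacle position over that horizon from an initial condition. The constant-velocity model is an assumption used here as one simple trajectory class; the article's classification scheme and parametric trajectories are not reproduced, and `predict_obstacle` is a hypothetical name.

```python
def predict_obstacle(p0, v, dt=0.05, horizon=2.0):
    """Predict obstacle positions at each step of the horizon.

    p0 and v are the initial position and velocity (same dimension).
    Returns one position per prediction step, i.e. horizon / dt entries.
    """
    steps = int(round(horizon / dt))
    return [
        tuple(p + vi * dt * k for p, vi in zip(p0, v))
        for k in range(1, steps + 1)
    ]

if __name__ == "__main__":
    trajectory = predict_obstacle((0.0, 0.0, 1.0), (1.0, -2.0, 0.0))
    print(len(trajectory))   # number of prediction steps
    print(trajectory[-1])    # predicted position at the end of the horizon
```

A list like this would be passed to the optimizer as additional input parameters, one obstacle position per prediction step, in line with the abstract's description of feeding the trajectory to the NMPC.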
RO | Jul 31, 2020
A Unified NMPC Scheme for MAVs Navigation with 3D Collision Avoidance under Position Uncertainty
Sina Sharif Mansouri, Christoforos Kanellakis, Bjorn Lindqvist et al.
This article proposes a novel Nonlinear Model Predictive Control (NMPC) framework for Micro Aerial Vehicle (MAV) autonomous navigation in constrained environments. The introduced framework allows us to consider the nonlinear dynamics of MAVs and guarantees real-time performance. Our first contribution is the design of a computationally efficient subspace clustering method that reveals the underlying constraint planes within a 3D point cloud obtained from a 3D lidar scanner. The second contribution of our work is to incorporate the extracted information into the nonlinear constraints of the NMPC for avoiding collisions. Our third contribution focuses on making the controller robust by considering the uncertainty of localization and NMPC using the Shannon entropy. This step enables us to track either the position or velocity references, or neither of them if necessary. As a result, the collision avoidance constraints are defined in the local coordinates of the MAV, and they remain active and guarantee collision avoidance despite localization uncertainties, e.g., position estimation drifts. Additionally, as the platform continues the mission, the position estimates become less uncertain, due to feature extraction and loop closure. The efficacy of the suggested framework has been evaluated using various simulations in the Gazebo environment.
RO | Jun 7, 2020
Subterranean MAV Navigation based on Nonlinear MPC with Collision Avoidance Constraints
Sina Sharif Mansouri, Christoforos Kanellakis, Emil Fresk et al.
Micro Aerial Vehicle (MAV) navigation in subterranean environments is gaining attention in the field of aerial robotics; however, there are still multiple challenges for collision-free navigation in such harsh environments. This article proposes a novel baseline solution for collision-free navigation with Nonlinear Model Predictive Control (NMPC). In the proposed method, the MAV is considered as a floating object, where the velocities on the $x$, $y$ axes and the altitude are the references for the NMPC to navigate along the tunnel, while the NMPC avoids collisions by considering the kinematics of the obstacles based on measurements from a 2D lidar. Moreover, a novel approach for correcting the heading of the MAV towards the center of the mine tunnel is proposed. The efficacy of the suggested framework has been evaluated in multiple field trials in an underground mine in Sweden.
RO | Jun 7, 2020
Unsupervised Learning for Subterranean Junction Recognition Based on 2D Point Cloud
Sina Sharif Mansouri, Farhad Pourkamali-Anaraki, Miguel Castano Arranz et al.
This article proposes a novel unsupervised learning framework for detecting the number of tunnel junctions in subterranean environments based on acquired 2D point clouds. The implementation of the framework provides valuable information for high level mission planners to navigate an aerial platform in unknown areas or robot homing missions. The framework utilizes spectral clustering, which is capable of uncovering hidden structures from connected data points lying on non-linear manifolds. The spectral clustering algorithm computes a spectral embedding of the original 2D point cloud by utilizing the eigen decomposition of a matrix that is derived from the pairwise similarities of these points. We validate the developed framework using multiple data-sets, collected from multiple realistic simulations, as well as from real flights in underground environments, demonstrating the performance and merits of the proposed methodology.
RO | Jun 7, 2020
MAV Navigation in Unknown Dark Underground Mines Using Deep Learning
Sina Sharif Mansouri, Christoforos Kanellakis, Petros Karvelis et al.
This article proposes a Deep Learning (DL) method to enable fully autonomous flights for low-cost Micro Aerial Vehicles (MAVs) in unknown dark underground mine tunnels. Such environments pose multiple challenges, including lack of illumination, narrow passages, wind gusts and dust. The proposed method does not require accurate pose estimation and considers the flying platform as a floating object. The supervised Convolutional Neural Network (CNN) image classifier corrects the heading of the MAV towards the center of the mine tunnel by processing the image frames from a single on-board camera, while the platform navigates at constant altitude and desired velocity references. Moreover, the output of the CNN module can be used by the operator as a means of collision prediction. The efficiency of the proposed method has been experimentally evaluated in multiple field trials in an underground mine in Sweden, demonstrating the capability of the proposed method in different areas and illumination levels.
RO | May 29, 2020
MAV Development Towards Navigation in Unknown and Dark Mining Tunnels
Dariusz Kominiak, Sina Sharif Mansouri, Christoforos Kanellakis et al.
The mining industry is considering the deployment of MAVs for autonomous inspection of tunnels and shafts to increase safety and productivity. However, mines are challenging and harsh environments that directly degrade the high-end, expensive components used over time. Motivated by this effect, this article presents a low-cost and modular platform for designing a fully autonomous navigating MAV that does not require any prior information about the surrounding environment. The proposed aerial vehicle can be considered a consumable platform that can be instantly replaced in case of damage or defect, which agrees with the vision of mining companies for utilizing stable aerial robots at reasonable cost. In the proposed design, the operator has access to all on-board data, thus increasing the overall customization of the design and the execution of the mine inspection mission. The MAV platform has a software core based on ROS operating on an Aaeon UP-Board, and it is equipped with a sensor suite to accomplish autonomous navigation as reliably as high-end and expensive platforms.
RO | Sep 10, 2019
Visual Area Coverage with Attitude-Dependent Camera Footprints by Particle Harvesting
Sina Sharif Mansouri, Pantelis Sopasakis, George Georgoulas et al.
In aerial visual area coverage missions, the camera footprint changes over time based on the camera position and orientation -- a fact that complicates the whole process of coverage and path planning. This article proposes a solution to the problem of visual coverage by filling the target area with a set of randomly distributed particles and harvesting them by camera footprints. This way, high coverage is obtained at a low computational cost. In this approach, the path planner considers six degrees of freedom (DoF) for the camera movement and commands thrust and attitude references to a lower layer controller, while maximizing the covered area and coverage quality. The proposed method requires a priori information of the boundaries of the target area and can handle areas of very complex and highly non-convex geometry. The effectiveness of the approach is demonstrated in multiple simulations in terms of computational efficiency and coverage.
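The particle-harvesting idea, filling the target area with random particles and removing those that fall inside a camera footprint, can be sketched as follows. The rectangular target area, the circular footprint of fixed radius and all function names are simplifying assumptions for illustration; the article uses attitude-dependent footprints and handles non-convex areas, neither of which is modelled here.

```python
import random

def scatter_particles(n, x_max, y_max, seed=0):
    """Fill a rectangular area [0, x_max] x [0, y_max] with random particles."""
    rng = random.Random(seed)
    return [(rng.uniform(0.0, x_max), rng.uniform(0.0, y_max)) for _ in range(n)]

def harvest(particles, center, radius):
    """Split particles into those inside the footprint and those remaining."""
    cx, cy = center
    r2 = radius * radius
    harvested, remaining = [], []
    for p in particles:
        inside = (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= r2
        (harvested if inside else remaining).append(p)
    return harvested, remaining

def coverage_along_path(particles, waypoints, radius):
    """Fraction of particles harvested after visiting all waypoints."""
    remaining = list(particles)
    for waypoint in waypoints:
        _, remaining = harvest(remaining, waypoint, radius)
    return 1.0 - len(remaining) / len(particles)

if __name__ == "__main__":
    particles = scatter_particles(1000, 10.0, 10.0)
    path = [(2.5, 2.5), (7.5, 2.5), (7.5, 7.5), (2.5, 7.5)]
    print(coverage_along_path(particles, path, radius=3.0))
```

Counting harvested particles gives a cheap estimate of covered area without computing footprint intersections explicitly, which is consistent with the low computational cost the abstract reports.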
RO | Jan 16, 2019
Autonomous visual inspection of large-scale infrastructures using aerial robots
Christoforos Kanellakis, Emil Fresk, Sina Sharif Mansouri et al.
This article presents a novel framework for performing visual inspection around 3D infrastructures, by establishing a team of fully autonomous Micro Aerial Vehicles (MAVs) with robust localization, planning and perception capabilities. The proposed aerial inspection system reaches a high level of autonomy on a large scale, while pushing the boundaries of real-life deployment of aerial robotics. In the presented approach, the MAVs deployed for the inspection of the structure rely only on their onboard computer and sensory systems. The developed framework envisions a modular system, combining open research challenges in the fields of localization, path planning and mapping, with an overall capability for fast on-site deployment and a reduced execution time that can repeatably perform the inspection mission according to the operator's needs. The architecture of the established system includes: 1) a geometry-based path planner for coverage of complex structures by multiple MAVs, 2) an accurate yet flexible localization component, which provides accurate pose estimation for the MAVs by utilizing an Ultra Wideband fused inertial estimation scheme, and 3) a visual data post-processing scheme for 3D model building. The performance of the proposed framework has been experimentally demonstrated in multiple realistic outdoor field trials, all focusing on the challenging structure of a wind turbine as the main test case. The successful experimental results depict the merits of the proposed autonomous navigation system as the enabling technology towards aerial robotic inspectors.
RO | Nov 16, 2016
Cooperative Aerial Coverage Path Planning for Visual Inspection of Complex Infrastructures
Sina Sharif Mansouri, Christoforos Kanellakis, David Wuthier et al.
This article addresses the problem of Cooperative Coverage Path Planning (C-CPP) for the inspection of complex infrastructures (offline 3D reconstruction) by utilizing multiple Unmanned Autonomous Vehicles (UAVs). The proposed scheme, based on an a priori 3D model of the infrastructure under inspection, is able to generate multiple paths for UAVs in order to achieve complete cooperative coverage in a short time. Initially, the infrastructure under inspection is sliced by horizontal planes, which makes it possible to recognize the branches of the structure; these branches are handled as breaking points for the path planning, so that the UAVs collaboratively execute the coverage task in less time and more realistically, based on the current flying times of the UAVs. The multiple data sets collected from the coverage are merged for the offline sparse and dense 3D reconstruction of the infrastructure by utilizing SLAM and Structure from Motion approaches, with either monocular or stereo sensors. The performance of the proposed C-CPP has been experimentally evaluated in multiple indoor and realistic outdoor infrastructure inspection experiments.