Giovanni Cioffi

RO
h-index15
11papers
410citations
Novelty51%
AI Score45

11 Papers

CVJul 22, 2024
Reinforcement Learning Meets Visual Odometry

Nico Messikommer, Giovanni Cioffi, Mathias Gehrig et al.

Visual Odometry (VO) is essential to downstream mobile robotics and augmented/virtual reality tasks. Despite recent advances, existing VO methods still rely on heuristic design choices that require several weeks of hyperparameter tuning by human experts, hindering generalizability and robustness. We address these challenges by reframing VO as a sequential decision-making task and applying Reinforcement Learning (RL) to adapt the VO process dynamically. Our approach introduces a neural network, operating as an agent within the VO pipeline, to make decisions such as keyframe and grid-size selection based on real-time conditions. Our method minimizes reliance on heuristic choices using a reward function based on pose error, runtime, and other metrics to guide the system. Our RL framework treats the VO system and the image sequence as an environment, with the agent receiving observations from keypoints, map statistics, and prior poses. Experimental results using classical VO methods and public benchmarks demonstrate improvements in accuracy and robustness, validating the generalizability of our RL-enhanced VO approach to different scenarios. We believe this paradigm shift advances VO technology by eliminating the need for time-intensive parameter tuning of heuristics.

CVNov 23, 2022
Data-Driven Feature Tracking for Event Cameras With and Without Frames

Nico Messikommer, Carter Fang, Mathias Gehrig et al.

Because of their high temporal resolution, increased resilience to motion blur, and very sparse output, event cameras have been shown to be ideal for low-latency and low-bandwidth feature tracking, even in challenging scenarios. Existing feature tracking methods for event cameras are either handcrafted or derived from first principles but require extensive parameter tuning, are sensitive to noise, and do not generalize to different scenarios due to unmodeled effects. To tackle these deficiencies, we introduce the first data-driven feature tracker for event cameras, which leverages low-latency events to track features detected in an intensity frame. We achieve robust performance via a novel frame attention module, which shares information across feature tracks. Our tracker is designed to operate in two distinct configurations: solely with events or in a hybrid mode incorporating both events and frames. The hybrid model offers two setups: an aligned configuration where the event and frame cameras share the same viewpoint, and a hybrid stereo configuration where the event camera and the standard camera are positioned side-by-side. This side-by-side arrangement is particularly valuable as it provides depth information for each feature track, enhancing its utility in applications such as visual odometry and simultaneous localization and mapping.

ROSep 6, 2024
Structure-Invariant Range-Visual-Inertial Odometry

Ivan Alberico, Jeff Delaune, Giovanni Cioffi et al.

The Mars Science Helicopter (MSH) mission aims to deploy the next generation of unmanned helicopters on Mars, targeting landing sites in highly irregular terrain such as Valles Marineris, the largest canyons in the Solar system with elevation variances of up to 8000 meters. Unlike its predecessor, the Mars 2020 mission, which relied on a state estimation system assuming planar terrain, MSH requires a novel approach due to the complex topography of the landing site. This work introduces a novel range-visual-inertial odometry system tailored for the unique challenges of the MSH mission. Our system extends the state-of-the-art xVIO framework by fusing consistent range information with visual and inertial measurements, preventing metric scale drift in the absence of visual-inertial excitation (mono camera and constant velocity descent), and enabling landing on any terrain structure, without requiring any planar terrain assumption. Through extensive testing in image-based simulations using actual terrain structure and textures collected in Mars orbit, we demonstrate that our range-VIO approach estimates terrain-relative velocity meeting the stringent mission requirements, and outperforming existing methods.

CVFeb 26
Motion-aware Event Suppression for Event Cameras

Roberto Pellerito, Nico Messikommer, Giovanni Cioffi et al.

In this work, we introduce the first framework for Motion-aware Event Suppression, which learns to filter events triggered by IMOs and ego-motion in real time. Our model jointly segments IMOs in the current event stream while predicting their future motion, enabling anticipatory suppression of dynamic events before they occur. Our lightweight architecture achieves 173 Hz inference on consumer-grade GPUs with less than 1 GB of memory usage, outperforming previous state-of-the-art methods on the challenging EVIMO benchmark by 67\% in segmentation accuracy while operating at a 53\% higher inference rate. Moreover, we demonstrate significant benefits for downstream applications: our method accelerates Vision Transformer inference by 83\% via token pruning and improves event-based visual odometry accuracy, reducing Absolute Trajectory Error (ATE) by 13\%.

ROFeb 17, 2022Code
Continuous-Time vs. Discrete-Time Vision-based SLAM: A Comparative Study

Giovanni Cioffi, Titus Cieslewski, Davide Scaramuzza

Robotic practitioners generally approach the vision-based SLAM problem through discrete-time formulations. This has the advantage of a consolidated theory and very good understanding of success and failure cases. However, discrete-time SLAM needs tailored algorithms and simplifying assumptions when high-rate and/or asynchronous measurements, coming from different sensors, are present in the estimation process. Conversely, continuous-time SLAM, often overlooked by practitioners, does not suffer from these limitations. Indeed, it allows integrating new sensor data asynchronously without adding a new optimization variable for each new measurement. In this way, the integration of asynchronous or continuous high-rate streams of sensor data does not require tailored and highly-engineered algorithms, enabling the fusion of multiple sensor modalities in an intuitive fashion. On the down side, continuous time introduces a prior that could worsen the trajectory estimates in some unfavorable situations. In this work, we aim at systematically comparing the advantages and limitations of the two formulations in vision-based SLAM. To do so, we perform an extensive experimental analysis, varying robot type, speed of motion, and sensor modalities. Our experimental analysis suggests that, independently of the trajectory type, continuous-time SLAM is superior to its discrete counterpart whenever the sensors are not time-synchronized. In the context of this work, we developed, and open source, a modular and efficient software architecture containing state-of-the-art algorithms to solve the SLAM problem in discrete and continuous time.

ROFeb 26, 2021Code
Autonomous Quadrotor Flight despite Rotor Failure with Onboard Vision Sensors: Frames vs. Events

Sihao Sun, Giovanni Cioffi, Coen de Visser et al.

Fault-tolerant control is crucial for safety-critical systems, such as quadrotors. State-of-art flight controllers can stabilize and control a quadrotor even when subjected to the complete loss of a rotor. However, these methods rely on external sensors, such as GPS or motion capture systems, for state estimation. To the best of our knowledge, this has not yet been achieved with only onboard sensors. In this paper, we propose the first algorithm that combines fault-tolerant control and onboard vision-based state estimation to achieve position control of a quadrotor subjected to complete failure of one rotor. Experimental validations show that our approach is able to accurately control the position of a quadrotor during a motor failure scenario, without the aid of any external sensors. The primary challenge to vision-based state estimation stems from the inevitable high-speed yaw rotation (over 20 rad/s) of the damaged quadrotor, causing motion blur to cameras, which is detrimental to visual inertial odometry (VIO). We compare two types of visual inputs to the vision-based state estimation algorithm: standard frames and events. Experimental results show the advantage of using an event camera especially in low light environments due to its inherent high dynamic range and high temporal resolution. We believe that our approach will render autonomous quadrotors safer in both GPS denied or degraded environments. We release both our controller and VIO algorithm open source.

CVJun 9, 2025
Egocentric Event-Based Vision for Ping Pong Ball Trajectory Prediction

Ivan Alberico, Marco Cannici, Giovanni Cioffi et al.

In this paper, we present a real-time egocentric trajectory prediction system for table tennis using event cameras. Unlike standard cameras, which suffer from high latency and motion blur at fast ball speeds, event cameras provide higher temporal resolution, allowing more frequent state updates, greater robustness to outliers, and accurate trajectory predictions using just a short time window after the opponent's impact. We collect a dataset of ping-pong game sequences, including 3D ground-truth trajectories of the ball, synchronized with sensor data from the Meta Project Aria glasses and event streams. Our system leverages foveated vision, using eye-gaze data from the glasses to process only events in the viewer's fovea. This biologically inspired approach improves ball detection performance and significantly reduces computational latency, as it efficiently allocates resources to the most perceptually relevant regions, achieving a reduction factor of 10.81 on the collected trajectories. Our detection pipeline has a worst-case total latency of 4.5 ms, including computation and perception - significantly lower than a frame-based 30 FPS system, which, in the worst case, takes 66 ms solely for perception. Finally, we fit a trajectory prediction model to the estimated states of the ball, enabling 3D trajectory forecasting in the future. To the best of our knowledge, this is the first approach to predict table tennis trajectories from an egocentric perspective using event cameras.

ROApr 1, 2025
HDVIO2.0: Wind and Disturbance Estimation with Hybrid Dynamics VIO

Giovanni Cioffi, Leonard Bauersfeld, Davide Scaramuzza

Visual-inertial odometry (VIO) is widely used for state estimation in autonomous micro aerial vehicles using onboard sensors. Current methods improve VIO by incorporating a model of the translational vehicle dynamics, yet their performance degrades when faced with low-accuracy vehicle models or continuous external disturbances, like wind. Additionally, incorporating rotational dynamics in these models is computationally intractable when they are deployed in online applications, e.g., in a closed-loop control system. We present HDVIO2.0, which models full 6-DoF, translational and rotational, vehicle dynamics and tightly incorporates them into a VIO with minimal impact on the runtime. HDVIO2.0 builds upon the previous work, HDVIO, and addresses these challenges through a hybrid dynamics model combining a point-mass vehicle model with a learning-based component, with access to control commands and IMU history, to capture complex aerodynamic effects. The key idea behind modeling the rotational dynamics is to represent them with continuous-time functions. HDVIO2.0 leverages the divergence between the actual motion and the predicted motion from the hybrid dynamics model to estimate external forces as well as the robot state. Our system surpasses the performance of state-of-the-art methods in experiments using public and new drone dynamics datasets, as well as real-world flights in winds up to 25 km/h. Unlike existing approaches, we also show that accurate vehicle dynamics predictions are achievable without precise knowledge of the full vehicle state.

ROSep 23, 2021
The Hilti SLAM Challenge Dataset

Michael Helmberger, Kristian Morin, Beda Berner et al.

Research in Simultaneous Localization and Mapping (SLAM) has made outstanding progress over the past years. SLAM systems are nowadays transitioning from academic to real world applications. However, this transition has posed new demanding challenges in terms of accuracy and robustness. To develop new SLAM systems that can address these challenges, new datasets containing cutting-edge hardware and realistic scenarios are required. We propose the Hilti SLAM Challenge Dataset. Our dataset contains indoor sequences of offices, labs, and construction environments and outdoor sequences of construction sites and parking areas. All these sequences are characterized by featureless areas and varying illumination conditions that are typical in real-world scenarios and pose great challenges to SLAM algorithms that have been developed in confined lab environments. Accurate sparse ground truth, at millimeter level, is provided for each sequence. The sensor platform used to record the data includes a number of visual, lidar, and inertial sensors, which are spatially and temporally calibrated. The purpose of this dataset is to foster the research in sensor fusion to develop SLAM algorithms that can be deployed in tasks where high accuracy and robustness are required, e.g., in construction environments. Many academic and industrial groups tested their SLAM systems on the proposed dataset in the Hilti SLAM Challenge. The results of the challenge, which are summarized in this paper, show that the proposed dataset is an important asset in the development of new SLAM algorithms that are ready to be deployed in the real-world.

ROAug 1, 2021
Powerline Tracking with Event Cameras

Alexander Dietsche, Giovanni Cioffi, Javier Hidalgo-Carrio et al.

Autonomous inspection of powerlines with quadrotors is challenging. Flights require persistent perception to keep a close look at the lines. We propose a method that uses event cameras to robustly track powerlines. Event cameras are inherently robust to motion blur, have low latency, and high dynamic range. Such properties are advantageous for autonomous inspection of powerlines with drones, where fast motions and challenging illumination conditions are ordinary. Our method identifies lines in the stream of events by detecting planes in the spatio-temporal signal, and tracks them through time. The implementation runs onboard and is capable of detecting multiple distinct lines in real time with rates of up to $320$ thousand events per second. The performance is evaluated in real-world flights along a powerline. The tracker is able to persistently track the powerlines, with a mean lifetime of the line $10\times$ longer than existing approaches.

ROMar 9, 2020
Tightly-coupled Fusion of Global Positional Measurements in Optimization-based Visual-Inertial Odometry

Giovanni Cioffi, Davide Scaramuzza

Motivated by the goal of achieving robust, drift-free pose estimation in long-term autonomous navigation, in this work we propose a methodology to fuse global positional information with visual and inertial measurements in a tightly-coupled nonlinear-optimization-based estimator. Differently from previous works, which are loosely-coupled, the use of a tightly-coupled approach allows exploiting the correlations amongst all the measurements. A sliding window of the most recent system states is estimated by minimizing a cost function that includes visual re-projection errors, relative inertial errors, and global positional residuals. We use IMU preintegration to formulate the inertial residuals and leverage the outcome of such algorithm to efficiently compute the global position residuals. The experimental results show that the proposed method achieves accurate and globally consistent estimates, with negligible increase of the optimization computational cost. Our method consistently outperforms the loosely-coupled fusion approach. The mean position error is reduced up to 50% with respect to the loosely-coupled approach in outdoor Unmanned Aerial Vehicle (UAV) flights, where the global position information is given by noisy GPS measurements. To the best of our knowledge, this is the first work where global positional measurements are tightly fused in an optimization-based visual-inertial odometry algorithm, leveraging the IMU preintegration method to define the global positional factors.