Kostas Alexis

RO
h-index: 49
43 papers
682 citations
Novelty: 41%
AI Score: 55

43 Papers

RO · Nov 6, 2025
Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

Mayank Mittal, Pascal Roth, James Tigue et al. · nvidia

We present Isaac Lab, the natural successor to Isaac Gym, which extends the paradigm of GPU-native robotics simulation into the era of large-scale multi-modal learning. Isaac Lab combines high-fidelity GPU parallel physics, photorealistic rendering, and a modular, composable architecture for designing environments and training robot policies. Beyond physics and rendering, the framework integrates actuator models, multi-frequency sensor simulation, data collection pipelines, and domain randomization tools, unifying best practices for reinforcement and imitation learning at scale within a single extensible platform. We highlight its application to a diverse set of challenges, including whole-body control, cross-embodiment mobility, contact-rich and dexterous manipulation, and the integration of human demonstrations for skill acquisition. Finally, we discuss upcoming integration with the differentiable, GPU-accelerated Newton physics engine, which promises new opportunities for scalable, data-efficient, and gradient-based approaches to robot learning. We believe Isaac Lab's combination of advanced simulation capabilities, rich sensing, and data-center scale execution will help unlock the next generation of breakthroughs in robotics research.

RO · May 30
BEVIO: Efficient Bird's-Eye-View based Sparse-Update Visual-Inertial Odometry for Lunar Day-Night Navigation

Mohit Singh, Shehryar Khattak, Ashish Goel et al.

Visual-Inertial Odometry (VIO) provides smooth, high-rate state estimates and has been widely used for robotic navigation in both terrestrial and planetary applications. However, its performance is typically dependent on the frequency of visual updates, which is a challenge for planetary rovers operating under extreme resource constraints and low frame rates. This work investigates enabling reliable VIO with very sparse visual updates for lunar rover applications, addressing both day and night-time operations where feature associations become especially difficult under self-illumination conditions. We propose a Bird's Eye View (BEV)-based image matching scheme that remains robust to larger inter-frame motions and yields more reliable feature matching despite significant visual appearance changes. We extensively evaluate our proposed approach, BEVIO, through high-fidelity photorealistic lunar and real-time robotic experiments conducted using a half-scale lunar rover, in a long-term day-night deployment at Plaster City, CA, USA. The results demonstrate that our method enables reliable day and nighttime self-illuminated traverses at visual update rates as low as 0.25 Hz, underscoring its suitability for navigation on power- and compute-limited lunar rovers.

RO · Jun 3
Generalization of World Models under Environmental Variability for Vision-based Quadrotor Navigation

Luca Zanatta, Grzegorz Malczyk, Kostas Alexis

World models, learned generative models that predict how an environment evolves, have become a promising tool for sample-efficient robot learning. Yet how robust they are to environmental variability remains poorly understood. To address this, we conduct a systematic study using vision-based quadrotor navigation as a testbed problem, training DreamerV3-based world models under varying levels of environmental randomness and evaluating them across all levels through cross-environment validation, spanning both Self-Supervised Learning (SSL) pretraining and Reinforcement Learning (RL) fine-tuning. We then deploy all world models and associated navigation policies on a real quadrotor in unseen environments, including an open-loop run where the model receives just 2.5s of real sensory input before all sensors are cut off, leaving the system to navigate entirely in imagination over a 12m traverse. Our results show that world model robustness during SSL pretraining is a strong predictor of sim-to-real transfer: every model that generalized well in cross-environment SSL validation deployed successfully in the real world, passing through gaps as narrow as 0.67m, whereas the model that dominated simulation policy evaluation failed on the real platform. We further identify (a) the discrete latent size and (b) the training-sequence length as the dominant factors governing world model quality.

RO · Apr 16
DigiForest: Digital Analytics and Robotics for Sustainable Forestry

Marco Camurri, Enrico Tomelleri, Matías Mattamala et al. · oxford

Covering one third of Earth's land surface, forests are vital to global biodiversity, climate regulation, and human well-being. In Europe, forests and woodlands reach approximately 40% of land area, and the forestry sector is central to achieving the EU's climate neutrality and biodiversity goals; these emphasize sustainable forest management, increased use of long-lived wood products, and resilient forest ecosystems. To meet these goals and properly address their inherent challenges, current practices require further innovation. This chapter introduces DigiForest, a novel, large-scale precision forestry approach leveraging digital technologies and autonomous robotics. DigiForest is structured around four main components: (1) autonomous, heterogeneous mobile robots (aerial, legged, and marsupial) for tree-level data collection; (2) automated extraction of tree traits to build forest inventories; (3) a Decision Support System (DSS) for forecasting forest growth and supporting decision-making; and (4) low-impact selective logging using purpose-built autonomous harvesters. These technologies have been extensively validated in real-world conditions in several locations, including forests in Finland, the UK, and Switzerland.

RO · May 12 · Code
The Unified Autonomy Stack: Toward a Blueprint for Generalizable Robot Autonomy

Mihir Dharmadhikari, Nikhil Khedekar, Mihir Kulkarni et al.

We introduce and open-source the Unified Autonomy Stack, a system-level solution that enables resilient autonomy across diverse aerial and ground robot morphologies. The architecture centers on three synergistic modules -- multi-modal perception, multi-behavior planning, and multi-layered safe navigation -- that together deliver comprehensive mission autonomy. The stack fuses data from LiDAR, radar, vision, and inertial sensing, enabling (a) robust localization and mapping through factor graph-based fusion, (b) semantic scene understanding, (c) motion and informative path planning through sampling-based techniques adaptive across spatial scales, as well as (d) multi-layered safe navigation both through planning on the online reconstructed map and deep learning-driven exteroceptive policies alongside last-resort safety filters using control barrier functions. The resulting behaviors include safe GNSS-denied navigation into unknown and perceptually-degraded regions, exploration of complex environments, object discovery, and efficient inspection planning. The stack has been field-tested and validated on both aerial (rotorcraft) and ground (legged) robots operating in a host of demanding environments, including self-similar and smoke-filled settings, with complex geometries and high obstacle clutter. These tests demonstrate resilient performance in challenging conditions. To facilitate ease of adoption, we open-source the implementation alongside supporting documentation, validation, and evaluation datasets at https://github.com/ntnu-arl/unified_autonomy_stack. A video giving an overview of the paper and the field experiments is available at https://youtu.be/l8Su8OXsM-E.

RO · Mar 24 · Code
Tightly-Coupled Radar-Visual-Inertial Odometry

Morten Nissov, Mohit Singh, Kostas Alexis

Visual-Inertial Odometry (VIO) is a staple for reliable state estimation on constrained and lightweight platforms due to its versatility and demonstrated performance. However, pertinent challenges regarding robust operation in dark, low-texture, obscured environments complicate the use of such methods. Alternatively, Frequency Modulated Continuous Wave (FMCW) radars, and by extension Radar-Inertial Odometry (RIO), offer robustness to these visual challenges, albeit at the cost of reduced information density and worse long-term accuracy. To address these limitations, this work combines the two in a tightly coupled manner, enabling the resulting method to operate robustly regardless of environmental conditions or trajectory dynamics. The proposed method fuses image features, radar Doppler measurements, and Inertial Measurement Unit (IMU) measurements within an Iterated Extended Kalman Filter (IEKF) in real-time, with radar range data augmenting the visual feature depth initialization. The method is evaluated through flight experiments conducted in both indoor and outdoor environments, as well as through challenges to both exteroceptive modalities (such as darkness, fog, or fast flight), thoroughly demonstrating its robustness. The implementation of the proposed method is available at: https://github.com/ntnu-arl/radvio
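The role of radar Doppler measurements in the abstract above can be illustrated with a small sketch. It is not the paper's IEKF; it only shows the commonly used measurement relation in which a static target seen along unit direction d returns a radial speed of -d · v for sensor velocity v, and recovers v by least squares. The function name, the sign convention, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def estimate_ego_velocity(directions, radial_speeds):
    """Least-squares sensor velocity from Doppler returns of static targets.

    directions: (N, 3) unit vectors from the sensor to each target (sensor frame).
    radial_speeds: (N,) Doppler speeds, modeled here (assumption) as -d . v.
    """
    d = np.asarray(directions, dtype=float)
    r = np.asarray(radial_speeds, dtype=float)
    # Solve (-d) v = r in the least-squares sense.
    v, *_ = np.linalg.lstsq(-d, r, rcond=None)
    return v

# Synthetic, noise-free example (hypothetical data).
rng = np.random.default_rng(0)
dirs = rng.normal(size=(50, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
v_true = np.array([1.0, -0.5, 0.2])
doppler = -dirs @ v_true
v_est = estimate_ego_velocity(dirs, doppler)
```

In a filter-based estimator such as the one described, each Doppler return would typically enter as a measurement residual rather than through a batch solve; the batch form is used here only because it is compact.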

CV · Sep 11, 2023
Task-driven Compression for Collision Encoding based on Depth Images

Mihir Kulkarni, Kostas Alexis

This paper contributes a novel learning-based method for aggressive task-driven compression of depth images and their encoding as images tailored to collision prediction for robotic systems. A novel 3D image processing methodology is proposed that accounts for the robot's size in order to appropriately "inflate" the obstacles represented in the depth image and thus obtain the distance that can be traversed by the robot in a collision-free manner along any given ray within the camera frustum. Such depth-and-collision image pairs are used to train a neural network that follows the architecture of Variational Autoencoders to compress-and-transform the information in the original depth image to derive a latent representation that encodes the collision information for the given depth image. We compare our proposed task-driven encoding method with classical task-agnostic methods and demonstrate superior performance for the task of collision image prediction from extremely low-dimensional latent spaces. A set of comparative studies shows that the proposed approach is capable of encoding depth image-and-collision image tuples from complex scenes with thin obstacles at long distances better than the classical methods at compression ratios as high as 4050:1.

RO · May 23
Towards Low-Gravity Planetary Exploration using Reinforcement Learning for Walking, Jumping, and In-flight Attitude Control

Jørgen Anker Olsen, Kostas Alexis

This paper presents reinforcement learning (RL) policies for dynamic quadrupedal locomotion in planetary exploration scenarios. Building on a task-optimized quadruped with a 5-bar leg design, we develop RL policies for walking, vertical jumping, forward jumping, and in-flight attitude control, explicitly tailored to the reduced gravity on Mars. These policies jointly enable such robots to overcome obstacles larger than themselves through coordinated jumping and precise in-flight reorientation for safe landings. We demonstrate Sim2Real transfer of the attitude control policy on the Olympus quadruped through single-axis reorientation tests, while all locomotion policies are validated in simulation. A complete Mars exploration mission scenario demonstrates coordinated policy deployment across challenging terrain. Experimental results show 90° attitude reorientation in 2.6 seconds, with simulations demonstrating 3.1 meter vertical jumps and 3.9 meter forward jumps under Martian gravity conditions. Supplementary video: https://www.youtube.com/watch?v=qlSJ3P87A4A

RO · May 3 · Code
On the Characterization and Limits of 4D Radar for Aided Inertial Navigation

Morten Nissov, Kostas Alexis

Frequency Modulated Continuous Wave (FMCW) radar is a promising sensor for aided inertial navigation, due to its robustness in environments that challenge traditional alternatives, such as LiDAR and vision. However, its widespread adoption is hindered by complex, noisy measurements, which make reliable estimation difficult. This manuscript addresses these challenges by analyzing the fundamental measurement relations of FMCW radar sensing and developing a reliable estimator. Noise models are derived by applying first principles to the underlying signal processing of a typical radar sensor. These models guide the design of a factor graph-based estimator, utilizing a first-order approximation for the measurement noise propagation. The approach is first examined through simulation, evaluating the significance of different noise sources, the validity of the first-order approximation, and the state-dependent nature of the covariance expressions. Extensive experiments demonstrate the superior robustness and accuracy of the proposed method across diverse field environments and flight profiles, including beyond the radar's standard operating range. Furthermore, the experiments confirm the insights from the simulation regarding the behavior and performance of different estimator configurations relative to their operating conditions. The evaluation data and estimator implementation are made available at https://github.com/ntnu-arl/rig.

RO · Sep 18, 2024
Online Refractive Camera Model Calibration in Visual Inertial Odometry

Mohit Singh, Kostas Alexis

This paper presents a general refractive camera model and online co-estimation of odometry and the refractive index of unknown media. This enables operation in diverse and varying refractive fluids, given only the camera calibration in air. The refractive index is estimated online as a state variable of a monocular visual-inertial odometry framework in an iterative formulation using the proposed camera model. The method was verified on data collected using an underwater robot traversing inside a pool. The evaluations demonstrate convergence to the ideal refractive index for water despite significant perturbations in the initialization. Simultaneously, the approach enables on-par visual-inertial odometry performance in refractive media without prior knowledge of the refractive index or requirement of medium-specific camera calibration.

RO · Apr 24 · Code
Equivariant Filter for Radar-Inertial Odometry

Giulio Delama, Jan Michalczyk, Morten Nissov et al.

Radar-Inertial Odometry (RIO) based on the Extended Kalman Filter (EKF) relies on accurate extrinsic calibration between the radar and the Inertial Measurement Unit (IMU) and is sensitive to disturbances, as large linearization errors can degrade performance or even cause divergence. To address these limitations, this letter proposes an Equivariant Filter (EqF) for RIO based on a Lie group symmetry that geometrically couples navigation states and IMU biases, extending it to incorporate radar-IMU extrinsic calibration and multi-state constraint updates. This equivariant formulation inherently preserves consistency and enhances robustness, enabling reliable state estimation even under poor or completely wrong initialization of calibration states. Real-world experiments on two different Uncrewed Aerial Vehicles (UAVs) show that the proposed EqF-RIO achieves state-of-the-art accuracy under correct extrinsic calibration and offers improved convergence under large calibration errors, where the conventional EKF-RIO fails. Evaluation code is open-sourced.

RO · Mar 23
Cross-Modal Reinforcement Learning for Navigation with Degraded Depth Measurements

Omkar Sawant, Luca Zanatta, Grzegorz Malczyk et al.

This paper presents a cross-modal learning framework that exploits complementary information from depth and grayscale images for robust navigation. We introduce a Cross-Modal Wasserstein Autoencoder that learns shared latent representations by enforcing cross-modal consistency, enabling the system to infer depth-relevant features from grayscale observations when depth measurements are corrupted. The learned representations are integrated with a Reinforcement Learning-based policy for collision-free navigation in unstructured environments when depth sensors experience degradation due to adverse conditions such as poor lighting or reflective surfaces. Simulation and real-world experiments demonstrate that our approach maintains robust performance under significant depth degradation and successfully transfers to real environments.

CV · Sep 23, 2024
Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer

Minh Bui, Kostas Alexis

Vision-based perception and reasoning are essential for scene understanding in any autonomous system. RGB and depth images are commonly used to capture both the semantic and geometric features of the environment. Developing methods to reliably interpret this data is critical for real-world applications, where noisy measurements are often unavoidable. In this work, we introduce a diffusion-based framework to address the RGB-D semantic segmentation problem. Additionally, we demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements. Our generative framework shows a greater capacity to model the underlying distribution of RGB-D images, achieving robust performance in challenging scenarios with significantly less training time compared to discriminative methods. Experimental results indicate that our approach achieves State-of-the-Art performance on both the NYUv2 and SUN-RGBD datasets overall, and especially on their most challenging images. Our project page will be available at https://diffusionmms.github.io/

RO · Dec 6, 2020 · Code
Appendix for the Motion Primitives-based Path Planning for Fast and Agile Exploration Method

Mihir Dharmadhikari, Tung Dang, Kostas Alexis

This manuscript presents enhancements on our motion-primitives exploration path planning method for agile exploration using aerial robots. The method now further integrates a global planning layer to facilitate reliable large-scale exploration. The implemented bifurcation between local and global planning allows for efficient exploration combined with the ability to plan within very large environments, while also ensuring safe and timely return-to-home. A new set of simulation studies and experimental results are presented to demonstrate the new improvements and enhancements. The method is available open source as a Robot Operating System (ROS) package.

RO · Nov 22, 2020 · Code
Model Predictive Control for Micro Aerial Vehicles: A Survey

Huan Nguyen, Mina Kamel, Kostas Alexis et al.

This paper presents a review of the design and application of model predictive control strategies for Micro Aerial Vehicles and specifically multirotor configurations such as quadrotors. The diverse set of works in the domain is organized based on the control law being optimized over linear or nonlinear dynamics, the integration of state and input constraints, possible fault-tolerant design, whether reinforcement learning methods have been utilized, and whether the controller refers to free-flight or other tasks such as physical interaction or load transportation. A selected set of comparison results are also presented and serve to provide insight for the selection between linear and nonlinear schemes, the tuning of the prediction horizon, the importance of disturbance observer-based offset-free tracking and the intrinsic robustness of such methods to parameter uncertainty. Furthermore, an overview of recent research trends on the combined application of modern deep reinforcement learning techniques and model predictive control for multirotor vehicles is presented. Finally, this review concludes with explicit discussion regarding selected open-source software packages that deliver off-the-shelf model predictive control functionality applicable to a wide variety of Micro Aerial Vehicle configurations.

RO · Dec 12, 2023
RMS: Redundancy-Minimizing Point Cloud Sampling for Real-Time Pose Estimation

Pavel Petracek, Kostas Alexis, Martin Saska

The typical point cloud sampling methods used in state estimation for mobile robots preserve a high level of point redundancy. This redundancy unnecessarily slows down the estimation pipeline and may cause drift under real-time constraints. Such undue latency becomes a bottleneck for resource-constrained robots (especially UAVs), requiring minimal delay for agile and accurate operation. We propose a novel, deterministic, uninformed, and single-parameter point cloud sampling method named RMS that minimizes redundancy within a 3D point cloud. In contrast to the state of the art, RMS balances the translation-space observability by leveraging the fact that linear and planar surfaces inherently exhibit high redundancy propagated into iterative estimation pipelines. We define the concept of gradient flow, quantifying the local surface underlying a point. We also show that maximizing the entropy of the gradient flow minimizes point redundancy for robot ego-motion estimation. We integrate RMS into the point-based KISS-ICP and feature-based LOAM odometry pipelines and evaluate experimentally on KITTI, Hilti-Oxford, and custom datasets from multirotor UAVs. The experiments demonstrate that RMS outperforms state-of-the-art methods in speed, compression, and accuracy in well-conditioned as well as in geometrically-degenerated settings.

RO · Mar 5, 2025
Olympus: A Jumping Quadruped for Planetary Exploration Utilizing Reinforcement Learning for In-Flight Attitude Control

Jørgen Anker Olsen, Grzegorz Malczyk, Kostas Alexis

Exploring planetary bodies with lower gravity, such as the Moon and Mars, allows legged robots to utilize jumping as an efficient form of locomotion, thus giving them a valuable advantage over traditional rovers for exploration. Motivated by this fact, this paper presents the design, simulation, and learning-based "in-flight" attitude control of Olympus, a jumping legged robot tailored to the gravity of Mars. First, the design requirements are outlined, followed by detailing how simulation enabled optimizing the robot's design - from its legs to the overall configuration - towards high vertical jumping, forward jumping distance, and in-flight attitude reorientation. Subsequently, the reinforcement learning policy used to track desired in-flight attitude maneuvers is presented. Successfully crossing the sim2real gap, extensive experimental studies of attitude reorientation tests are demonstrated.

RO · May 10, 2025
CompSLAM: Complementary Hierarchical Multi-Modal Localization and Mapping for Robot Autonomy in Underground Environments

Shehryar Khattak, Timon Homberger, Lukas Bernreiter et al.

Robot autonomy in unknown, GPS-denied, and complex underground environments requires real-time, robust, and accurate onboard pose estimation and mapping for reliable operations. This becomes particularly challenging in perception-degraded subterranean conditions under harsh environmental factors, including darkness, dust, and geometrically self-similar structures. This paper details CompSLAM, a highly resilient and hierarchical multi-modal localization and mapping framework designed to address these challenges. Its flexible architecture achieves resilience through redundancy by leveraging the complementary nature of pose estimates derived from diverse sensor modalities. Developed during the DARPA Subterranean Challenge, CompSLAM was successfully deployed on all aerial, legged, and wheeled robots of Team Cerberus during their competition-winning final run. Furthermore, it has proven to be a reliable odometry and mapping solution in various subsequent projects, with extensions enabling multi-robot map sharing for marsupial robotic deployments and collaborative mapping. This paper also introduces a comprehensive dataset acquired by a manually teleoperated quadrupedal robot, covering a significant portion of the DARPA Subterranean Challenge finals course. This dataset evaluates CompSLAM's robustness to sensor degradations as the robot traverses 740 meters in an environment characterized by highly variable geometries and demanding lighting conditions. The CompSLAM code and the DARPA SubT Finals dataset are made publicly available for the benefit of the robotics community.

CV · Jun 3, 2025
Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations

Fatma Youssef Mohammed, Kostas Alexis

Computational human attention modeling in free-viewing and task-specific settings is often studied separately, with limited exploration of whether a common representation exists between them. This work investigates this question and proposes a neural network architecture that builds upon the Human Attention transformer (HAT) to test the hypothesis. Our results demonstrate that free-viewing and visual search can efficiently share a common representation, allowing a model trained in free-viewing attention to transfer its knowledge to task-driven visual search with a performance drop of only 3.86% in the predicted fixation scanpaths, measured by the semantic sequence score (SemSS) metric which reflects the similarity between predicted and human scanpaths. This transfer reduces computational costs by 92.29% in terms of GFLOPs and 31.23% in terms of trainable parameters.

RO · Feb 22, 2022
RMF-Owl: A Collision-Tolerant Flying Robot for Autonomous Subterranean Exploration

Paolo De Petris, Huan Nguyen, Mihir Dharmadhikari et al.

This work presents the design, hardware realization, autonomous exploration and object detection capabilities of RMF-Owl, a new collision-tolerant aerial robot tailored for resilient autonomous subterranean exploration. The system is custom built for underground exploration with focus on collision tolerance, resilient autonomy with robust localization and mapping, alongside high-performance exploration path planning in confined, obstacle-filled and topologically complex underground environments. Moreover, RMF-Owl offers the ability to search, detect and locate objects of interest which can be particularly useful in search and rescue missions. A series of results from field experiments are presented in order to demonstrate the system's ability to autonomously explore challenging unknown underground environments.

RO · Jan 18, 2022
CERBERUS: Autonomous Legged and Aerial Robotic Exploration in the Tunnel and Urban Circuits of the DARPA Subterranean Challenge

Marco Tranzatto, Frank Mascarich, Lukas Bernreiter et al.

Autonomous exploration of subterranean environments constitutes a major frontier for robotic systems as underground settings present key challenges that can render robot autonomy hard to achieve. This has motivated the DARPA Subterranean Challenge, where teams of robots search for objects of interest in various underground environments. In response, the CERBERUS system-of-systems is presented as a unified strategy towards subterranean exploration using legged and flying robots. As primary robots, ANYmal quadruped systems are deployed considering their endurance and potential to traverse challenging terrain. For aerial robots, both conventional and collision-tolerant multirotors are utilized to explore spaces too narrow or otherwise unreachable by ground systems. Anticipating degraded sensing conditions, a complementary multi-modal sensor fusion approach utilizing camera, LiDAR, and inertial data for resilient robot pose estimation is proposed. Individual robot pose estimates are refined by a centralized multi-robot map optimization approach to improve the reported location accuracy of detected objects of interest in the DARPA-defined coordinate frame. Furthermore, a unified exploration path planning policy is presented to facilitate the autonomous operation of both legged and aerial robots in complex underground networks. Finally, to enable communication between the robots and the base station, CERBERUS utilizes a ground rover with a high-gain antenna and an optical fiber connection to the base station, alongside breadcrumbing of wireless nodes by our legged robots. We report results from the CERBERUS system-of-systems deployment at the DARPA Subterranean Challenge Tunnel and Urban Circuits, along with the current limitations and the lessons learned for the benefit of the community.

RO · Jan 10, 2022
Motion Primitives-based Navigation Planning using Deep Collision Prediction

Huan Nguyen, Sondre Holm Fyhn, Paolo De Petris et al.

This paper contributes a method to design a novel navigation planner exploiting a learning-based collision prediction network. The neural network is tasked to predict the collision cost of each action sequence in a predefined motion primitives library in the robot's velocity-steering angle space, given only the current depth image and the estimated linear and angular velocities of the robot. Furthermore, we account for the uncertainty of the robot's partial state by utilizing the Unscented Transform and the uncertainty of the neural network model by using Monte Carlo dropout. The uncertainty-aware collision cost is then combined with the goal direction given by a global planner in order to determine the best action sequence to execute in a receding horizon manner. To demonstrate the method, we develop a resilient small flying robot integrating lightweight sensing and computing resources. A set of simulation and experimental studies, including a field deployment, in both cluttered and perceptually-challenging environments is conducted to evaluate the quality of the prediction network and the performance of the proposed planner.
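The Unscented Transform mentioned in the abstract above propagates a mean and covariance through a function using a small set of deterministically chosen sigma points. The sketch below is a textbook form of the transform, not the paper's implementation; the function name, the `kappa` scaling parameter, and the example values are assumptions for illustration. A linear map is used in the example because the transform reproduces its mean and covariance exactly, which makes the result easy to check.

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate (mean, cov) through f using 2n+1 sigma points."""
    mean = np.asarray(mean, dtype=float)
    n = mean.size
    # Columns of L are the sigma-point offsets: L @ L.T = (n + kappa) * cov.
    L = np.linalg.cholesky((n + kappa) * np.asarray(cov, dtype=float))
    sigma = np.vstack([mean, mean + L.T, mean - L.T])  # one sigma point per row
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    ys = np.array([f(s) for s in sigma])
    y_mean = weights @ ys
    diff = ys - y_mean
    y_cov = (weights[:, None] * diff).T @ diff
    return y_mean, y_cov

# Hypothetical example: uncertain 2D state pushed through a linear map.
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([0.5, -1.0])
m = np.array([1.0, 2.0])
P = np.array([[0.04, 0.01], [0.01, 0.09]])
y_mean, y_cov = unscented_transform(m, P, lambda x: A @ x + b)
```

For a nonlinear function, such as a network that maps a velocity estimate to a collision cost, the same routine would give an approximate output mean and covariance instead of exact ones.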

RO · Nov 11, 2021
Autonomous Teamed Exploration of Subterranean Environments using Legged and Aerial Robots

Mihir Kulkarni, Mihir Dharmadhikari, Marco Tranzatto et al.

This paper presents a novel strategy for autonomous teamed exploration of subterranean environments using legged and aerial robots. Tailored to the fact that subterranean settings, such as cave networks and underground mines, often involve complex, large-scale and multi-branched topologies, while wireless communication within them can be particularly challenging, this work is structured around the synergy of an onboard exploration path planner that allows for resilient long-term autonomy, and a multi-robot coordination framework. The onboard path planner is unified across legged and flying robots and enables navigation in environments with steep slopes, and diverse geometries. When a communication link is available, each robot of the team shares submaps to a centralized location where a multi-robot coordination framework identifies global frontiers of the exploration space to inform each system about where it should re-position to best continue its mission. The strategy is verified through a field deployment inside an underground mine in Switzerland using a legged and a flying robot collectively exploring for 45 min, as well as a longer simulation study with three systems.
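The notion of a frontier in the abstract above, the boundary between explored free space and unknown space, can be sketched on a toy 2D occupancy grid. This is a simplified illustration only: the paper operates on shared submaps of real 3D environments, and the cell encoding, function name, and 4-neighbor rule below are assumptions.

```python
import numpy as np

FREE, OCCUPIED, UNKNOWN = 0, 1, -1  # hypothetical cell encoding

def frontier_cells(grid):
    """Return (row, col) indices of free cells with a 4-connected unknown neighbor."""
    g = np.asarray(grid)
    # Pad with False so cells on the border have no out-of-map unknown neighbors.
    unknown = np.pad(g == UNKNOWN, 1, constant_values=False)
    near_unknown = (unknown[:-2, 1:-1] | unknown[2:, 1:-1] |
                    unknown[1:-1, :-2] | unknown[1:-1, 2:])
    return np.argwhere((g == FREE) & near_unknown)

grid = np.array([
    [0, 0, -1],
    [0, 1, -1],
    [0, 0,  0],
])
frontiers = frontier_cells(grid)  # cells (0, 1) and (2, 2)
```

A coordination layer could then cluster such cells and assign each robot a cluster to re-position toward; that step is omitted here.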

RO · Jun 1, 2021
Resource-aware Online Parameter Adaptation for Computationally-constrained Visual-Inertial Navigation Systems

Pranay Mathur, Nikhil Khedekar, Kostas Alexis

In this paper, a computational resources-aware parameter adaptation method for visual-inertial navigation systems is proposed with the goal of enabling the improved deployment of such algorithms on computationally constrained systems. Such a capacity can prove critical when employed on ultra-lightweight systems or alongside mission critical computationally expensive processes. To achieve this objective, the algorithm proposes selected changes in the vision front-end and optimization back-end of visual-inertial odometry algorithms, both prior to execution and in real-time based on an online profiling of available resources. The method also utilizes information from the motion dynamics experienced by the system to manipulate parameters online. The general policy is demonstrated on three established algorithms, namely S-MSCKF, VINS-Mono and OKVIS and has been verified experimentally on the EuRoC dataset. The proposed approach achieved comparable performance at a fraction of the original computational cost.

CVDec 29, 2020
Visual-Thermal Camera Dataset Release and Multi-Modal Alignment without Calibration Information

Frank Mascarich, Kostas Alexis

This report accompanies a dataset release on visual and thermal camera data and details a procedure followed to align such multi-modal camera frames in order to provide pixel-level correspondence between the two without using intrinsic or extrinsic calibration information. To achieve this goal we benefit from progress in the domain of multi-modal image alignment and specifically employ the Mattes Mutual Information Metric to guide the registration process. The released dataset includes both the raw visual and thermal camera data and the aligned frames, alongside calibration parameters, with the goal of better facilitating the investigation of common local/global features across such multi-modal image streams.
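
To give a feel for why mutual information suits visual-thermal alignment, here is a minimal sketch that computes plain histogram-based mutual information between two flattened images. It is a simplification and not the Mattes formulation itself (which estimates the densities with Parzen windows over sampled pixels); the function name, the bin count and the 8-bit intensity range are assumptions for this example.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b, bins=8, max_val=255):
    """Histogram-based mutual information (in nats) of two flat intensity lists."""
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")

    def quantize(v):
        # Map an intensity in [0, max_val] to a bin index in [0, bins - 1].
        return min(bins - 1, int(v) * bins // (max_val + 1))

    pairs = [(quantize(a), quantize(b)) for a, b in zip(img_a, img_b)]
    n = len(pairs)
    joint = Counter(pairs)
    marg_a = Counter(a for a, _ in pairs)
    marg_b = Counter(b for _, b in pairs)

    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), written with raw counts.
        mi += (count / n) * math.log(count * n / (marg_a[a] * marg_b[b]))
    return mi


if __name__ == "__main__":
    visual = [0, 64, 128, 192] * 4
    inverted = [255 - v for v in visual]   # intensities inversely related
    flat = [100] * len(visual)             # carries no information
    print(mutual_information(visual, inverted), mutual_information(visual, flat))
```

The score stays high even when intensities are inversely related, which is common between visible and thermal imagery. A registration routine would search over candidate transforms for the one that maximizes this score.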

CVJun 15, 2020
Anomalous Motion Detection on Highway Using Deep Learning

Harpreet Singh, Emily M. Hand, Kostas Alexis

Research in visual anomaly detection draws much interest due to its applications in surveillance. Common datasets for evaluation are constructed using a stationary camera overlooking a region of interest. Previous research has shown promising results in detecting spatial as well as temporal anomalies in these settings. The advent of self-driving cars provides an opportunity to apply visual anomaly detection in a more dynamic application yet no dataset exists in this type of environment. This paper presents a new anomaly detection dataset - the Highway Traffic Anomaly (HTA) dataset - for the problem of detecting anomalous traffic patterns from dash cam videos of vehicles on highways. We evaluate state-of-the-art deep learning anomaly detection models and propose novel variations to these methods. Our results show that state-of-the-art models built for settings with a stationary camera do not translate well to a more dynamic environment. The proposed variations to these SoTA methods show promising results on the new HTA dataset.

ROMay 9, 2020
Autonomous Aerial Robotic Surveying and Mapping with Application to Construction Operations

Huan Nguyen, Frank Mascarich, Tung Dang et al.

In this paper we present an overview of the methods and systems that give rise to a flying robotic system capable of autonomous inspection, surveying, comprehensive multi-modal mapping and inventory tracking of construction sites with a high degree of systematicity. The robotic system can operate assuming either no prior knowledge of the environment or by integrating a prior model of it. In the first case, autonomous exploration is provided which returns a high fidelity 3D map associated with color and thermal vision information. In the second case, the prior model of the structure can be used to provide optimized and repetitive coverage paths. The robot delivers its mapping result autonomously, while simultaneously being able to detect and localize objects of interest thus supporting inventory tracking tasks. The system has been field verified in a collection of environments and has been tested inside a construction project related to public housing.

ROApr 6, 2020
Towards a Science of Resilient Robotic Autonomy

Kostas Alexis

This discussion paper aims to support the argument process for the need to develop a comprehensive science of resilient robotic autonomy. Resilience and its key characteristics relating to robustness, redundancy, and resourcefulness are discussed, followed by a selected - but not exhaustive - list of research themes and domains that are crucial to facilitate resilient autonomy. Last but not least, an outline of possible directions of a new and enhanced design paradigm in robotics is presented. This manuscript is intentionally short and abstract. It serves to open the discussion and raise questions. The answers will necessarily be found in the actual process of conducting research by the community and in the framework of introducing robotics in an ever increasing set of real-life use cases. Its current form is based on thoughts identified within the ongoing experience of conducting research for robotic systems to gain autonomy in certain types of extreme environments such as subterranean settings, nuclear facilities, agriculture areas, and long-term off-road deployments. The very context of this document will be subject to change and it will be iteratively revisited.

RONov 24, 2019
The Reconfigurable Aerial Robotic Chain: Shape and Motion Planning

Mihir Kulkarni, Huan Nguyen, Kostas Alexis

This paper presents the design concept, modeling and motion planning solution for the aerial robotic chain. This design represents a configurable robotic system of systems, consisting of multi-linked micro aerial vehicles that simultaneously presents the ability to cross narrow sections, morph its shape, ferry significant payloads, offer the potential of distributed sensing and processing, and allow system extendability. We contribute an approach to address the motion planning problem of such a connected robotic system of systems, making full use of its reconfigurable nature, to find collision free paths in a fast manner despite the increased number of degrees of freedom. The presented approach exploits a library of aerial robotic chain configurations, optimized either for cross-section size or sensor coverage, alongside a probabilistic strategy to sample random shape configurations that may be needed to facilitate continued collision-free navigation. Evaluation studies in simulation involve traversal of constrained and obstacle-laden environments, having narrow corridors and cross sections.

CVApr 16, 2019
Are State-of-the-art Visual Place Recognition Techniques any Good for Aerial Robotics?

Mubariz Zaffar, Ahmad Khaliq, Shoaib Ehsan et al.

Visual Place Recognition (VPR) has seen significant advances at the frontiers of matching performance and computational superiority over the past few years. However, these evaluations are performed for ground-based mobile platforms and cannot be generalized to aerial platforms. The degree of viewpoint variation experienced by aerial robots is complex, with their processing power and on-board memory limited by payload size and battery ratings. Therefore, in this paper, we collect 8 state-of-the-art VPR techniques that have been previously evaluated for ground-based platforms and compare them on 2 recently proposed aerial place recognition datasets with three prime focuses: a) matching performance, b) processing power consumption, and c) projected memory requirements. This gives a bird's-eye view of the applicability of contemporary VPR research to aerial robotics and lays down the nature of challenges for aerial-VPR.

ROMar 5, 2019
Vision-Depth Landmarks and Inertial Fusion for Navigation in Degraded Visual Environments

Shehryar Khattak, Christos Papachristos, Kostas Alexis

This paper proposes a method for tight fusion of visual, depth and inertial data in order to extend robotic capabilities for navigation in GPS-denied, poorly illuminated, and texture-less environments. Visual and depth information are fused at the feature detection and descriptor extraction levels to augment one sensing modality with the other. These multimodal features are then further integrated with inertial sensor cues using an extended Kalman filter to estimate the robot pose, sensor bias terms, and landmark positions simultaneously as part of the filter state. As demonstrated through a set of hand-held and Micro Aerial Vehicle experiments, the proposed algorithm is shown to perform reliably in challenging visually-degraded environments using RGB-D information from a lightweight and low-cost sensor and data from an IMU.

ROMar 5, 2019
Visual-Thermal Landmarks and Inertial Fusion for Navigation in Degraded Visual Environments

Shehryar Khattak, Christos Papachristos, Kostas Alexis

With an ever-widening domain of aerial robotic applications, including many mission critical tasks such as disaster response operations, search and rescue missions and infrastructure inspections taking place in GPS-denied environments, the need for reliable autonomous operation of aerial robots has become crucial. Operating in GPS-denied areas, aerial robots rely on a multitude of sensors to localize and navigate. Visible spectrum cameras are the most commonly used sensors due to their low cost and weight. However, in environments that are visually-degraded such as in conditions of poor illumination, low texture, or presence of obscurants including fog, smoke and dust, the reliability of visible light cameras deteriorates significantly. Nevertheless, maintaining reliable robot navigation in such conditions is essential. In contrast to visible light cameras, thermal cameras offer visibility in the infrared spectrum and can be used in a complementary manner with visible spectrum cameras for robot localization and navigation tasks, without paying the significant weight and power penalty typically associated with carrying other sensors. Exploiting this fact, in this work we present a multi-sensor fusion algorithm for reliable odometry estimation in GPS-denied and degraded visual environments. The proposed method utilizes information from both the visible and thermal spectra for landmark selection and prioritizes feature extraction from informative image regions based on a metric over spatial entropy. Furthermore, inertial sensing cues are integrated to improve the robustness of the odometry estimation process. To verify our solution, a set of challenging experiments were conducted inside a) an obscurant-filled machine shop-like industrial environment, as well as b) a dark subterranean mine in the presence of heavy airborne dust.

ROMar 3, 2019
Keyframe-based Direct Thermal-Inertial Odometry

Shehryar Khattak, Christos Papachristos, Kostas Alexis

This paper proposes an approach for fusing direct radiometric data from a thermal camera with inertial measurements to extend the robotic capabilities of aerial robots for navigation in GPS-denied and visually degraded environments in the conditions of darkness and in the presence of airborne obscurants such as dust, fog and smoke. An optimization based approach is developed that jointly minimizes the re-projection error of 3D landmarks and inertial measurement errors. The developed solution is extensively verified against both ground-truth in an indoor laboratory setting, as well as inside an underground mine under severely visually degraded conditions.

ROMar 2, 2019
Marker based Thermal-Inertial Localization for Aerial Robots in Obscurant Filled Environments

Shehryar Khattak, Christos Papachristos, Kostas Alexis

For robotic inspection tasks in known environments, fiducial markers provide a reliable and low-cost solution for robot localization. However, detection of such markers relies on the quality of RGB camera data, which degrades significantly in the presence of visual obscurants such as fog and smoke. The ability to navigate known environments in the presence of obscurants can be critical for inspection tasks, especially in the aftermath of a disaster. Addressing such a scenario, this work proposes a method for the design of fiducial markers to be used with thermal cameras for the pose estimation of aerial robots. Our low cost markers are designed to work in the long wave infrared spectrum, which is not affected by the presence of obscurants, and can be affixed to any object that has measurable temperature difference with respect to its surroundings. Furthermore, the estimated pose from the fiducial markers is fused with inertial measurements in an extended Kalman filter to remove high frequency noise and error present in the fiducial pose estimates. The proposed markers and the pose estimation method are experimentally evaluated in an obscurant filled environment using an aerial robot carrying a thermal camera.

RODec 12, 2018
Lévy Flight Foraging Hypothesis-based Autonomous Memoryless Search Under Sparse Rewards

Christos Papachristos, Kostas Alexis

Autonomous robots are commonly tasked with the problem of area exploration and search for certain targets or artifacts of interest to be tracked. Traditionally, the problem formulation considered is that of complete search and thus - ideally - identification of all targets of interest. An important problem, however, which is not often addressed is that of time-efficient memoryless search under sparse rewards that may be worth visiting any number of times. In this paper we specifically address the largely understudied problem of optimizing the "time-of-arrival" or "time-of-detection" to robotically search for sparsely distributed rewards (detect targets of interest) within large-scale environments and subject to memoryless exploration. At the core of the proposed solution is the fact that a search-based Lévy walk consisting of a constant velocity search following a Lévy flight path is optimal for searching sparse and randomly distributed target regions in the lack of map memory. A set of results accompanies the presentation of the method, demonstrates its properties and justifies the purpose of its use towards large-scale area exploration autonomy.
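
A minimal sketch of how a Lévy flight path can be generated: headings are drawn uniformly and step lengths from a power law p(l) ∝ l^(-mu) for l ≥ l_min, sampled by inverse transform. The function name, the default exponent mu = 2 and the minimum step l_min are placeholder assumptions, not values taken from the paper.

```python
import math
import random


def levy_walk(n_steps, mu=2.0, l_min=1.0, seed=0):
    """Return n_steps + 1 waypoints (x, y) of a 2D Lévy flight path.

    Step lengths follow p(l) ~ l**(-mu) for l >= l_min (requires mu > 1).
    """
    if mu <= 1.0:
        raise ValueError("mu must be greater than 1 for a normalizable power law")
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        u = 1.0 - rng.random()                      # uniform in (0, 1]
        length = l_min * u ** (-1.0 / (mu - 1.0))   # inverse-transform sample
        heading = rng.uniform(0.0, 2.0 * math.pi)   # no memory of past headings
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        path.append((x, y))
    return path


if __name__ == "__main__":
    for waypoint in levy_walk(5):
        print(waypoint)
```

A robot following these waypoints at constant speed would perform many short local moves punctuated by occasional long relocations, without keeping any map of visited areas.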

RONov 16, 2018
Optical Flow Based Background Subtraction with a Moving Camera: Application to Autonomous Driving

Sotirios Diamantas, Kostas Alexis

In this research we present a novel algorithm for background subtraction using a moving camera. Our algorithm is based purely on visual information obtained from a camera mounted on an electric bus operating in downtown Reno, and it automatically detects moving objects of interest with a view to providing a fully autonomous vehicle. In our approach we exploit the optical flow vectors generated by the motion of the camera while keeping parameter assumptions to a minimum. First, we estimate the Focus of Expansion, which is used to model and simulate 3D points given the intrinsic parameters of the camera. We then perform multiple linear regression to estimate the regression equation parameters and apply them to the real data of every frame to identify moving objects. We validated our algorithm using data taken from a common bus route.
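
The abstract does not say how the Focus of Expansion (FoE) is estimated. One common approach, sketched here under that caveat, is a least-squares fit: for a purely translating camera and a static scene, every flow vector (u, v) at pixel (x, y) lies on a line through the FoE (x0, y0), so v·x0 − u·y0 = v·x − u·y. The function name `estimate_foe` and the synthetic data are assumptions for this example.

```python
def estimate_foe(points, flows):
    """Least-squares Focus of Expansion from pixel positions and flow vectors.

    Each pair contributes one equation  v * x0 - u * y0 = v * x - u * y.
    The 2x2 normal equations are solved in closed form.
    """
    s_vv = s_uv = s_uu = b0 = b1 = 0.0
    for (x, y), (u, v) in zip(points, flows):
        rhs = v * x - u * y
        s_vv += v * v
        s_uv += u * v
        s_uu += u * u
        b0 += v * rhs
        b1 += -u * rhs
    # Normal matrix is [[s_vv, -s_uv], [-s_uv, s_uu]].
    det = s_vv * s_uu - s_uv * s_uv
    if abs(det) < 1e-12:
        raise ValueError("flow vectors are degenerate (all parallel or zero)")
    x0 = (s_uu * b0 + s_uv * b1) / det
    y0 = (s_uv * b0 + s_vv * b1) / det
    return x0, y0


if __name__ == "__main__":
    true_foe = (320.0, 240.0)
    pts = [(100.0, 100.0), (500.0, 120.0), (300.0, 400.0), (50.0, 300.0)]
    # Synthetic radial flow expanding away from the true FoE.
    flo = [(0.1 * (x - true_foe[0]), 0.1 * (y - true_foe[1])) for x, y in pts]
    print(estimate_foe(pts, flo))
```

Flow vectors that deviate strongly from the radial pattern implied by the fitted FoE are candidates for independently moving objects.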

ROApr 11, 2018
Design and Control of an Aerial Manipulator for Contact-based Inspection

Varun Nayak, Christos Papachristos, Kostas Alexis

Manipulator dynamics, external forces and moments raise issues in stability and efficient control during aerial manipulation. Additionally, multirotor Micro Aerial Vehicles impose stringent limits on payload, actuation and system states. In view of these challenges, this work addressed the design and control of a 3-DoF serial aerial manipulator for contact inspection. A lightweight design with sufficient dexterous workspace for NDT (Non-Destructive Testing) inspection is presented. This operation requires the regulation of normal force on the inspected point. Contact dynamics have been discussed along with a simulation of the closed-loop dynamics during contact. The simulated controller preserves inherent system nonlinearities and uses a passivity approach to ensure the convergence of error to zero. A transition scheme from free-flight to contact was developed along with the hardware and software frameworks for implementation. This paper concludes with important drawbacks and prospects.

ROJan 24, 2018
Visual-Inertial Odometry-enhanced Geometrically Stable ICP for Mapping Applications using Aerial Robots

Tung Dang, Shehryar Khattak, Christos Papachristos et al.

This paper presents a visual-inertial odometry-enhanced geometrically stable Iterative Closest Point (ICP) algorithm for accurate mapping using aerial robots. The proposed method employs a visual-inertial odometry framework in order to provide robust priors to the ICP step and calculate the overlap among point clouds derived from an onboard time-of-flight depth sensor. Within the overlapping parts of the point clouds, the method samples points such that the distribution of normals among them is as large as possible. As different geometries and sensor trajectories will influence the performance of the alignment process, evaluation of the expected geometric stability of the ICP step is conducted. It is only when this test is successful that the matching, outlier rejection, and minimization of the error metric ICP steps are conducted and the new relative translation and rotational components are estimated, otherwise the system relies on the visual-inertial odometry transformation estimates. The proposed strategy was evaluated within handheld, automated and fully autonomous exploration and mapping missions using a small aerial robot and was shown to provide robust results of superior quality at an affordable increase of the computational load.

RONov 28, 2017
Aerial Drop of Robots and Sensors for Optimal Area Coverage

Kostas Alexis

The problem of rapid optimal coverage through the distribution of a team of robots or static sensors via means of aerial drop is the topic of this work. Considering a nonholonomic (fixed-wing) aerial robot that corresponds to the carrier of a set of small holonomic (rotorcraft) aerial robots as well as static modules that are all equipped with a camera sensor, we address the problem of selecting optimal aerial drop times and configurations while the motion capabilities of the small aerial robots are also exploited to further survey their area of responsibility until they hit the ground. The overall solution framework consists of lightweight path-planning algorithms that can run on virtually any processing unit that might be available on-board. Evaluation studies in simulation as well as a set of elementary experiments that prove the validity of important assumptions illustrate the potential of the approach.

ROMay 18, 2017
Towards Robotically Supported Decommissioning of Nuclear Sites

Frank Mascarich, Taylor Wilson, Tung Dang et al.

This paper overviews certain radiation detection, perception, and planning challenges for nuclearized robotics that aim to support the waste management and decommissioning mission. To enable the autonomous monitoring, inspection and multi-modal characterization of nuclear sites, we discuss important problems relevant to the tasks of navigation in degraded visual environments, localizability-aware exploration and mapping without any prior knowledge of the environment, as well as robotic radiation detection. Future contributions will focus on each of the relevant problems, will aim to deliver a comprehensive multi-modal mapping result, and will emphasize extensive field evaluation and system verification.

ROMar 8, 2017
Realizing the Aerial Robotic Worker for Inspection Operations

Kostas Alexis

This report overviews a set of recent contributions in the field of path planning that were developed to enable the realization of the autonomous aerial robotic worker for inspection operations. The specific algorithmic contributions address several fundamental challenges of robotic inspection and exploration, and specifically those of optimal coverage planning given an a priori known model of the structure to be inspected, full coverage, optimized and fast inspection path planning, as well as efficient exploration of completely unknown environments and structures. All of the developed path planners support both holonomic and nonholonomic systems, and respect the on-board sensor model and constraints. An overview of the achieved results is presented, followed by an integrating architecture that enables fully autonomous and highly-efficient infrastructure inspection in both known and unknown environments.

RODec 30, 2016
Technical Report: Optimal Surveillance of Dynamic Parades using Teams of Aerial Robots

Kostas Alexis

This technical report addresses the problem of optimal surveillance of the route followed by a dynamic parade using a team of aerial robots. The dynamic parade is considered to take place within an urban environment. It is discretized and, at every iteration, the algorithm computes the best possible placement of the aerial robotic team members, subject to their camera model and the occlusions arising from the environment. As the parade route is only as well covered as its least-covered point, the optimization objective is to place the aerial robots such that they maximize the minimum coverage over the points in the route at every time instant. A set of simulation studies is used to demonstrate the operation and performance characteristics of the approach, while computational analysis is also provided and verifies the good scalability properties of the contributed algorithm regarding the size of the aerial robotics team.
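
The max-min objective can be illustrated with a toy brute-force search: among a finite set of candidate robot positions, pick the team placement whose worst-covered route point is covered best. The coverage model (linear fall-off with distance, no occlusions), the candidate set and the function names are assumptions for this sketch; the report's camera and occlusion models are not reproduced here.

```python
import itertools
import math


def coverage(robot, point, sensing_range=5.0):
    """Toy coverage score in [0, 1]: falls off linearly with distance."""
    return max(0.0, 1.0 - math.dist(robot, point) / sensing_range)


def best_placement(candidates, route_points, team_size):
    """Exhaustively pick team_size positions maximizing the minimum coverage."""
    best, best_score = None, -1.0
    for placement in itertools.combinations(candidates, team_size):
        # A route point is covered as well as its best-placed robot covers it;
        # the route is only as well covered as its worst point.
        score = min(
            max(coverage(robot, p) for robot in placement) for p in route_points
        )
        if score > best_score:
            best, best_score = placement, score
    return best, best_score


if __name__ == "__main__":
    route = [(0.0, 0.0), (4.0, 0.0), (8.0, 0.0)]
    spots = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0), (8.0, 0.0)]
    print(best_placement(spots, route, team_size=2))
```

Exhaustive enumeration grows combinatorially with team size, so it only serves to make the objective concrete; it says nothing about how the report achieves its scalability.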

RODec 25, 2016
Distributed Infrastructure Inspection Path Planning subject to Time Constraints

Kostas Alexis, Christos Papachristos, Roland Siegwart et al.

Within this paper, the problem of 3D structural inspection path planning for distributed infrastructure using aerial robots that are subject to time constraints is addressed. The proposed algorithm handles varying spatial properties of the infrastructure facilities, accounts for their different importance and exploration function and computes an overall inspection path of high inspection reward while respecting the robot endurance or mission time constraints as well as the vehicle dynamics and sensor limitations. To achieve its goal, it employs an iterative, 3-step optimization strategy: at each iteration it first randomly samples a set of possible structures to visit, subsequently solves the derived traveling salesman problem and computes the travel costs, and finally samples and assigns inspection times to each structure and evaluates the total inspection reward. For the derivation of the inspection paths for each independent facility, it interfaces a path planner dedicated to the 3D coverage of single structures. The resulting algorithm properties, computational performance and path quality are evaluated using simulation studies as well as experimental test-cases employing a multirotor micro aerial vehicle.
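
The three steps per iteration can be sketched as a small random-search loop. Everything specific here is an assumption made for illustration: structures are 2D points, travel time equals Euclidean distance (unit speed), the tour is solved by brute force, and the reward for inspecting a structure for time t is importance · (1 − e^(−t)). The paper's reward model, vehicle dynamics and per-structure coverage planner are not reproduced.

```python
import itertools
import math
import random


def tour_length(depot, order):
    """Length of the closed tour depot -> order[0] -> ... -> depot."""
    stops = [depot, *order, depot]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))


def shortest_tour(depot, sites):
    """Brute-force TSP; only sensible for a handful of sites."""
    return min(itertools.permutations(sites), key=lambda o: tour_length(depot, o))


def plan_inspection(depot, sites, importance, budget, iterations=200, seed=0):
    """Random-search planner returning (reward, visiting order, inspection times)."""
    rng = random.Random(seed)
    best = (0.0, (), {})
    for _ in range(iterations):
        # Step 1: randomly sample a subset of structures to visit.
        subset = rng.sample(sites, rng.randint(1, len(sites)))
        # Step 2: solve the resulting TSP and compute the travel cost.
        order = shortest_tour(depot, subset)
        spare = budget - tour_length(depot, order)
        if spare <= 0:
            continue  # this subset cannot be visited within the time budget
        # Step 3: randomly split the spare time into inspection times and score.
        weights = [rng.random() + 1e-9 for _ in order]
        total = sum(weights)
        times = {s: spare * w / total for s, w in zip(order, weights)}
        reward = sum(importance[s] * (1.0 - math.exp(-times[s])) for s in order)
        if reward > best[0]:
            best = (reward, order, times)
    return best


if __name__ == "__main__":
    depot = (0.0, 0.0)
    sites = [(1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    importance = {(1.0, 0.0): 1.0, (0.0, 1.0): 2.0, (5.0, 5.0): 10.0}
    print(plan_inspection(depot, sites, importance, budget=6.0))
```

In this toy instance the most important structure is too far away to reach within the budget, so the planner settles for the nearby ones, which is the kind of trade-off a time-constrained formulation has to make.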