Tomi Westerlund

h-index 37

36 papers

1,717 citations

Novelty 37%

AI Score 55

Ranked #26,426 of 201,018 authors (top 13%); #598 in RO (top 8%)

36 Papers

RO · May 27 · Code

SAFEVPR: Patch-Based Conformal Verification for Safe Cross-Condition Sequence Visual Place Recognition

Ha Sier, Jiaqiang Zhang, Zhuo Zou et al.

Sequence-based visual place recognition (VPR) for SLAM and robot relocalization must decide whether the retrieved top-1 candidate is safe to accept. Conformal prediction is a natural framework for this accept/reject decision, but its finite-sample guarantees rely on exchangeability between calibration and deployment (test) data, which is violated under cross-condition deployment. We introduce SAFEVPR, a non-trainable verification-and-calibration pipeline for safe cross-condition sequence VPR. SAFEVPR replaces the standard backbone cosine similarity with a mutual-nearest-neighbour (MNN) patch-matching score computed from frozen DINOv2 ViT features, and replaces flat Learn-Then-Test (LTT) calibration with Mondrian conformal LTT, fitting separate Bonferroni-corrected thresholds across score bins. Under exchangeability, these thresholds would provide finite-sample false-discovery-rate (FDR) control; under condition shift, we evaluate empirical validity per deployment. Across 23 cross-condition setups from the Oxford RobotCar, NCLT, and St Lucia datasets, using three frozen VPR backbones, SAFEVPR is empirically valid on 23/23 setups at target FDR alpha = 0.10, achieving mean accepted FDR 0.014 and mean true-positive rate (TPR) 0.75. The results show that raw discrimination alone is not sufficient for conformal validity: AnyLoc-VLAD and SuperPoint+LightGlue reach comparable area under the receiver operating characteristic curve (AUROC) but fail more setups under the same calibration. On textureless repetitive scenery, SAFEVPR safely abstains rather than accepting unreliable matches. Code is available at https://github.com/Hasar12139/SafeVPR.
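To make the idea of a mutual-nearest-neighbour patch score concrete, the following is a minimal sketch, not the authors' implementation. It assumes patch descriptors are plain lists of floats and defines the score as the fraction of query patches whose best-matching reference patch also selects them back; the function names `cosine` and `mnn_score` and this normalisation are illustrative assumptions, since the abstract does not specify the exact formula.

```python
import math


def cosine(u, v):
    """Cosine similarity between two descriptor vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def mnn_score(query_patches, ref_patches):
    """Fraction of query patches that are mutual nearest neighbours
    with some reference patch (hypothetical scoring rule)."""
    if not query_patches or not ref_patches:
        return 0.0
    # Pairwise similarity matrix: rows are query patches, columns are reference patches.
    sim = [[cosine(q, r) for r in ref_patches] for q in query_patches]
    n_q, n_r = len(query_patches), len(ref_patches)
    # Nearest reference patch for each query patch.
    q_to_r = [max(range(n_r), key=lambda j: row[j]) for row in sim]
    # Nearest query patch for each reference patch.
    r_to_q = [max(range(n_q), key=lambda i: sim[i][j]) for j in range(n_r)]
    # A match is mutual when both directions agree.
    mutual = sum(1 for i, j in enumerate(q_to_r) if r_to_q[j] == i)
    return mutual / n_q


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    reference = [[0.9, 0.1], [0.1, 0.9]]
    print(mnn_score(query, reference))
```

A score of this kind would then be compared against a calibrated threshold to accept or reject the top-1 candidate; the calibration step itself is not sketched here.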

RO · May 27

Degradation-Aware Cooperative Multi-Modal GNSS-Denied Localization Leveraging LiDAR-Based Robot Detections

Václav Pritzl, Xianjia Yu, Tomi Westerlund et al.

Accurate long-term localization using onboard sensors is crucial for robots operating in Global Navigation Satellite System (GNSS)-denied environments. While complementary sensors mitigate individual degradations, carrying all the available sensor types on a single robot significantly increases the size, weight, and power demands. Distributing sensors across multiple robots enhances the deployability but introduces challenges in fusing asynchronous, multi-modal data from independently moving platforms. We propose a novel adaptive multi-modal multi-robot cooperative localization approach using a factor-graph formulation to fuse asynchronous Visual-Inertial Odometry (VIO), LiDAR-Inertial Odometry (LIO), and 3D inter-robot detections from distinct robots in a loosely-coupled fashion. The approach adapts to changing conditions, leveraging reliable data to assist robots affected by sensory degradations. A novel interpolation-based factor enables fusion of the unsynchronized measurements. LIO degradations are evaluated based on the approximate scan-matching Hessian. A novel approach of weighting odometry data proportionally to the Wasserstein distance between the consecutive VIO outputs is proposed. A theoretical analysis is provided, investigating the cooperative localization problem under various conditions, mainly in the presence of sensory degradations. The proposed method has been extensively evaluated on real-world data gathered with heterogeneous teams of an Unmanned Ground Vehicle (UGV) and Unmanned Aerial Vehicles (UAVs), showing that the approach provides significant improvements in localization accuracy in the presence of various sensory degradations.
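As an illustration of the Wasserstein distance mentioned above, here is a minimal sketch under two loudly stated assumptions: each VIO output is summarised as a Gaussian, and its covariance is diagonal. Under those assumptions the 2-Wasserstein distance has a simple closed form. The function name `w2_diag_gaussians` is hypothetical, and how the paper maps this distance to an odometry weight is not given in the abstract, so that step is omitted.

```python
import math


def w2_diag_gaussians(mean_a, var_a, mean_b, var_b):
    """2-Wasserstein distance between two Gaussians with diagonal covariances.

    For diagonal (commuting) covariances the squared distance is
    ||mean_a - mean_b||^2 + sum_i (sqrt(var_a[i]) - sqrt(var_b[i]))^2.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_a, var_b))
    return math.sqrt(mean_term + cov_term)


if __name__ == "__main__":
    # Two consecutive (hypothetical) position estimates with per-axis variances.
    previous = ([0.0, 0.0, 0.0], [0.01, 0.01, 0.02])
    current = ([0.1, 0.0, 0.0], [0.04, 0.01, 0.02])
    print(w2_diag_gaussians(*previous, *current))
```

A large distance between consecutive outputs indicates either a jump in the estimate or a change in its reported uncertainty, both of which are plausible signs of degradation.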

RO · Jun 1

A Simulation Platform for Flapping-Wing Vehicles

Haichuan Li, Tomi Westerlund

Flapping-wing aerial vehicles (FWAVs) demonstrate remarkable agility but face substantial autonomy challenges due to their high sensitivity to aerodynamic disturbances and limited sensor payload capacity. Current simulation platforms typically rely on oversimplified laminar flow assumptions and idealized sensor models, failing to capture the complex turbulence patterns and perceptual limitations encountered in real-world operation. This simulation-to-reality discrepancy significantly impedes the development of robust autonomy systems for FWAVs. We introduce FWAV-Sim, a high-fidelity Unity-based simulation framework that integrates: (1) a composite aerodynamic model combining quasi-steady blade-element theory with bluff-body drag effects, (2) spatiotemporally correlated turbulence generation through fractal noise synthesis, and (3) realistic sensor simulation including noisy IMU measurements, LiDAR point clouds, and RGB camera feeds. Our platform enables scalable generation of synchronized datasets containing ground-truth vehicle states, aerodynamic forces, turbulent wind fields, and multi-modal sensor streams. Experimental validation demonstrates that autonomy pipelines (including both controllers and perception systems) developed in FWAV-Sim exhibit significantly improved simulation capability, advancing simulation-based development for flapping-wing aerial systems.

CV · Aug 26, 2022

Self-Calibrating Anomaly and Change Detection for Autonomous Inspection Robots

Sahar Salimpour, Jorge Peña Queralta, Tomi Westerlund

Automatic detection of visual anomalies and changes in the environment has been a topic of recurrent attention in the fields of machine learning and computer vision over the past decades. A visual anomaly or change detection algorithm identifies regions of an image that differ from a reference image or dataset. The majority of existing approaches focus on anomaly or fault detection in a specific class of images or environments, while general-purpose visual anomaly detection algorithms are scarcer in the literature. In this paper, we propose a comprehensive deep learning framework for detecting anomalies and changes in a priori unknown environments after a reference dataset is gathered, and without the need to retrain the model. We use the SuperPoint and SuperGlue feature extraction and matching methods to detect anomalies based on reference images taken from a similar location and with partially overlapping fields of view. We also introduce a self-calibrating method for the proposed model in order to address the problem of sensitivity to feature matching thresholds and environmental conditions. To evaluate the proposed framework, we have used a ground robot system to collect reference and query data. We show that high accuracy can be obtained using the proposed method. We also show that the calibration process enhances change and foreign-object detection performance.

RO · Mar 8, 2022

Analyzing General-Purpose Deep-Learning Detection and Segmentation Models with Images from a Lidar as a Camera Sensor

Yu Xianjia, Sahar Salimpour, Jorge Peña Queralta et al.

Over the last decade, robotic perception algorithms have significantly benefited from the rapid advances in deep learning (DL). Indeed, a significant amount of the autonomy stack of different commercial and research platforms relies on DL for situational awareness, especially vision sensors. This work explores the potential of general-purpose DL perception algorithms, specifically detection and segmentation neural networks, for processing image-like outputs of advanced lidar sensors. Rather than processing the three-dimensional point cloud data, this is, to the best of our knowledge, the first work to focus on low-resolution images with 360° field of view obtained with lidar sensors by encoding either depth, reflectivity, or near-infrared light in the image pixels. We show that with adequate preprocessing, general-purpose DL models can process these images, opening the door to their usage in environmental conditions where vision sensors present inherent limitations. We provide both a qualitative and quantitative analysis of the performance of a variety of neural network architectures. We believe that using DL models built for visual cameras offers significant advantages due to the much wider availability and maturity compared to point cloud-based perception.

CV · Mar 23

Riverine Land Cover Mapping through Semantic Segmentation of Multispectral Point Clouds

Sopitta Thurachen, Josef Taher, Matti Lehtomäki et al.

Accurate land cover mapping in riverine environments is essential for effective river management, ecological understanding, and geomorphic change monitoring. This study explores the use of Point Transformer v2 (PTv2), an advanced deep neural network architecture designed for point cloud data, for land cover mapping through semantic segmentation of multispectral LiDAR data in real-world riverine environments. We utilize the geometric and spectral information from the 3-channel LiDAR point cloud to map land cover classes, including sand, gravel, low vegetation, high vegetation, forest floor, and water. The PTv2 model was trained and evaluated on point cloud data from the Oulanka river in northern Finland using both geometry and spectral features. To improve the model's generalization in new riverine environments, we additionally investigate multi-dataset training that adds sparsely annotated data from an additional river dataset. The full-feature configuration achieved a mean Intersection over Union (mIoU) of 0.950, significantly outperforming the geometry baseline. Other ablation studies revealed that intensity and reflectance features were key for accurate land cover mapping. The multi-dataset training experiment showed improved generalization performance, suggesting potential for developing more robust models despite limited high-quality annotated data. Our work demonstrates the potential of applying transformer-based architectures to multispectral point clouds in riverine environments. The approach offers new capabilities for monitoring sediment transport and other river management applications.
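For readers unfamiliar with the mIoU metric quoted above, the sketch below shows one common way to compute it from per-point labels. It is a generic illustration, not the paper's evaluation code; the function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean Intersection over Union across classes.

    Classes that appear in neither y_true nor y_pred are skipped
    (an assumption; other conventions exist).
    """
    ious = []
    for c in range(num_classes):
        # Points where both label and prediction are class c.
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        # Points where either label or prediction is class c.
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with two classes (e.g. 0 = sand, 1 = water).
    labels = [0, 0, 1, 1]
    predictions = [0, 1, 1, 1]
    print(mean_iou(labels, predictions, num_classes=2))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.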

RO · Nov 7, 2025

Follow-Me in Micro-Mobility with End-to-End Imitation Learning

Sahar Salimpour, Iacopo Catalano, Tomi Westerlund et al.

Autonomous micro-mobility platforms face challenges from the perspective of the typical deployment environment: large indoor spaces or urban areas that are potentially crowded and highly dynamic. While social navigation algorithms have progressed significantly, optimizing user comfort and overall user experience over other typical metrics in robotics (e.g., time or distance traveled) is understudied. Specifically, these metrics are critical in commercial applications. In this paper, we show how imitation learning delivers smoother and overall better controllers, versus previously used manually-tuned controllers. We demonstrate how DAAV's autonomous wheelchair achieves state-of-the-art comfort in follow-me mode, in which it follows a human operator assisting persons with reduced mobility (PRM). This paper analyzes different neural network architectures for end-to-end control and demonstrates their usability in real-world production-level deployments.

RO · Jan 6, 2025 · Code

Sim-to-Real Transfer for Mobile Robots with Reinforcement Learning: from NVIDIA Isaac Sim to Gazebo and Real ROS 2 Robots

Sahar Salimpour, Jorge Peña-Queralta, Diego Paez-Granados et al.

Unprecedented agility and dexterous manipulation have been demonstrated with controllers based on deep reinforcement learning (RL), with a significant impact on legged and humanoid robots. Modern tooling and simulation platforms, such as NVIDIA Isaac Sim, have been enabling such advances. This article focuses on demonstrating the applications of Isaac in local planning and obstacle avoidance as one of the most fundamental ways in which a mobile robot interacts with its environment. Although there is extensive research on proprioception-based RL policies, the article highlights less standardized and reproducible approaches to exteroception. At the same time, the article aims to provide a base framework for end-to-end local navigation policies and for how a custom robot can be trained in such a simulation environment. We benchmark end-to-end policies against Nav2, the state-of-the-art navigation stack in the Robot Operating System (ROS). We also cover the sim-to-real transfer process by demonstrating zero-shot transferability of policies trained in the Isaac simulator to real-world robots. This is further evidenced by the tests with different simulated robots, which show the generalization of the learned policy. Finally, the benchmarks demonstrate comparable performance to Nav2, opening the door to quick deployment of state-of-the-art end-to-end local planners for custom robot platforms, but importantly furthering the possibilities by expanding the state and action spaces or task definitions for more complex missions. Overall, with this article we introduce the most important steps, and aspects to consider, in deploying RL policies for local path planning and obstacle avoidance with Isaac Sim training, Gazebo testing, and ROS 2 for real-time inference on real robots. The code is available at https://github.com/sahars93/RL-Navigation.

LG · Nov 12, 2024 · Code

Dual-Criterion Model Aggregation in Federated Learning: Balancing Data Quantity and Quality

Haizhou Zhang, Xianjia Yu, Tomi Westerlund

Federated learning (FL) has become one of the key methods for privacy-preserving collaborative learning, as it enables the transfer of models without requiring local data exchange. Within the FL framework, an aggregation algorithm is recognized as one of the most crucial components for ensuring the efficacy and security of the system. Existing average aggregation algorithms typically assume that all client-trained data holds equal value or that weights are based solely on the quantity of data contributed by each client. In contrast, alternative approaches involve training the model locally after aggregation to enhance adaptability. However, these approaches fundamentally ignore the inherent heterogeneity between different clients' data and the complexity of variations in data at the aggregation stage, which may lead to a suboptimal global model. To address these issues, this study proposes a novel dual-criterion weighted aggregation algorithm involving the quantity and quality of data from the client node. Specifically, we quantify the data used for training and perform multiple rounds of local model inference accuracy evaluation on a specialized dataset to assess the data quality of each client. These two factors are utilized as weights within the aggregation process, applied through a dynamically weighted summation of these two factors. This approach allows the algorithm to adaptively adjust the weights, ensuring that every client can contribute to the global model, regardless of their data's size or initial quality. Our experiments show that the proposed algorithm outperforms several existing state-of-the-art aggregation approaches on both a general-purpose open-source dataset, CIFAR-10, and a dataset specific to visual obstacle avoidance.
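The dual-criterion idea can be sketched as follows. This is a simplified illustration rather than the authors' algorithm: it assumes each client reports a sample count and an accuracy on a shared evaluation set, normalises both into shares, and blends them with a fixed mixing coefficient `lam`. The abstract describes the weighting as dynamic, so the fixed `lam`, the function names, and the flat parameter lists are all assumptions.

```python
def dual_criterion_weights(num_samples, accuracies, lam=0.5):
    """Blend each client's data-quantity share and data-quality share.

    lam is a hypothetical mixing coefficient in [0, 1]; lam = 1 reduces to
    the usual sample-count weighting used in federated averaging.
    """
    total_n = sum(num_samples)
    total_a = sum(accuracies)
    if total_n <= 0 or total_a <= 0:
        raise ValueError("sample counts and accuracies must have positive sums")
    quantity = [n / total_n for n in num_samples]
    quality = [a / total_a for a in accuracies]
    # Both shares sum to 1, so the blended weights also sum to 1.
    return [lam * q + (1 - lam) * s for q, s in zip(quantity, quality)]


def aggregate(client_params, weights):
    """Weighted sum of flat parameter vectors, one vector per client."""
    dim = len(client_params[0])
    return [sum(w * p[k] for w, p in zip(weights, client_params)) for k in range(dim)]


if __name__ == "__main__":
    weights = dual_criterion_weights([100, 300], [0.9, 0.6], lam=0.5)
    print(weights)
    print(aggregate([[1.0, 0.0], [0.0, 1.0]], weights))
```

With these inputs, the client holding less data but scoring higher on the evaluation set receives more influence than it would under quantity-only weighting.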

RO · Mar 9, 2020 · Code

UWB-based system for UAV Localization in GNSS-Denied Environments: Characterization and Dataset

Jorge Peña Queralta, Carmen Martínez Almansa, Fabrizio Schiano et al.

Small unmanned aerial vehicles (UAVs) have penetrated multiple domains over the past years. In GNSS-denied or indoor environments, aerial robots require a robust and stable localization system, often with external feedback, in order to fly safely. Motion capture systems are typically utilized indoors when accurate localization is needed. However, these systems are expensive and most require a fixed setup. Recently, visual-inertial odometry and similar methods have advanced to a point where autonomous UAVs can rely on them for localization. The main limitations in this case come from the environment and, for long-term autonomy, from accumulated error when loop closure cannot be performed efficiently. For instance, the impact of low visibility due to dust or smoke in post-disaster scenarios might render the odometry methods inapplicable. In this paper, we study and characterize an ultra-wideband (UWB) system for navigation and localization of aerial robots indoors based on Decawave's DWM1001 UWB node. The system is portable, inexpensive and can be entirely battery powered. We show the viability of this system for autonomous flight of UAVs, and provide open-source methods and data that enable its widespread application even with movable anchor systems. We characterize the accuracy based on the position of the UAV with respect to the anchors, its altitude and speed, and the distribution of the anchors in space. Finally, we analyze the accuracy of the self-calibration of the anchors' positions.
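To show how ranges to fixed anchors can yield a position, here is a generic 2D multilateration sketch. It is not the DWM1001's onboard algorithm or the paper's method: it linearises the range equations by subtracting the first anchor's equation and solves the resulting least-squares problem through the 2x2 normal equations. The function name `multilaterate_2d` and the anchor layout are illustrative assumptions.

```python
def multilaterate_2d(anchors, ranges):
    """Estimate (x, y) from at least three anchor positions and measured ranges.

    Subtracting the first anchor's circle equation from the others gives
    linear equations 2(xi-x0)x + 2(yi-y0)y = d0^2 - di^2 + xi^2 + yi^2 - x0^2 - y0^2.
    """
    if len(anchors) < 3 or len(anchors) != len(ranges):
        raise ValueError("need at least three anchors with matching ranges")
    (x0, y0), d0 = anchors[0], ranges[0]
    rows, rhs = [], []
    for (xi, yi), di in zip(anchors[1:], ranges[1:]):
        rows.append((2 * (xi - x0), 2 * (yi - y0)))
        rhs.append(d0 ** 2 - di ** 2 + xi ** 2 + yi ** 2 - x0 ** 2 - y0 ** 2)
    # Normal equations (A^T A) p = A^T b for the 2-unknown least-squares problem.
    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * b for r, b in zip(rows, rhs))
    b2 = sum(r[1] * b for r, b in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not observable")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
    true_pos = (1.0, 2.0)
    ranges = [((true_pos[0] - ax) ** 2 + (true_pos[1] - ay) ** 2) ** 0.5
              for ax, ay in anchors]
    print(multilaterate_2d(anchors, ranges))
```

The collinearity check reflects the abstract's point that accuracy depends on how the anchors are distributed in space: poorly spread anchors make the linear system ill-conditioned.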

CV · Oct 20, 2024

Event-based Sensor Fusion and Application on Odometry: A Survey

Jiaqiang Zhang, Xianjia Yu, Ha Sier et al.

Event cameras, inspired by biological vision, are asynchronous sensors that detect changes in brightness, offering notable advantages in environments characterized by high-speed motion, low lighting, or wide dynamic range. These distinctive properties render event cameras particularly effective for sensor fusion in robotics and computer vision, especially in enhancing traditional visual or LiDAR-inertial odometry. Conventional frame-based cameras suffer from limitations such as motion blur and drift, which can be mitigated by the continuous, low-latency data provided by event cameras. Similarly, LiDAR-based odometry encounters challenges related to the loss of geometric information in environments such as corridors. To address these limitations, and unlike existing event camera-related surveys, this paper presents a comprehensive overview of recent advancements in event-based sensor fusion specifically for odometry applications, investigating fusion strategies that incorporate frame-based cameras, inertial measurement units (IMUs), and LiDAR. The survey critically assesses the contributions of these fusion methods to improving odometry performance in complex environments, while highlighting key applications and discussing the strengths, limitations, and unresolved challenges. Additionally, it offers insights into potential future research directions to advance event-based sensor fusion for next-generation odometry applications.

CV · Jul 25, 2025

Co-Win: Joint Object Detection and Instance Segmentation in LiDAR Point Clouds via Collaborative Window Processing

Haichuan Li, Tomi Westerlund

Accurate perception and scene understanding in complex urban environments is a critical challenge for ensuring safe and efficient autonomous navigation. In this paper, we present Co-Win, a novel bird's eye view (BEV) perception framework that integrates point cloud encoding with efficient parallel window-based feature extraction to address the multi-modality inherent in environmental understanding. Our method employs a hierarchical architecture comprising a specialized encoder, a window-based backbone, and a query-based decoder head to effectively capture diverse spatial features and object relationships. Unlike prior approaches that treat perception as a simple regression task, our framework incorporates a variational approach with mask-based instance segmentation, enabling fine-grained scene decomposition and understanding. The Co-Win architecture processes point cloud data through progressive feature extraction stages, ensuring that predicted masks are both data-consistent and contextually relevant. Furthermore, our method produces interpretable and diverse instance predictions, enabling enhanced downstream decision-making and planning in autonomous driving systems.

RO · Jul 23, 2025

IndoorBEV: Joint Detection and Footprint Completion of Objects via Mask-based Prediction in Indoor Scenarios for Bird's-Eye View Perception

Haichuan Li, Changda Tian, Panos Trahanias et al.

Detecting diverse objects within complex indoor 3D point clouds presents significant challenges for robotic perception, particularly with varied object shapes, clutter, and the co-existence of static and dynamic elements where traditional bounding box methods falter. To address these limitations, we propose IndoorBEV, a novel mask-based Bird's-Eye View (BEV) method for indoor mobile robots. In a BEV method, a 3D scene is projected into a 2D BEV grid, which naturally handles occlusions and provides a consistent top-down view that helps distinguish static obstacles from dynamic agents. The resulting 2D BEV output is directly usable in downstream robotic tasks like navigation, motion prediction, and planning. Our architecture utilizes an axis compact encoder and a window-based backbone to extract rich spatial features from this BEV map. A query-based decoder head then employs learned object queries to concurrently predict object classes and instance masks in the BEV space. This mask-centric formulation effectively captures the footprint of both static and dynamic objects regardless of their shape, offering a robust alternative to bounding box regression. We demonstrate the effectiveness of IndoorBEV on a custom indoor dataset featuring diverse object classes including static objects and dynamic elements like robots and miscellaneous items, showcasing its potential for robust indoor scene understanding.

CR · May 31, 2025

Blockchain Powered Edge Intelligence for U-Healthcare in Privacy Critical and Time Sensitive Environment

Anum Nawaz, Hafiz Humza Mahmood Ramzan, Xianjia Yu et al.

Edge Intelligence (EI) serves as a critical enabler for privacy-preserving systems by providing AI-empowered computation and distributed caching services at the edge, thereby minimizing latency and enhancing data privacy. The integration of blockchain technology further augments EI frameworks by ensuring transactional transparency, auditability, and system-wide reliability through a decentralized network model. However, the operational architecture of such systems introduces inherent vulnerabilities, particularly due to the extensive data interactions between edge gateways (EGs) and the distributed nature of information storage during service provisioning. To address these challenges, we propose an autonomous computing model along with its interaction topologies tailored for privacy-critical and time-sensitive health applications. The system supports continuous monitoring, real-time alert notifications, disease detection, and robust data processing and aggregation. It also includes a data transaction handler and mechanisms for ensuring privacy at the EGs. Moreover, a resource-efficient one-dimensional convolutional neural network (1D-CNN) is proposed for the multiclass classification of arrhythmia, enabling accurate and real-time analysis of constrained EGs. Furthermore, a secure access scheme is defined to manage both off-chain and on-chain data sharing and storage. To validate the proposed model, comprehensive security, performance, and cost analyses are conducted, demonstrating the efficiency and reliability of the fine-grained access control scheme.

LG · May 31, 2025

Blockchain-Enabled Privacy-Preserving Second-Order Federated Edge Learning in Personalized Healthcare

Anum Nawaz, Muhammad Irfan, Xianjia Yu et al.

Federated learning (FL) has attracted increasing attention as a way to mitigate security and privacy challenges in traditional cloud-centric machine learning models, specifically in healthcare ecosystems. FL methodologies enable the training of global models through localized policies, allowing independent operations at the edge clients' level. Conventional first-order FL approaches face several challenges in personalized model training due to the heterogeneous, non-independent and identically distributed (non-iid) data of each edge client. Recent second-order FL approaches maintain the stability and consistency of non-iid datasets while improving personalized model training. This study proposes and develops a verifiable and auditable optimized second-order FL framework, BFEL (blockchain-enhanced federated edge learning), based on optimized FedCurv for personalized healthcare systems. FedCurv incorporates information about the importance of each parameter to each client's task (through the Fisher Information Matrix), which helps to preserve client-specific knowledge and reduce model drift during aggregation. Moreover, it minimizes the communication rounds required to achieve a target precision convergence for each edge client while effectively managing personalized training on non-iid and heterogeneous data. The incorporation of Ethereum-based model aggregation ensures trust, verifiability, and auditability, while public key encryption enhances privacy and security. Experimental results of federated CNNs and MLPs utilizing MNIST, CIFAR-10, and PathMNIST demonstrate the high efficiency and scalability of the proposed framework.

IV · May 8, 2025

Improved Brain Tumor Detection in MRI: Fuzzy Sigmoid Convolution in Deep Learning

Muhammad Irfan, Anum Nawaz, Riku Klen et al.

Early detection and accurate diagnosis are essential to improving patient outcomes. The use of convolutional neural networks (CNNs) for tumor detection has shown promise, but existing models often suffer from overparameterization, which limits their performance gains. In this study, fuzzy sigmoid convolution (FSC) is introduced along with two additional modules: top-of-the-funnel and middle-of-the-funnel. The proposed methodology significantly reduces the number of trainable parameters without compromising classification accuracy. A novel convolutional operator is central to this approach, effectively dilating the receptive field while preserving input data integrity. This enables efficient feature map reduction and enhances the model's tumor detection capability. In the FSC-based model, fuzzy sigmoid activation functions are incorporated within convolutional layers to improve feature extraction and classification. The inclusion of fuzzy logic into the architecture improves its adaptability and robustness. Extensive experiments on three benchmark datasets demonstrate the superior performance and efficiency of the proposed model. The FSC-based architecture achieved classification accuracies of 99.17%, 99.75%, and 99.89% on three different datasets. The model employs 100 times fewer parameters than large-scale transfer learning architectures, highlighting its computational efficiency and suitability for detecting brain tumors early. This research offers lightweight, high-performance deep-learning models for medical imaging applications.

RO · Apr 20, 2021

An Overview of Federated Learning at the Edge and Distributed Ledger Technologies for Robotic and Autonomous Systems

Yu Xianjia, Jorge Peña Queralta, Jukka Heikkonen et al.

Autonomous systems are becoming inherently ubiquitous with the advancements of computing and communication solutions enabling low-latency offloading and real-time collaboration of distributed devices. Decentralized technologies with blockchain and distributed ledger technologies (DLTs) are playing a key role. At the same time, advances in deep learning (DL) have significantly raised the degree of autonomy and level of intelligence of robotic and autonomous systems. While these technological revolutions were taking place, rising concerns about data security and end-user privacy have become an inescapable research consideration. Federated learning (FL) is a promising solution to privacy-preserving DL at the edge, with an inherently distributed nature by learning on isolated data islands and communicating only model updates. However, FL by itself does not provide the levels of security and robustness required by today's standards in distributed autonomous systems. This survey covers applications of FL to autonomous robots, analyzes the role of DLT and FL for these systems, and introduces the key background concepts and considerations in current research.

RO · Apr 1, 2021

Cooperative UWB-Based Localization for Outdoors Positioning and Navigation of UAVs aided by Ground Robots

Yu Xianjia, Li Qingqing, Jorge Peña Queralta et al.

Unmanned aerial vehicles (UAVs) are becoming largely ubiquitous with an increasing demand for aerial data. Accurate navigation and localization, required for precise data collection in many industrial applications, often rely on RTK GNSS. These systems, capable of centimeter-level accuracy, require a setup and calibration process and are relatively expensive. This paper addresses the problem of accurate positioning and navigation of UAVs through cooperative localization. Inexpensive ultra-wideband (UWB) transceivers installed on both the UAV and a support ground robot enable centimeter-level relative positioning. With fast deployment and wide setup flexibility, the proposed system is able to accommodate different environments and can also be utilized in GNSS-denied environments. Through extensive simulations and field tests, we evaluate the accuracy of the system and compare it to GNSS in urban environments where multipath transmission degrades accuracy. For completeness, we include visual-inertial odometry in the experiments and compare the performance with the UWB-based cooperative localization.

RO · Mar 25, 2021

Multi Sensor Fusion for Navigation and Mapping in Autonomous Vehicles: Accurate Localization in Urban Environments

Li Qingqing, Jorge Peña Queralta, Tuan Nguyen Gia et al.

The combination of data from multiple sensors, also known as sensor fusion or data fusion, is a key aspect in the design of autonomous robots. In particular, algorithms able to accommodate sensor fusion techniques enable increased accuracy, and are more resilient against the malfunction of individual sensors. The development of algorithms for autonomous navigation, mapping and localization has seen big advancements over the past two decades. Nonetheless, challenges remain in developing robust solutions for accurate localization in dense urban environments, where the so-called last-mile delivery occurs. In these scenarios, local motion estimation is combined with the matching of real-time data with a detailed pre-built map. In this paper, we utilize data gathered with an autonomous delivery robot to compare different sensor fusion techniques and evaluate which algorithms provide the highest accuracy depending on the environment. The techniques we analyze and propose in this paper utilize 3D lidar data, inertial data, GNSS data and wheel encoder readings. We show how lidar scan matching combined with other sensor data can be used to increase the accuracy of the robot localization and, in consequence, its navigation. Moreover, we propose a strategy to reduce the impact on navigation performance when a change in the environment renders map data invalid or part of the available map is corrupted.

RO · Mar 24, 2021

Applications of UWB Networks and Positioning to Autonomous Robots and Industrial Systems

Xianjia Yu, Qingqing Li, Jorge Peña Queralta et al.

Ultra-wideband (UWB) technology is a mature technology that contested other wireless technologies in the advent of the IoT but did not achieve the same levels of widespread adoption. In recent years, however, with its potential as a wireless ranging and localization solution, it has regained momentum. Within the robotics field, UWB positioning systems are being increasingly adopted for localizing autonomous ground or aerial robots. In the Industrial IoT (IIoT) domain, its potential for ad-hoc networking and simultaneous positioning is also being explored. This survey overviews the state-of-the-art in UWB networking and localization for robotic and autonomous systems. We also cover novel techniques focusing on more scalable systems, collaborative approaches to localization, ad-hoc networking, and solutions involving machine learning to improve accuracy. This is, to the best of our knowledge, the first survey to put together the robotics and IIoT perspectives and to emphasize novel ranging and positioning modalities. We complete the survey with a discussion on current trends and open research problems.

ROMar 11, 2021

Towards Large-Scale Scalable MAV Swarms with ROS2 and UWB-based Situated Communication

Jorge Peña Queralta, Yu Xianjia, Li Qingqing et al.

The design and development of swarms of micro-aerial vehicles (MAVs) has recently gained significant traction. Collaborative aerial swarms have potential applications in areas as diverse as surveillance and monitoring, inventory management, search and rescue, or in the entertainment industry. Swarm intelligence has, by definition, a distributed nature. Yet performing experiments in truly distributed systems is not always possible, as much of the underlying ecosystem employed requires some sort of central control. Indeed, in experimental proofs of concept, most research relies on more traditional connectivity solutions and centralized approaches. External localization solutions, such as motion capture (MOCAP) systems, visual markers, or ultra-wideband (UWB) anchors are often used. Alternatively, intra-swarm solutions are often limited in terms of, e.g., range or field-of-view. Research and development has been supported by platforms such as the e-puck, the Kilobot, or the Crazyflie quadrotors. We believe there is a need for inexpensive platforms such as the Crazyflie with more advanced onboard processing capabilities and sensors, while offering scalability and robust communication and localization solutions. In the following, we present a platform, currently under development, for research and development in aerial swarms and for building towards large-scale swarms of autonomous MAVs, where we leverage Wi-Fi mesh connectivity and the distributed ROS2 middleware together with UWB ranging and communication for situated communication. The platform is based on the Ryze Tello Drone, a Raspberry Pi Zero W as a companion computer together with a camera module, and a Decawave DWM1001 UWB module for ranging and basic communication.

ROMar 6, 2021

Adaptive Lidar Scan Frame Integration: Tracking Known MAVs in 3D Point Clouds

Li Qingqing, Yu Xianjia, Jorge Peña Queralta et al.

Micro-aerial vehicles (MAVs) are becoming ubiquitous across multiple industries and application domains. Lightweight MAVs with only an onboard flight controller and a minimal sensor suite (e.g., IMU, vision, and vertical ranging sensors) have potential as mobile and easily deployable sensing platforms. When deployed from a ground robot, a key parameter is the relative localization between the ground robot and the MAV. This paper proposes a novel method for tracking MAVs in lidar point clouds. We consider the speed and distance of the MAV to actively adapt the lidar's frame integration time and, in essence, the density and size of the point cloud to be processed. We show that this method enables more persistent and robust tracking when the speed of the MAV or its distance to the tracking sensor changes. In addition, we propose a multi-modal tracking method that relies on high-frequency scans for accurate state estimation, lower-frequency scans for robust and persistent tracking, and sub-Hz processing for trajectory and object identification. These three integration and processing modalities allow for overall accurate and robust MAV tracking while ensuring the object being tracked meets shape and size constraints.

RODec 31, 2020

Long-Term Autonomy in Forest Environment using Self-Corrective SLAM

Paavo Nevalainen, Parisa Movahedi, Jorge Peña Queralta et al.

Vehicles with prolonged autonomous missions have to maintain environment awareness by simultaneous localization and mapping (SLAM). Closed-loop correction is substituted by interpolation in rigid body transformation space in order to systematically reduce the accumulated error over different scales. The computation is divided into a lightweight SLAM computed at the edge and iterative corrections in the cloud environment. Tree locations in the forest environment are sent over a potentially bandwidth-limited communication link. Data from a real forest site is used in the verification of the proposed algorithm. The algorithm adds new iterative closest point (ICP) cases to the initial SLAM and measures the resulting map quality by the mean of the root mean squared error (RMSE) of individual tree clusters. Adding 4% more match cases yields a mean RMSE of 0.15 m on a large site with 180 m odometric distance.
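
The map-quality metric named in the abstract (mean of the per-cluster RMSE) can be illustrated with a minimal Python sketch. The abstract does not say how a cluster's reference position is chosen, so the sketch assumes each tree cluster is a list of 2D observations of one tree and takes the cluster centroid as the reference; the function names are hypothetical and not taken from the paper.

```python
import math

def cluster_rmse(points):
    """RMSE of 2D observations around their own centroid (centroid is an assumption)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Mean squared Euclidean distance from each observation to the centroid.
    mse = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mse)

def mean_cluster_rmse(clusters):
    """Map-quality score: average of the per-cluster RMSE values."""
    return sum(cluster_rmse(c) for c in clusters) / len(clusters)
```

A lower score means that repeated observations of the same tree fall closer together in the corrected map.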

RONov 2, 2020

VIO-UWB-Based Collaborative Localization and Dense Scene Reconstruction within Heterogeneous Multi-Robot Systems

Jorge Peña Queralta, Li Qingqing, Fabrizio Schiano et al.

Effective collaboration in multi-robot systems requires accurate and robust estimation of relative localization: from cooperative manipulation to collaborative sensing, and including cooperative exploration or cooperative transportation. This paper introduces a novel approach to collaborative localization for dense scene reconstruction in heterogeneous multi-robot systems comprising ground robots and micro-aerial vehicles (MAVs). We solve the problem of full relative pose estimation without sliding time windows by relying on UWB-based ranging and Visual Inertial Odometry (VIO)-based egomotion estimation for localization, while exploiting lidars onboard the ground robots for full relative pose estimation in a single reference frame. During operation, the rigidity eigenvalue provides feedback to the system. To tackle the challenge of path planning and obstacle avoidance of MAVs in GNSS-denied environments, we maintain line-of-sight between ground robots and MAVs. Because lidars capable of dense reconstruction have limited FoV, this introduces new constraints to the system. Therefore, we propose a novel formulation with a variant of the Dubins multiple traveling salesman problem with neighborhoods (DMTSPN) where we include constraints related to the limited FoV of the ground robots. Our approach is validated with simulations and experiments with real robots for the different parts of the system.

LGSep 24, 2020

Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey

Wenshuai Zhao, Jorge Peña Queralta, Tomi Westerlund

Deep reinforcement learning has recently seen huge success across multiple areas in the robotics domain. Owing to the limitations of gathering real-world data, i.e., sample inefficiency and the cost of collecting it, simulation environments are utilized for training the different agents. This not only aids in providing a potentially infinite data source, but also alleviates safety concerns with real robots. Nonetheless, the gap between the simulated and real worlds degrades the performance of the policies once the models are transferred into real robots. Multiple research efforts are therefore now being directed towards closing this sim-to-real gap and accomplishing more efficient policy transfer. Recent years have seen the emergence of multiple methods applicable to different domains, but there is a lack, to the best of our knowledge, of a comprehensive review summarizing and putting into context the different methods. In this survey paper, we cover the fundamental background behind sim-to-real transfer in deep reinforcement learning and overview the main methods being utilized at the moment: domain randomization, domain adaptation, imitation learning, meta-learning and knowledge distillation. We categorize some of the most relevant recent works, and outline the main application scenarios. Finally, we discuss the main opportunities and challenges of the different approaches and point to the most promising directions.

ROSep 2, 2020

Secure Encoded Instruction Graphs for End-to-End Data Validation in Autonomous Robots

Jorge Peña Queralta, Li Qingqing, Eduardo Castelló Ferrer et al.

As autonomous robots are becoming more widespread, more attention is being paid to the security of robotic operation. Autonomous robots can be seen as cyber-physical systems: they can operate in virtual, physical, and human realms. Therefore, securing the operations of autonomous robots requires not only securing their data (e.g., sensor inputs and mission instructions) but also securing their interactions with their environment. There is currently a deficiency of methods that would allow robots to securely ensure their sensors and actuators are operating correctly without external feedback. This paper introduces an encoding method and end-to-end validation framework for the missions of autonomous robots. In particular, we present a proof of concept of a map encoding method, which allows robots to navigate realistic environments and validate operational instructions with almost zero a priori knowledge. We demonstrate our framework using two different encoded maps in experiments with simulated and real robots. Our encoded maps have the same advantages as typical landmark-based navigation, but with the added benefit of cryptographic hashes that enable end-to-end information validation. Our method is applicable to any aspect of robotic operation in which there is a predefined set of actions or instructions given to the robot.

ROAug 28, 2020

Collaborative Multi-Robot Systems for Search and Rescue: Coordination and Perception

Jorge Peña Queralta, Jussi Taipalmaa, Bilge Can Pullinen et al.

Autonomous or teleoperated robots have been playing increasingly important roles in civil applications in recent years. Across the different civil domains where robots can support human operators, one of the areas where they can have the most impact is in search and rescue (SAR) operations. In particular, multi-robot systems have the potential to significantly improve the efficiency of SAR personnel with faster search for victims, initial assessment and mapping of the environment, real-time monitoring and surveillance of SAR operations, or establishing emergency communication networks, among other possibilities. SAR operations encompass a wide variety of environments and situations, and therefore heterogeneous and collaborative multi-robot systems can provide the most advantages. In this paper, we review and analyze the existing approaches to multi-robot SAR support from an algorithmic perspective, putting an emphasis on the methods enabling collaboration among the robots as well as advanced perception through machine vision and multi-agent active perception. Furthermore, we put these algorithms in the context of the different challenges and constraints that various types of robots (ground, aerial, surface or underwater) encounter in different SAR environments (maritime, urban, wilderness or other post-disaster scenarios). This is, to the best of our knowledge, the first review considering heterogeneous SAR robots across different environments, while giving two complementary points of view: control mechanisms and machine perception. Based on our review of the state-of-the-art, we discuss the main open research questions, and outline our insights on the current approaches that have potential to improve the real-world performance of multi-robot SAR systems.

LGAug 18, 2020

Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep Reinforcement Learning

Wenshuai Zhao, Jorge Peña Queralta, Li Qingqing et al.

Current research directions in deep reinforcement learning include bridging the simulation-reality gap, improving sample efficiency of experiences in distributed multi-agent reinforcement learning, together with the development of robust methods against adversarial agents in distributed learning, among many others. In this work, we are particularly interested in analyzing how multi-agent reinforcement learning can bridge the gap to reality in distributed multi-robot systems where the operation of the different robots is not necessarily homogeneous. These variations can happen due to sensing mismatches, inherent errors in terms of calibration of the mechanical joints, or simple differences in accuracy. While our results are simulation-based, we introduce the effect of sensing, calibration, and accuracy mismatches in distributed reinforcement learning with proximal policy optimization (PPO). We discuss how both the different types of perturbations and the number of agents experiencing those perturbations affect the collaborative learning effort. The simulations are carried out using a Kuka arm model in the Bullet physics engine. This is, to the best of our knowledge, the first work exploring the limitations of PPO in multi-robot systems when considering that different robots might be exposed to different environments where their sensors or actuators have induced errors. With the conclusions of this work, we set the initial point for future work on designing and developing methods to achieve robust reinforcement learning in the presence of real-world perturbations that might differ within a multi-robot system.

ROAug 18, 2020

Ubiquitous Distributed Deep Reinforcement Learning at the Edge: Analyzing Byzantine Agents in Discrete Action Spaces

Wenshuai Zhao, Jorge Peña Queralta, Li Qingqing et al.

The integration of edge computing in next-generation mobile networks is bringing low-latency and high-bandwidth ubiquitous connectivity to a myriad of cyber-physical systems. This will further boost the increasing intelligence that is being embedded at the edge in various types of autonomous systems, where collaborative machine learning has the potential to play a significant role. This paper discusses some of the challenges in multi-agent distributed deep reinforcement learning that can occur in the presence of byzantine or malfunctioning agents. As the simulation-to-reality gap gets bridged, the probability of malfunctions or errors must be taken into account. We show how wrong discrete actions can significantly affect the collaborative learning effort. In particular, we analyze the effect of having a fraction of agents that might perform the wrong action with a given probability. We study the ability of the system to converge towards a common working policy through the collaborative learning process based on the number of experiences from each of the agents to be aggregated for each policy update, together with the fraction of wrong actions from agents experiencing malfunctions. Our experiments are carried out in a simulation environment using the Atari testbed for the discrete action spaces, and advantage actor-critic (A2C) for the distributed multi-agent training.
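
The fault model described above (a fraction of agents performing a wrong discrete action with a given probability) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wrong action is drawn uniformly from the remaining actions, and the names `corrupt_action`, `executed_actions` and `faulty_agents` are hypothetical rather than taken from the paper.

```python
import random

def corrupt_action(action, n_actions, error_prob, rng):
    """With probability error_prob, swap the intended action for a different one."""
    if n_actions < 2 or rng.random() >= error_prob:
        return action
    # Draw uniformly from the n_actions - 1 alternatives, skipping the intended action.
    wrong = rng.randrange(n_actions - 1)
    return wrong if wrong < action else wrong + 1

def executed_actions(intended, faulty_agents, n_actions, error_prob, seed=0):
    """Actions actually executed when only the agents in faulty_agents can malfunction."""
    rng = random.Random(seed)
    return [
        corrupt_action(a, n_actions, error_prob, rng) if i in faulty_agents else a
        for i, a in enumerate(intended)
    ]
```

Sweeping `error_prob` and the size of `faulty_agents` gives the two axes the abstract studies: how often malfunctioning agents err and how many of them there are.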

ROMay 12, 2020

Localization in Unstructured Environments: Towards Autonomous Robots in Forests with Delaunay Triangulation

Qingqing Li, Paavo Nevalainen, Jorge Peña Queralta et al.

Autonomous harvesting and transportation is a long-term goal of the forest industry. One of the main challenges is the accurate localization of both vehicles and trees in a forest. Forests are unstructured environments where it is difficult to find a group of significant landmarks for current fast feature-based place recognition algorithms. This paper proposes a novel approach where local observations are matched to a general tree map using the Delaunay triangulation as the representation format. Instead of point cloud based matching methods, we utilize a topology-based method. First, tree trunk positions are registered in a prior run done by a forest harvester. Second, the resulting map is Delaunay triangulated. Third, a local submap of the autonomous robot is registered, triangulated and matched using triangular similarity maximization to estimate the position of the robot. We test our method on a dataset accumulated from a forestry site at Lieksa, Finland. A total length of 2100 m of harvester path was recorded by an industrial harvester with a 3D laser scanner and a geolocation unit fixed to the frame. Our experiments show a standard deviation of 12 cm in the location accuracy, with real-time data processing for speeds not exceeding 0.5 m/s. The accuracy and speed limit are realistic during forest operations.

ROMay 7, 2020

AutoSOS: Towards Multi-UAV Systems Supporting Maritime Search and Rescue with Lightweight AI and Edge Computing

Jorge Peña Queralta, Jenni Raitoharju, Tuan Nguyen Gia et al.

Rescue vessels are the main actors in maritime safety and rescue operations. At the same time, aerial drones bring a significant advantage into this scenario. This paper presents the research directions of the AutoSOS project, where we work on the development of an autonomous multi-robot search and rescue assistance platform capable of sensor fusion and object detection in embedded devices using novel lightweight AI models. The platform is meant to perform reconnaissance missions for initial assessment of the environment using novel adaptive deep learning algorithms that efficiently use the available sensors and computational resources on the drones and the rescue vessel. When drones find potential objects, they will send their sensor data to the vessel to verify the findings with increased accuracy. The actual rescue and treatment operations are left as the responsibility of the rescue personnel. The drones will autonomously reconfigure their spatial distribution to enable multi-hop communication when a direct connection between a drone transmitting information and the vessel is unavailable.

ROApr 29, 2020

End-to-End Design for Self-Reconfigurable Heterogeneous Robotic Swarms

Jorge Peña Queralta, Li Qingqing, Tuan Nguyen Gia et al.

More widespread adoption requires swarms of robots to be more flexible for real-world applications. Multiple challenges remain in complex scenarios where a large amount of data needs to be processed in real-time and high degrees of situational awareness are required. The options in this direction are limited in existing robotic swarms, mostly homogeneous robots with limited operational and reconfiguration flexibility. We address this by bringing elastic computing techniques and dynamic resource management from the edge-cloud computing domain to the swarm robotics domain. This enables the dynamic provisioning of collective capabilities in the swarm for different applications. Therefore, we transform a swarm into a distributed sensing and computing platform capable of complex data processing tasks, which can then be offered as a service. In particular, we discuss how this can be applied to adaptive resource management in a heterogeneous swarm of drones, and how we are implementing the dynamic deployment of distributed data processing algorithms. With an elastic drone swarm built on reconfigurable hardware and containerized services, it will be possible to raise the self-awareness, degree of intelligence, and level of autonomy of heterogeneous swarms of robots. We describe novel directions for collaborative perception, and new ways of interacting with a robotic swarm.

SPApr 17, 2020

UWB-Based Localization for Multi-UAV Systems and Collaborative Heterogeneous Multi-Robot Systems: a Survey

Wang Shule, Carmen Martínez Almansa, Jorge Peña Queralta et al.

Ultra-wideband technology has emerged in recent years as a robust solution for localization in GNSS-denied environments. In particular, its high accuracy when compared to other wireless localization solutions is enabling a wider range of collaborative and multi-robot application scenarios, being able to replace more complex and expensive motion-capture areas for use cases where accuracy in the order of tens of centimeters is sufficient. We present the first survey of UWB-based localization focused on multi-UAV systems and heterogeneous multi-robot systems. We have found that previous literature reviews do not consider in depth the challenges in aerial navigation, in navigation with multiple robots, or in heterogeneous multi-robot systems. In particular, this is, to the best of our knowledge, the first survey to review recent advances in UWB-based (i) methods that enable ad-hoc and dynamic deployments; (ii) collaborative localization techniques; and (iii) cooperative sensing and cooperative maneuvers such as UAV docking on mobile platforms. Finally, we also review existing datasets and discuss the potential of this technology for both localization in GNSS-denied environments and collaboration in multi-robot systems.

IVApr 17, 2020

Multi-Scale Supervised 3D U-Net for Kidneys and Kidney Tumor Segmentation

Wenshuai Zhao, Dihong Jiang, Jorge Peña Queralta et al.

Accurate segmentation of kidneys and kidney tumors is an essential step for radiomic analysis as well as developing advanced surgical planning techniques. In clinical analysis, the segmentation is currently performed by clinicians through visual inspection of images gathered through a computed tomography (CT) scan. This process is laborious and its success significantly depends on previous experience. Moreover, the uncertainty in the tumor location and heterogeneity of scans across patients increases the error rate. To tackle this issue, computer-aided segmentation based on deep learning techniques has become increasingly popular. We present a multi-scale supervised 3D U-Net, MSS U-Net, to automatically segment kidneys and kidney tumors from CT images. Our architecture combines deep supervision with exponential logarithmic loss to increase the 3D U-Net training efficiency. Furthermore, we introduce a connected-component based post-processing method to enhance the performance of the overall process. This architecture shows superior performance compared to state-of-the-art works using data from the public KiTS19 dataset, with Dice coefficients for kidney and tumor of up to 0.969 and 0.805, respectively. The segmentation techniques introduced in this paper have been tested in the KiTS19 challenge with its corresponding dataset.
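
For readers unfamiliar with the Dice coefficient reported above, the following Python sketch computes it for two binary masks, using the standard definition 2|A ∩ B| / (|A| + |B|). The flat-list mask representation and the handling of two empty masks are assumptions made for illustration; this is not the paper's evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as flat sequences of 0/1 values."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A value of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground voxel.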

ROApr 14, 2020

Autocalibration of a Mobile UWB Localization System for Ad-Hoc Multi-Robot Deployments in GNSS-Denied Environments

Carmen Martínez Almansa, Wang Shule, Jorge Peña Queralta et al.

Ultra-wideband (UWB) wireless technology has seen an increased penetration in the robotics field as a robust localization method in recent years. UWB enables high accuracy distance estimation from time-of-flight measurements of wireless signals, even in non-line-of-sight measurements. UWB-based localization systems have been utilized in various types of GNSS-denied environments for ground or aerial autonomous robots. However, most of the existing solutions rely on a fixed and well-calibrated set of UWB nodes, or anchors, to estimate accurately the position of other mobile nodes, or tags, through multilateration. This limits the applicability of such systems for dynamic and ad-hoc deployments, such as post-disaster scenarios where the UWB anchors could be mounted on mobile robots to aid the navigation of UAVs or other robots. We introduce a collaborative algorithm for online autocalibration of anchor positions, enabling not only ad-hoc deployments but also movable anchors, based on Decawave's DWM1001 UWB module. Compared to the built-in autocalibration process from Decawave, we drastically reduce the amount of calibration time and increase the accuracy at the same time. We provide both experimental measurements and simulation results to demonstrate the usability of this algorithm.
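
The multilateration step mentioned above (estimating a tag position from ranges to known anchors) can be sketched as a linear least-squares problem. The Python example below is a generic 2D textbook formulation under the assumption of known, non-collinear anchor positions; it is not the autocalibration algorithm proposed in the paper, and `multilaterate_2d` is a hypothetical name.

```python
def multilaterate_2d(anchors, ranges):
    """Least-squares 2D tag position from three or more anchor positions and ranges."""
    if len(anchors) < 3 or len(anchors) != len(ranges):
        raise ValueError("need at least three anchors and one range per anchor")
    (x0, y0), r0 = anchors[0], ranges[0]
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for (xi, yi), ri in zip(anchors[1:], ranges[1:]):
        # Subtracting the first range equation from the i-th one removes the
        # quadratic terms and leaves the linear equation a*x + b*y = c.
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        c = r0 ** 2 - ri ** 2 + xi ** 2 + yi ** 2 - x0 ** 2 - y0 ** 2
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ac += a * c
        s_bc += b * c
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_aa * s_bb - s_ab ** 2
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; the position is not unique")
    x = (s_ac * s_bb - s_ab * s_bc) / det
    y = (s_aa * s_bc - s_ab * s_ac) / det
    return x, y
```

The sketch makes the dependence on anchor positions explicit: any error in `anchors` propagates directly into the estimate, which is why calibrating anchor positions matters for ad-hoc deployments.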

CRNov 23, 2019

Blockchain-Powered Collaboration in Heterogeneous Swarms of Robots

Jorge Peña Queralta, Tomi Westerlund

One of the key challenges in the collaboration within heterogeneous multi-robot systems is the optimization of the amount and type of data to be shared between robots with different sensing capabilities and computational resources. In this paper, we present a novel approach to managing collaboration terms in heterogeneous multi-robot systems with blockchain technology. Leveraging the extensive research on consensus algorithms in the blockchain domain, we exploit key technologies in this field to be integrated for consensus in robotic systems. We propose the utilization of proof of work systems to have an online estimation of the available computational resources at different robots. Furthermore, we define smart contracts that integrate information about the environment from different robots in order to evaluate and rank the quality and accuracy of each of the robots' sensor data. This means that the key parameters involved in heterogeneous robotic collaboration are integrated within the blockchain and estimated at all robots equally without explicitly sharing information about the robots' hardware or sensors. Trustability is based on the verification of data samples that are submitted to the blockchain within each data exchange transaction and validated by other robots operating in the same environment. Initial results are reported which show the viability of the concepts presented in this paper.
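
As a rough illustration of the proof-of-work idea mentioned above, the Python sketch below searches for a nonce whose SHA-256 digest falls under a difficulty target. It is a generic hash puzzle, not the scheme used in the paper; the payload format, the 8-byte nonce encoding and the function names are all assumptions.

```python
import hashlib

def _digest_value(payload: bytes, nonce: int) -> int:
    """SHA-256 of payload plus an 8-byte big-endian nonce, read as an integer."""
    digest = hashlib.sha256(payload + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def proof_of_work(payload: bytes, difficulty_bits: int, max_nonce: int = 1_000_000):
    """Return the first nonce whose digest has difficulty_bits leading zero bits, or None."""
    target = 1 << (256 - difficulty_bits)
    for nonce in range(max_nonce):
        if _digest_value(payload, nonce) < target:
            return nonce
    return None

def verify(payload: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution costs a single hash, regardless of how long the search took."""
    return _digest_value(payload, nonce) < (1 << (256 - difficulty_bits))
```

Because the search cost grows with `difficulty_bits` while verification stays cheap, the time a robot needs to return a valid nonce could serve as a coarse proxy for its available computational resources.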